Forward User-Agent from AWS CloudFront to catalog app #4059

Closed
FuhuXia opened this issue Nov 11, 2022 · 8 comments
Labels
component/catalog Related to catalog component playbooks/roles Feature logging Notifications O&M Operations and maintenance tasks for the Data.gov platform


@FuhuXia
Member

FuhuXia commented Nov 11, 2022

User Story

In order to better understand the origin of requests coming to the catalog app, the data.gov team wants to see User-Agent header information in the catalog app logs.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN all catalog.data.gov requests go through CloudFront -> catalog-proxy -> catalog-web
    AND the request to /api/action/status_show is not cached
    WHEN I run the command curl -A "my-test-user-agent" https://catalog.data.gov/api/action/status_show
    THEN the access logs capture this request
    AND my-test-user-agent is found when searching the log
  • GIVEN the command curl -I https://catalog.data.gov/dataset has been run
    WHEN the command is run a second time with a random string in the User-Agent
    curl -s -I -A '!@#$%^' https://catalog.data.gov/dataset | grep -i x-cache
    THEN the output shows x-cache: Hit from cloudfront, confirming the cached result was hit.
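Both acceptance criteria can be checked by piping captured output into two small shell helpers. This is a sketch under assumptions: the function names are hypothetical, and the commented usage assumes the access logs are read with `cf logs catalog-proxy --recent` (catalog-proxy is the app whose logs are mentioned later in this thread) and that /dataset was requested once beforehand so CloudFront already holds a cached copy.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two acceptance criteria. Each reads captured
# text on stdin, so it can be fed from curl, `cf logs`, or a saved file.

# AC 1: succeed if the given User-Agent string appears anywhere in the log text.
# -F matches a fixed string (no regex), -q prints nothing and only sets the exit status.
log_has_user_agent() {
  grep -qF -- "$1"
}

# AC 2: succeed if the response headers report a CloudFront cache hit.
# -i because header case varies (HTTP/2 lowercases header names, HTTP/1.1 may not).
is_cloudfront_hit() {
  grep -qi '^x-cache: *hit from cloudfront'
}

# Example usage (assumes network access and a logged-in cf CLI targeting the right space):
#   curl -s -A "my-test-user-agent" https://catalog.data.gov/api/action/status_show > /dev/null
#   cf logs catalog-proxy --recent | log_has_user_agent "my-test-user-agent" && echo "AC 1 passed"
#   curl -s -I https://catalog.data.gov/dataset > /dev/null   # warm the cache first
#   curl -s -I -A '!@#$%^' https://catalog.data.gov/dataset | is_cloudfront_hit && echo "AC 2 passed"
```

Reading from stdin keeps the helpers independent of where the output comes from, which also makes them easy to exercise against saved samples.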

Background

We want the original User-Agent header to reach the app, instead of every request being marked as coming from "Amazon CloudFront". However, we don't want User-Agent to be part of the cache key.

Security Considerations (required)

No

Sketch

Follow these AWS instructions to set up an Origin Request policy.

@hkdctol hkdctol added the O&M Operations and maintenance tasks for the Data.gov platform label Nov 17, 2022
@hkdctol hkdctol moved this to 📔 Product Backlog in data.gov team board Nov 17, 2022
@nickumia-reisys
Contributor

Conflicting documentation 😠 😢 💢 💢
https://aws.amazon.com/blogs/networking-and-content-delivery/amazon-cloudfront-announces-cache-and-origin-request-policies/

Forwarding information such as the User-Agent to the origin for analytics/logging but without serving different content variants based on device type (now you can forward the user-agent header and exclude it from the cache-key)

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html#request-custom-headers-behavior

Whether you can configure CloudFront to cache objects based on header values for that header.
You can configure CloudFront to cache objects based on values in the Date and User-Agent headers, but we don't recommend it. These headers have many possible values, and caching based on their values would cause CloudFront to forward significantly more requests to your origin.

@FuhuXia
Member Author

FuhuXia commented Aug 2, 2023

I think this is the Origin request policy we need:
Managed-UserAgentRefererHeaders

@nickumia-reisys
Contributor

We can't use that because it drops all query strings.

But we made a custom one similar to it (if it's too much, we can include just User-Agent and Referer).

@nickumia-reisys
Contributor

I think our custom solution was working; it was the config on catalog-proxy that was causing it to break.

@FuhuXia
Member Author

FuhuXia commented Aug 2, 2023

We can modify the nginx config. Detecting "Amazon CloudFront" was a convenient but temporary solution.
https://github.com/GSA/catalog.data.gov/blob/fedf828f33985c691007065ca0de646abc01e1a0/proxy/nginx-cloudfront.conf#L16-L23

@nickumia-reisys nickumia-reisys self-assigned this Aug 2, 2023
@nickumia-reisys nickumia-reisys moved this from 📔 Product Backlog to 🏗 In Progress [8] in data.gov team board Aug 2, 2023
nickumia-reisys added a commit to GSA/catalog.data.gov that referenced this issue Aug 2, 2023
This is because of an external change in the CloudFront configuration, see details GSA/data.gov#4059 (comment)
@nickumia-reisys
Contributor

@Jin-Sun-tts was able to create our domains on cloud.gov with the revived command:

cf create-private-domain gsa-datagov catalog-dev.data.gov
cf create-private-domain gsa-datagov catalog-stage.data.gov

Interim report:

  • The solution to this issue is creating a custom Cache Policy and a custom Origin Request Policy (with the specifications above).
    • General Steps in CloudFront on the AWS Console:
      • Create Cache Policy
        • Go to Policies in the menu on the left.
        • Click on Create Cache Policy in the Custom Policies Section under the Cache Tab.
          • NOTE: Policy only needs to be created once per account.
        • Fill out the details.
          • Note: TTL specs vary based on catalog route (api vs. reports vs. sitemap vs default)
            • We should document this somewhere...
        • For Cache Key Settings, choose either Host or no keys at all.
        • For Compression Support, leave default?
        • Save.
      • Create Origin Request Policy
        • Go to Policies in the menu on the left.
        • Click on Create Origin Request Policy in the Custom Policies Section under the Origin Request Tab.
          • NOTE: Policy only needs to be created once per account.
        • Fill out the details.
        • For Headers, include at least Referer, User-Agent, Host and CloudFront-Forwarded-Proto (but we should be able to select All Viewer Headers).
        • For Query Strings, select All.
        • For Cookies, select None.
        • Save.
      • Use the Policies
        • Go to a CloudFront distribution.
        • Go to the Behaviors tab.
        • Edit a specific behavior.
        • Under the Cache key and origin requests section, select Cache policy and origin request policy (recommended)
        • Select the custom cache and origin request policies you just created.
        • Save changes.
  • Forwarding the User-Agent is solved. However, the nginx config on catalog-proxy needs to be updated, so this is blocked until that issue is solved: Limit catalog-proxy traffic to only from Cloudfront #4413

@nickumia-reisys nickumia-reisys moved this from 🏗 In Progress [8] to 📡 Blocked in data.gov team board Aug 3, 2023
@nickumia-reisys nickumia-reisys mentioned this issue Aug 31, 2023
@FuhuXia
Member Author

FuhuXia commented Aug 31, 2023

One Origin Request Policy CatalogProdOriginPolicy was created with the following settings.

Headers - Include the following headers - Referer; User-Agent; Host
Cookies - None
Query strings - All
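The CatalogProdOriginPolicy settings above can also be expressed as the JSON accepted by `aws cloudfront create-origin-request-policy --origin-request-policy-config file://...`. This is a minimal sketch: the helper function and output path are hypothetical, and the field names follow the CloudFront OriginRequestPolicyConfig structure, which should be checked against the AWS CLI reference before use.

```shell
#!/usr/bin/env bash
# Sketch: write an OriginRequestPolicyConfig JSON file mirroring CatalogProdOriginPolicy.
# Forwards Referer, User-Agent and Host to the origin, no cookies, all query strings.
# Intended for:
#   aws cloudfront create-origin-request-policy --origin-request-policy-config file://<out>
# Arguments: policy name, output path (both chosen by the caller).
write_origin_request_policy_config() {
  local name="$1" out="$2"
  cat > "$out" <<EOF
{
  "Name": "${name}",
  "Comment": "Forward viewer User-Agent and Referer to the origin without adding them to the cache key",
  "HeadersConfig": {
    "HeaderBehavior": "whitelist",
    "Headers": {
      "Quantity": 3,
      "Items": ["Referer", "User-Agent", "Host"]
    }
  },
  "CookiesConfig": {
    "CookieBehavior": "none"
  },
  "QueryStringsConfig": {
    "QueryStringBehavior": "all"
  }
}
EOF
}
```

Forwarding all query strings here is what distinguishes this custom policy from Managed-UserAgentRefererHeaders, which drops them.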

Three Cache Policies were created:

  • catalogProdCachePolicy as default
      Minimum TTL (seconds): 300
      Maximum TTL (seconds): 31536000
      Default TTL (seconds): 86400
      Headers - None
      Cookies - None
      Query strings - All
      Gzip Enabled
      Brotli Enabled
  • catalogProdCachePolicy-REPORT for /report/*
      Minimum TTL (seconds): 86400
      Maximum TTL (seconds): 86400
      Default TTL (seconds): 86400
      Headers - None
      Cookies - None
      Query strings - All
      Gzip Enabled
      Brotli Enabled
  • catalogProdCachePolicy-NOCACHE for /api/* and /sitemap/*
      Minimum TTL (seconds): 0
      Maximum TTL (seconds): 0
      Default TTL (seconds): 0
      Headers - None
      Cookies - None
      Query strings - None
      Gzip Disabled
      Brotli Disabled
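The three cache policies can be sketched the same way, as JSON files for `aws cloudfront create-cache-policy --cache-policy-config file://...`. The helper names and file names are hypothetical; the TTLs, query-string behavior and compression flags are copied from the list of policies in this comment.

```shell
#!/usr/bin/env bash
# Sketch: write a CachePolicyConfig JSON file for
#   aws cloudfront create-cache-policy --cache-policy-config file://<out>
# Arguments: name, min/max/default TTL in seconds, query-string behavior
# ("all" or "none"), compression flag ("true" or "false"), output path.
# Headers and cookies are never part of the cache key, so User-Agent does not
# split the cache even though the origin request policy forwards it.
write_cache_policy_config() {
  local name="$1" min_ttl="$2" max_ttl="$3" default_ttl="$4"
  local query_behavior="$5" compress="$6" out="$7"
  cat > "$out" <<EOF
{
  "Name": "${name}",
  "MinTTL": ${min_ttl},
  "MaxTTL": ${max_ttl},
  "DefaultTTL": ${default_ttl},
  "ParametersInCacheKeyAndForwardedToOrigin": {
    "EnableAcceptEncodingGzip": ${compress},
    "EnableAcceptEncodingBrotli": ${compress},
    "HeadersConfig": { "HeaderBehavior": "none" },
    "CookiesConfig": { "CookieBehavior": "none" },
    "QueryStringsConfig": { "QueryStringBehavior": "${query_behavior}" }
  }
}
EOF
}

# Write the three catalog policies into a directory (hypothetical file names).
write_catalog_cache_policies() {
  local dir="$1"
  write_cache_policy_config catalogProdCachePolicy         300   31536000 86400 all  true  "$dir/cache-default.json"
  write_cache_policy_config catalogProdCachePolicy-REPORT  86400 86400    86400 all  true  "$dir/cache-report.json"
  write_cache_policy_config catalogProdCachePolicy-NOCACHE 0     0        0     none false "$dir/cache-nocache.json"
}
```

Keeping User-Agent out of every cache policy while listing it in the origin request policy is what lets the second acceptance criterion hit the cache even with a random User-Agent.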

Staging and Development follow prod and have the same settings.

@nickumia-reisys nickumia-reisys moved this from 📡 Blocked to ✔ Done in data.gov team board Aug 31, 2023
@FuhuXia FuhuXia self-assigned this Aug 31, 2023
@nickumia-reisys
Contributor

Unblocked by completion of

User-agents are in the logs of catalog-proxy app.

@nickumia-reisys nickumia-reisys added component/catalog Related to catalog component playbooks/roles Notifications labels Oct 9, 2023
@nickumia-reisys nickumia-reisys moved this from ✔ Done to 🗄 Closed in data.gov team board Oct 9, 2023
Projects
Archived in project

3 participants