Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Breaking change] createWithCache v2 #2546

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

[Breaking change] createWithCache v2 #2546

wants to merge 10 commits into from

Conversation

frandiox
Copy link
Contributor

Proposal for a new version of createWithCache with intentional breaking changes.

The current version returns a function that runs any asynchronous operation and you are on charge of throwing to prevent caching errors. This seems to be easy to forget and devs are often caching errors by mistake.
Since most of the times this function seem to be used to call a single fetch operation, we can add a new utility to do this but where we do the status check on their behalf:

// Same params as before
const withCache = createWithCache({request, cache, waitUntil}) 

// 1. New `fetch` method:
// This method caches the responses if they are OK responses, and you can pass `hydrogenOptions.shouldCacheResponse`, `hydrogenOptions.cacheKey`, etc. to modify behavior.
// `data` is the consumed body of the response (we need to consume to cache it).

const {data, response} = await withCache.fetch('https://sanity.com/...', {headers, body}, {cache: CacheLong()})


// 2. Old behavior in `run` method:
// This is useful to run *multiple* fetch calls and merge their responses, or run any arbitrary code. It caches anything you return, but you can throw if you don't want to cache anything.

const {stuff} = await withCache.run(['key'], CacheLong(), () => { return {stuff} })

Questions:

  • Do we prefer a third parameter in fetch for the cache options, or a custom property in the second param? E.g. withCache.fetch('...', {headers, body, hydrogen: { cache: CacheLong() })
  • fetch already checks for status errors (only caches 2xx responses) but there could still be situations where devs don't want to cache the result based on body (e.g. GraphQL APIs tend to return errors in body with a 200 status). For this, you can pass shouldCacheResponse parameter optionally. Do we want to make this a required param instead to force devs check the integrity of the response body?
  • Do we want to return {data, response} with named properties, or prefer array like [data, response] for easier renaming? I kind of prefer the object but it's a bit verbose when the response body contains another data: const { data: { data, errors }, response } = withCache.fetch(...).
  • When the response is unsuccessful, this is returning the intact response without consuming its body, and data will be null. This way devs can response.text() if their API returns extra info if they want.

While we could avoid breaking changes by nesting fetch under the run function, or creating a complete new utility, I think it's useful to add a breaking change in this case to force devs misusing run to switch to fetch. Also, the migration for those who want to stick to run would just be replacing const withCache = createWithCache(...) with const {run: withCache} = createWithCache(...) in server.ts, so it's not too problematic.

@frandiox frandiox requested a review from a team September 18, 2024 09:59
Copy link
Contributor

We detected some changes in packages/*/package.json or packages/*/src, and there are no updates in the .changeset.
If the changes are user-facing and should cause a version bump, run npm run changeset add to track your changes and include them in the next release CHANGELOG.
If you are making simple updates to examples or documentation, you do not need to add a changeset.

Copy link
Contributor

shopify bot commented Sep 18, 2024

Oxygen deployed a preview of your fd-with-cache-v2 branch. Details:

Storefront Status Preview link Deployment details Last update (UTC)
classic-remix ✅ Successful (Logs) Preview deployment Inspect deployment October 9, 2024 8:56 PM
sitemap ✅ Successful (Logs) Preview deployment Inspect deployment October 9, 2024 8:55 PM
custom-cart-method ✅ Successful (Logs) Preview deployment Inspect deployment October 9, 2024 8:55 PM
metaobjects ✅ Successful (Logs) Preview deployment Inspect deployment October 9, 2024 8:55 PM
third-party-queries-caching ✅ Successful (Logs) Preview deployment Inspect deployment October 9, 2024 8:56 PM
Skeleton (skeleton.hydrogen.shop) ✅ Successful (Logs) Preview deployment Inspect deployment October 9, 2024 8:55 PM

Learn more about Hydrogen's GitHub integration.

/**
* Fetches data from a URL and caches the result according to the strategy provided.
* When the response is not successful (e.g. status code >= 400), the caching is
* skipped automatically and the returned `data` is `null`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return data even when the status isn't ok?

Copy link
Contributor Author

@frandiox frandiox Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was doing this at first but I'm not sure it's ideal. The thing is, if the status is not OK, the returned body might be an error message, a whole HTML page with instructions, or something completely random (depends on the API).
The dev might not even know the shape.

If we read the body in this situation, we don't know about the type returned in data. That's why I thought returning null and letting them choose what to do with the response.body might be better.

Alternatively, perhaps we could return string type in this scenario? So if you do fetch<MyData>(...) you get back data: MyData | string, depending on the type. We could allow passing as second generic for this type as well 🤔

Or we could add a different returned parameter: const {data, response, errorBody} = fetch(), where errorBody is only returned on non-200 responses? 🫤

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I noticed is that it's not possible to narrow the returned type based on nested properties. For example response.ok === false should mean that data: null but the narrowing doesn't work because ok is nested in response.

Considering this, perhaps we should return something that mimics a response instead:

const response = await fetch<MyData, MyError>(...);

// This supports type narrowing:
if (!response.ok) {
  // response.body is type MyError (defaults to 'string')
  console.log(response.body)
  return;
}

// response.body is type MyData
return response.body;

This response is not a real Fetch Response type, just something that contains the useful data from it: consumed body, ok: boolean, status: number, statusText: string, headers: Headers, etc.

Would that be better, or more confusing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think that there sometimes might be logic even when something fails. And it is common for there to be a failure code in the response. But like you said, we have no idea what it could be. Sometimes it won't even be a string if the 500 has no body. Perhaps we just mark it as unknown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just think that there sometimes might be logic even when something fails

Yes yes, I agree. I guess we are just discussing how we get that payload after errors.

Sometimes it won't even be a string if the 500 has no body

In this case we can return empty string, which is not incorrect I guess (if we go with data: string | MyData)

Perhaps we just mark it as unknown?

In this case, data: unknown | MyData turns always into unknown.

The only way to make it work would be using a different property to return the error body (as mentioned here) or by returning a top level Response-like payload (mentioned here).

Which one would you like more?

cc @wizardlyhel thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be simplest? We could just serialize it to string. And it's up to them to parse it? So it's always present, a string of the body response, and an empty string if there is no body.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are allowing devs to pass in their own type, wouldn't they know what to expect the shape of the data even if the response is not ok.

So technically, in our code, we just need data: T and devs should supply cache.fetch<MyData | MyErrorData | null>(...) to anticipate the shape of the returned data, even when it errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty common for the error type to be different than the 200 type.

Copy link
Contributor

@wizardlyhel wizardlyhel Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to return response.clone().body for data?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't they know what to expect the shape of the data even if the response is not ok.

I wouldn't have this expectation. Too many API have undocumented responses / behavior specially when approaching rate limits / high-traffic etc

cache?: CachingStrategy;
cacheInstance?: Cache;
cacheKey?: CacheKey;
shouldCacheResponse?: (body: any, response: Response) => boolean;
shouldCacheResponse?: (body: T, response: Response) => boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we want to make this function as a required field

Copy link
Contributor Author

@frandiox frandiox Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an open question I wrote in the OP:

fetch already checks for status errors (only caches 2xx responses) but there could still be situations where devs don't want to cache the result based on body (e.g. GraphQL APIs tend to return errors in body with a 200 status). For this, you can pass shouldCacheResponse parameter optionally. Do we want to make this a required param instead to force devs check the integrity of the response body?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for required

): Promise<{data: Body | null; response: Response}> {
return fetchWithServerCache<Body | null>(url, requestInit ?? {}, {
waitUntil,
cacheKey: [url, requestInit],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requestInit might end up as [Object object]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It gets stringified later in hashKey so I think we're good. That said, maybe we should flatten it here for a small perf.

Also, perhaps we should ensure headers are turned into entries here. Should we consider filtering headers automatically? Or leave that to the developer? They can always provide a manual cache key.

Copy link
Contributor

@wizardlyhel wizardlyhel Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we should ensure headers are turned into entries here

I think that having requestInit is something that shouldn't be use as a default cache key. A requestInit contains the following:

  • body - this could be anything but since it's the body of a request, it should be minimal
  • headers - this is the worrying piece. What if this contains some header that shouldn't be used for as cache key? For example, in our own storefront handler, we manually took out some headers, that contains cookie information that we had to pass along as headers in the request, for creating a cache key.

I would prefer the cacheKey be defaulted to just [url, requestInit.body]

Copy link
Contributor

@blittle blittle Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good callout Helen. I think the main issue would be if the user just proxied the primary request headers to the sub-request headers, then they'd definitely have few cache hits. But I don't think that would be common. If I'm using the API, and I write this code, I don't think I'd expect them to share the same cache:

cache.fetch('some-api.json', {
  method: 'GET',
  headers: { token: env.someApiToken }
});

// vs

cache.fetch('some-api.json', {
  method: 'GET',
  headers: { token: env.someOtherApiToken }
});

A common use case here would be calling a 3p API for personalized results. If a 3p API uses headers to define the persona of the request, then the cache would bleed between users.

Copy link
Contributor

@wizardlyhel wizardlyhel Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm ... if we are debating over what is safer between never hit a cache read or cache bleed, I would say never hit a cache read is safer.

Then I'm fine with spreading the requestInit and the header as part of the cache key

let data;
if (!response.ok) {
// Skip caching and consuming the response body
return response;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this follow the return type of [T, Response]?

@juanpprieto
Copy link
Contributor

Sorry finally coming around to reviewing this

Do we prefer a third parameter in fetch for the cache options, or a custom property in the second param? E.g. withCache.fetch('...', {headers, body, hydrogen: { cache: CacheLong() })

I would prefer a third parameter to avoid "overwriting" the std fetch signature params

fetch already checks for status errors (only caches 2xx responses) but there could still be situations where devs don't want to cache the result based on body (e.g. GraphQL APIs tend to return errors in body with a 200 status). For this, you can pass shouldCacheResponse parameter optionally. Do we want to make this a required param instead to force devs check the integrity of the response body?

Although more verbose I would prefer to force devs in providing shouldCacheResponse. The main goal of this change is to prevent caching bad responses, so we should go all the way in preventing them.

Do we want to return {data, response} with named properties, or prefer array like [data, response] for easier renaming? I kind of prefer the object but it's a bit verbose when the response body contains another data: const { data: { data, errors }, response } = withCache.fetch(...).

I also prefer object. Maybe {result, response} is a better e.g { result: { data, errors }, response } = withCache.fetch(...).

When the response is unsuccessful, this is returning the intact response without consuming its body, and data will be null. This way devs can response.text() if their API returns extra info if they want.

Yeah this is great.

@wizardlyhel wizardlyhel self-assigned this Oct 3, 2024
@wizardlyhel
Copy link
Contributor

wizardlyhel commented Oct 9, 2024

@rbshop and I just went thru this PR and addressed some feedbacks. Usage update:

Converted from inline parameters to a single object - making shouldCacheResult mandatory parameter

-   withCache(cacheKey, strategy, actionFn)
+  withCache.run({
+    cacheKey,
+    strategy,
+    shouldCacheResult,
+    actionFn,
+  })

The naming of each parameter are still pending.

New addition of withCache.fetch with mandatory shouldCacheResult

withCache.fetch(
    'https://www.google.com',  // url
    {},  // requestInit
    {    // options
      cacheKey: ['google']
      cache: CacheLong(),
      shouldCacheResult: () => true,
    }
  ).then(({data, response}) => {
  console.log(data);
})

However these parameter will be named, they should be the same names that's used in withCache.run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants