
Tracker Version v4: Pending Fetch API Requests Accumulating in OutQueue on Chrome #1355

Closed · j7i opened this issue Oct 16, 2024 · 13 comments
Labels: type:defect (Bugs or weaknesses. The issue has to contain steps to reproduce.)

j7i commented Oct 16, 2024

Description

We are currently conducting tests on the new tracker version v4 and have encountered an issue where requests become stuck in a pending state.

Initially, the OutQueue functions correctly and sends out events. However, as more events are triggered, they begin to accumulate in the OutQueue (local storage) and fail to be sent.

These accumulated events are only dispatched upon refreshing the page, since they are persisted in local storage via the OutQueue.

This problem appears with the latest version of the Chrome browser. We have not observed the same issue in either Firefox or Safari, where the system operates as expected.

--

We're addressing this issue here to keep everyone who might be experiencing the same problem updated on potential solutions and causes. Moreover, we warmly invite anyone facing similar issues to share their insights and findings. If there's a more effective platform to report our progress, please don't hesitate to let us know.

Steps to Reproduce

At this time, we do not have a public source available for reproducing this issue. However, we are actively investigating the problem on our end to ensure that there are no factors within our setup that may be causing interference.

We are open to sharing our findings and insights along with details of our setup, potentially even in a call, and we are exploring ways to provide something that could help reproduce the issue.

Expected behavior

The expected behavior involves the successful dispatch of all event requests. When an event is triggered, it should be sent out efficiently without being trapped in the OutQueue (local storage). No event should remain in a pending state for an extended period of time.

Screenshots

[image]
[screen recording example]

Environment

  • macOS 14.6.1 (23G93)
  • Chrome Version 130.0.6723.59 (Official Build) (arm64)
  • @snowplow/browser-tracker: 4.0.0-beta.2

Additional context

While we're looking into the problem, we've found that some browsers limit "keepalive" requests: there may be caps on how many can be in flight at the same time, and on the total size of data they can carry. This could be part of the issue we're dealing with.
Then again, this might not be the problem in our case, as we are not seeing any failed events (pending only).
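For illustration, a keepalive request at the fetch level looks roughly like this (a minimal sketch, not the tracker's actual code; the collector URL is a placeholder, and the 64 KiB figure is the quota the Fetch spec places on the combined body size of in-flight keepalive requests):

// Sketch: a keepalive POST as a fetch-based transport might issue it.
// Browsers cap the combined body size of in-flight keepalive requests
// (64 KiB per the Fetch spec) and reject requests exceeding the quota.
const body = JSON.stringify({ data: [/* event payloads */] });

fetch("https://collector.example.com/com.snowplowanalytics.snowplow/tp2", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body,
  keepalive: true, // lets the request outlive the page, subject to the quota
});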

Interesting information/reads about keepalive:

davidher-mann commented

Thanks Joel for creating the issue. @miike @matus-tomlein as promised, this is the first feedback we can provide for the JS tracker version v4.

matus-tomlein (Contributor) commented

Thanks for testing the beta and reporting this, @j7i and @davidher-mann! We are looking into the issue.

matus-tomlein (Contributor) commented

Could you please help us with a few questions to debug this deeper:

  1. Could you try configuring keepalive: false in the newTracker call, to isolate whether this also happens when fetch keepalive is disabled? (A minimal example follows this list.)
  2. Are you tracking events from a service worker or intercepting fetch requests from a service worker?
  3. Do you see this happening also in incognito mode or in other browsers (trying to rule out some extensions)?
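For reference, a minimal sketch of that configuration change (the tracker namespace, appId, and collector URL here are placeholders):

import { newTracker } from "@snowplow/browser-tracker";

// Placeholder namespace, appId, and collector URL, for illustration only.
newTracker("sp1", "https://collector.example.com", {
  appId: "my-app",
  eventMethod: "post",
  keepalive: false, // disable fetch keepalive to isolate the issue
});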

j7i commented Oct 17, 2024

Sure thing, good questions.

  1. We don't see this behaviour with the keepalive setting turned off via keepalive: false (same Chrome, incognito)
  2. We are not sending Snowplow tracking events through a service worker, nor are we intercepting any requests from a service worker
  3. We also see this behaviour in Chrome incognito mode and Edge incognito mode, but not in Safari or Firefox.
    Edge: Version 129.0.2792.89 (Official build) (arm64)
    Chrome: Version 130.0.6723.59 (Official Build) (arm64)

Maybe also worth mentioning: when we have e.g. ~30+ event entries stuck in the OutQueue from a previous session, only ~10+ of them manage to get sent/flushed out of the queue after a reload.
So the queue does not seem to be flushed completely on a reload, and we don't see any pending requests after the reload; this may be expected behaviour (we haven't checked the OutQueue code yet).

// Additional config insights
// (import added for completeness; TrackerConfiguration is the tracker's config type)
import { TrackerConfiguration } from "@snowplow/browser-tracker";

const config: TrackerConfiguration = {
    platform: "web",
    discoverRootDomain: true,
    cookieSameSite: "Lax",
    cookieSecure: true,
    encodeBase64: true,
    eventMethod: "post",
    keepalive: true, // issue does not arise when set to `false`
    bufferSize: 1,
    maxPostBytes: 40000,
    cookieLifetime: 63072000,
    stateStorageStrategy: "cookieAndLocalStorage",
    maxLocalStorageQueueSize: 1000,
    connectionTimeout: 5000,
    anonymousTracking: false,
    customHeaders: {},
    ...
}

Aside: in our preview environment we currently send our tracking requests to a different host; we are now trying to send the tracking events to the same host and will check whether this makes any difference.

j7i commented Oct 17, 2024

Besides changing our host setting, I will also experiment with the bufferSize setting to check whether reducing the number of concurrent keepalive requests has any impact. Will post insights from that later on.

matus-tomlein (Contributor) commented

Thanks for the extra context @j7i!

I think we might be hitting the browser limits you mentioned on the number of keepalive requests. The bufferSize configuration option is not a great workaround, because it also delays sending new events until the buffer size is reached. But we are investigating whether we can improve the batching when a number of events piles up in the local storage queue – currently they are sent individually (if the buffer size is 1).

But I wouldn't expect the tracker to make parallel requests to the collector – can I check with you whether this is happening? Do you see multiple requests being sent in parallel, or sequentially one after another?

j7i commented Oct 17, 2024

Ah, then I misunderstood the bufferSize setting. I thought requests would hold the number of events given by this setting; e.g., bufferSize: 16 would collect 16 events in the OutQueue and then make one request with all of them (provided they don't exceed maxPostBytes). So it's a middle ground, where the queue holds them back but then sends them one by one – thanks for clarifying.

When batching the requests, we may also need to take the payload size limit of keepalive requests into account; I couldn't find any official statement about this yet – it could be 64 KB.
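If that limit turns out to be real, one conceivable mitigation (a sketch, assuming the Fetch spec's 64 KiB in-flight keepalive quota; send and KEEPALIVE_QUOTA are hypothetical names) would be to measure the encoded payload and fall back to a regular request when it might not fit:

// Sketch: use keepalive only when the payload is safely under the
// assumed 64 KiB quota; otherwise fall back to a regular request.
const KEEPALIVE_QUOTA = 64 * 1024;

function send(url: string, body: string): Promise<Response> {
  const size = new TextEncoder().encode(body).byteLength;
  return fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body,
    keepalive: size < KEEPALIVE_QUOTA,
  });
}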


Regarding the tracker making parallel requests:
I guess it is – I currently don't have a clear view of the waterfall – does this recording provide the insight you're looking for?

[screen recording: timing-pending-requests]

matus-tomlein (Contributor) commented

> So it's a middle ground, where the queue holds them back but then sends them one by one – thanks for clarifying.

Sorry, I explained that in a confusing way – the events are not sent one by one but in one batch request. So the tracker will wait for 16 events to be tracked and then make one request. The problem I was referring to is that if a page visit tracks only 15 events, these will stay in the local storage queue indefinitely unless another event is tracked.

Thanks for the recording – the requests do seem to be overlapping, so from the tracker's point of view the requests have likely failed (otherwise it wouldn't start follow-up requests while the previous ones hadn't finished).

Do you see any console errors? Or perhaps can you try to log the calls to onRequestFailure callback to see what error we get there (see v3 docs here)?
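For anyone following along, wiring up that callback looks roughly like this (a sketch; the namespace and collector URL are placeholders, and the exact shape of the failure object is described in the v3 docs):

import { newTracker } from "@snowplow/browser-tracker";

// Placeholder namespace and collector URL, for illustration only.
newTracker("sp1", "https://collector.example.com", {
  appId: "my-app",
  onRequestFailure: (failure) => {
    // Log whatever the tracker reports about the failed batch.
    console.error("Snowplow request failed:", failure);
  },
});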

j7i commented Oct 17, 2024

Alright – I'll come back to this at a later point, as we would like to experiment with increasing the bufferSize and, e.g., flushing on demand when the tab's visibility changes, so that the 15 events from the example above would get flushed before one leaves the page/tab (see the sketch below).
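A sketch of that flush-on-visibility-change idea, assuming the flushBuffer function exported by @snowplow/browser-tracker can be used to drain the buffer on demand:

import { flushBuffer } from "@snowplow/browser-tracker";

// Flush buffered events when the tab is hidden, so a partially filled
// buffer (e.g. 15 of 16 events) doesn't linger in local storage.
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") {
    flushBuffer();
  }
});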

Indeed I can see some errors by logging on request failure:

[image]

As I continue browsing and generating tracking events, these error logs pop up one by one, matching the pending requests in the network tab; beyond that, no other error logs are captured:

[image]

It seems that we get notified about the failure, but the pending keepalive request may not be cancelled/aborted?

matus-tomlein (Contributor) commented

Hi @j7i, we have just published version 4.0.0-beta.3, which improves batching and closes the request on timeout/failure. Could you please try it out to see if there is any improvement/change?

j7i commented Oct 21, 2024

Thanks for the update @matus-tomlein – the issue is still occurring with the new changes in place.
I see a higher chance of success with a buffer size of 1 compared to 16, but events still get stuck very quickly, resulting in the same behaviour as described previously.

matus-tomlein (Contributor) commented

Hi @j7i, we have just published 4.0.0-beta.4, which contains a change from #1358. The idea is that we read the full response from the collector (even though the tracker doesn't use it) before sending follow-up requests. This aims to avoid overlapping requests to the collector.

We have deployed this in our internal apps, and while it seems to help with the problem, it doesn't completely solve it – I can still get failed requests when a lot of requests are made at the same time (we have a number of trackers instrumented that track events in parallel).

If you have a chance, can you please try the new beta to see if it changes anything on your end?

I think we are running into the 64 KB limit that browsers put on keepalive requests – it seems to apply across requests, so it's not just that a single request can't exceed 64 KB (which we could control in the tracker using maxPostBytes); the combined size of all requests (over some period/window that isn't clear) can't exceed that limit.

We could add a delay in between collector requests to limit the size of the total requests over a window, but that seems like a hacky solution...
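For illustration, the read-the-full-response idea from #1358 roughly amounts to the following pattern (a sketch, not the tracker's actual implementation; drainQueue is a hypothetical helper): awaiting the body means a request is fully settled – and its keepalive quota released – before the next batch goes out.

// Sketch: send queued batches strictly one at a time, consuming each
// response body so the previous keepalive request has fully completed
// before the next one starts.
async function drainQueue(url: string, batches: string[]): Promise<void> {
  for (const body of batches) {
    const response = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body,
      keepalive: true,
    });
    // Read and discard the body; the tracker doesn't use it, but this
    // ensures the request has completely finished.
    await response.text();
  }
}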

matus-tomlein (Contributor) commented

Closing for now as I think we have addressed this for the most part. Based on our testing, the problem may still occur in case of parallel requests made by multiple tracker instances using keepalive, but we can't really avoid that for now. Please reopen if you think this needs more attention!
