Tracker Version v4: Pending Fetch API Requests Accumulating in OutQueue on Chrome #1355
Comments
Thanks Joel for creating the issue. @miike @matus-tomlein, as promised, this is the first feedback we can provide for the JS tracker v4.
Thanks for testing the beta and reporting this, @j7i and @davidher-mann! We are looking into the issue.
Could you please help us with a few questions so we can debug this further:
Sure thing, good questions.
Maybe also worth mentioning: when we have e.g. ~30+ event entries stuck in the OutQueue from a previous session, only ~10+ of them manage to get sent/flushed out of the queue after a reload.

Additional config insights:

```ts
const config: TrackerConfiguration = {
  platform: "web",
  discoverRootDomain: true,
  cookieSameSite: "Lax",
  cookieSecure: true,
  encodeBase64: true,
  eventMethod: "post",
  keepalive: true, // issue does not arise when set to `false`
  bufferSize: 1,
  maxPostBytes: 40000,
  cookieLifetime: 63072000,
  stateStorageStrategy: "cookieAndLocalStorage",
  maxLocalStorageQueueSize: 1000,
  connectionTimeout: 5000,
  anonymousTracking: false,
  customHeaders: {},
  ...
};
```

Aside: in our preview environment we are currently sending our tracking requests to a different host; we are now trying to send the tracking events to the same host and will check whether this makes any difference.
Besides changing our host setting, I will also play around with the
Thanks for the extra context @j7i! I think we might be hitting the browser limits that you mentioned with the number of keepalive requests.

But I wouldn't expect the tracker to make parallel requests to the collector – can I just check with you whether this is happening? Can you see multiple requests being sent in parallel, or sequentially after each other?
Ah, then I misunderstood the bufferSize setting. I thought that requests would hold the number of events represented by this number.

When batching the requests, we may also need to take the payload size limit of keepalive requests into account; I couldn't find any official statement about this yet – it could be 64 KB.

Regarding the tracker making parallel requests:
Sorry, I explained that in a confusing way – the events are not sent one by one, but in one batch request. So the tracker will wait for 16 events to be tracked and then make one request. The problem I was referring to is that if one page visit only tracks 15 events, these will stay in the local storage queue forever unless another one is tracked.

Thanks for the recording – it does seem that the requests are overlapping, so likely, from the point of view of the tracker, the requests have failed (otherwise it wouldn't start follow-up requests if the previous didn't finish). Do you see any console errors? Or perhaps can you try to log the calls to
Alright – I'll come back to this at a later point in time, as we would like to experiment with increasing the bufferSize and e.g. flushing on demand on tab visibility changes, so that the 15 events mentioned in the example would get flushed before one leaves the page/tab.

Indeed, I can see some errors by logging on request failure. By continuing browsing and generating tracking events, these error logs pop up one by one, matching the pending requests in the network tab; beside that there are no other error logs captured.

It seems that we get notified about the failure but maybe don't cancel/abort the pending keepalive request?
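The flush-on-visibility-change idea mentioned above could look roughly like the sketch below. The actual flush call is an assumption (shown commented out; check the tracker's real API before relying on it), and the decision logic is pulled into a small pure function so it can be reasoned about outside a browser.

```typescript
// Sketch: flush the event buffer when the tab is hidden, so partial
// batches (e.g. 15 of 16 events) don't linger in the queue across sessions.

// Pure predicate, testable outside a browser environment.
function shouldFlush(visibilityState: string): boolean {
  return visibilityState === "hidden";
}

// Accessed via globalThis so this file also loads where no DOM exists.
const doc = (globalThis as { document?: any }).document;
if (doc) {
  doc.addEventListener("visibilitychange", () => {
    if (shouldFlush(doc.visibilityState)) {
      // flushBuffer(); // assumed tracker flush API – verify against the
      // actual browser-tracker exports before using this in production
    }
  });
}
```

Note that `visibilitychange` with state `hidden` also fires when the user navigates away or closes the tab, which is exactly the case where a partial batch would otherwise be stranded.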
Hi @j7i, we have just published
Thanks for the update @matus-tomlein, the issue is still occurring with the new changes in place.
Hi @j7i, we have just published

We have deployed this in our internal apps and, while it seems to help with the problem, it doesn't completely solve it – I can still get the failed requests if there are a lot of requests being made at the same time (we have a number of trackers instrumented that track events in parallel). If you have a chance, can you please try the new beta to see if it changes anything on your end?

I think we are running up against the 64 KB limitation that browsers put on keepalive requests – it seems that it is applied across requests as well, so not just that a single request can't be over 64 KB (which we could control in the tracker).

We could add a delay in between collector requests to limit the total size of requests over a window, but that seems like a hacky solution...
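The shared 64 KB quota described above could in principle be tracked on the client. The sketch below is a hypothetical helper (all names invented for illustration) that admits a payload under a shared in-flight budget, so a sender could fall back to a regular non-keepalive fetch instead of letting the keepalive request fail silently.

```typescript
// Chrome caps the total size of in-flight keepalive request bodies at
// roughly 64 KB, shared across concurrent requests. This hypothetical
// helper tracks that budget locally.
const KEEPALIVE_QUOTA_BYTES = 64 * 1024;

class KeepaliveBudget {
  private inFlightBytes = 0;

  // Returns true if the payload was admitted under the shared quota;
  // false means the caller should fall back to a non-keepalive request.
  tryAcquire(payloadBytes: number): boolean {
    if (this.inFlightBytes + payloadBytes > KEEPALIVE_QUOTA_BYTES) {
      return false;
    }
    this.inFlightBytes += payloadBytes;
    return true;
  }

  // Call when a keepalive request settles (success or failure), so the
  // budget is returned even for requests that errored out.
  release(payloadBytes: number): void {
    this.inFlightBytes = Math.max(0, this.inFlightBytes - payloadBytes);
  }
}
```

With the `maxPostBytes: 40000` setting from the config above, two concurrent 40 KB keepalive posts would already exceed the 64 KB quota – which matches the overlapping-request failures seen in this thread.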
Closing for now as I think we have addressed this for the most part. Based on our testing, the problem may still occur in case of parallel requests made by multiple tracker instances using keepalive, but we can't really avoid that for now. Please reopen if you think this needs more attention!
Description
We are currently conducting tests on the new tracker version v4 and have encountered an issue where requests become stuck in a pending state.
Initially, the OutQueue functions correctly, effectively sending out events. However, as more events are triggered, they begin to accumulate in the OutQueue (local storage) and fail to be sent.
These accumulated events are only dispatched upon refreshing the page, given that they are recorded in the local storage via the OutQueue.
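The replay-on-refresh behaviour described above can be sketched in a storage-agnostic way. The names below (`QUEUE_KEY`, `replayOnLoad`, the `StringStore` interface) are hypothetical, not the tracker's real internals; the point is that events persisted by a previous session become send candidates again on the next page load.

```typescript
// Illustrative sketch of a localStorage-backed out-queue: pending
// payloads survive in storage and are replayed when the page reloads.
type Payload = Record<string, string>;

// Minimal subset of the Web Storage API, so an in-memory stand-in works too.
interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const QUEUE_KEY = "outQueuePending"; // hypothetical storage key

function loadQueue(store: StringStore): Payload[] {
  const raw = store.getItem(QUEUE_KEY);
  return raw ? (JSON.parse(raw) as Payload[]) : [];
}

function saveQueue(store: StringStore, queue: Payload[]): void {
  store.setItem(QUEUE_KEY, JSON.stringify(queue));
}

// On load, attempt to send each persisted payload; keep the failures
// queued for the next attempt. Returns how many were flushed.
function replayOnLoad(store: StringStore, send: (p: Payload) => boolean): number {
  const queue = loadQueue(store);
  const remaining = queue.filter((p) => !send(p));
  saveQueue(store, remaining);
  return queue.length - remaining.length;
}
```

This also mirrors the observation earlier in the thread that after a reload only some of the stuck entries get flushed: whatever fails to send simply stays in storage for the next attempt.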
This problem appears with the latest version of the Chrome browser. We have not observed the same issue in either Firefox or Safari, where the system operates as expected.
--
We're addressing this issue here to keep everyone who might be experiencing the same problem updated on potential solutions and causes. Moreover, we warmly invite anyone facing similar issues to share their insights and findings. If there's a more effective platform to report our progress, please don't hesitate to let us know.
Steps to Reproduce
At this time, we do not have a public source available for reproducing this issue. However, we are actively investigating the problem on our end to ensure that there are no factors within our setup that may be causing interference.
We are open to sharing our findings and insights along with details of our setup, potentially even in a call, and we are exploring ways to provide something that could help reproduce the issue.
Expected behavior
The expected behavior involves the successful dispatch of all event requests. When an event is triggered, it should be sent out efficiently without being trapped in the OutQueue (local storage). No event should remain in a pending state for an extended period of time.
Screenshots
Environment
Additional context
While we're looking into the problem, we've found that some browsers might limit "keepalive" requests. These limits could be on how many requests can be sent at the same time, or on the total size of data that can be sent in these requests. This could be part of the issue we're dealing with.
That said, this might not be the problem in our case, as we are not seeing any failed requests – only pending ones.
Interesting information/reads about keepalive:
https://chromium.googlesource.com/chromium/src/+/refs/tags/132.0.6780.1/content/browser/loader/keep_alive_url_browsertest.cc (line 495)