Player stucks with buffer empty #1000

pmendozav · 2020-11-17T14:21:05Z

Description

I have a live stream with several renditions at 60 fps. Here is a short description of each (id - resolution - approximate segment size):

A - 256x144 - 200kb
B - 416x234 - 320kb
C - 640x360 - 500kb
D - 960x540 - 1mb
E - 1280x720 - 2mb
F - 1280x720 - 3mb

When I tried to play the manifest, the player got stuck in one of the following situations:

It keeps downloading manifests from a single variant, which is sometimes not the lowest one.
It keeps downloading manifests and segments, but the video does not play (the buffer is always empty).

Some observations:

I found that sometimes, while the video is buffering, the player tries to change the rendition but keeps selecting the same one. During the switch, abort() from segmentLoader is called and the segments are not appended to the buffer (which stays empty).
Sometimes, using the tool https://videojs-http-streaming.netlify.app/utils/stats/, I observed that if I force A/B/C the CPU usage is between 10-30% and the heap uses approximately 30 MB, but if I force F and wait some minutes the memory demand is between 40-80% and the heap also grows (without playback).
In the case of F (highest bandwidth), Safari needs only a couple of seconds to start and never gets stuck.

Sources

This is a sample of the live streaming I'm using:
https://hs-video-testing.s3.amazonaws.com/master_abc_60.m3u8

Steps to reproduce

Go to https://videojs-http-streaming.netlify.app/utils/stats/
Use https://hs-video-testing.s3.amazonaws.com/master_abc_60.m3u8
Force the F_60.m3u8 rendition
Wait

Results

Expected

The player may get stuck for a couple of seconds while buffering, but it should recover even if the highest rendition is forced.

Error output

No errors/warnings reported in the console

Additional Information

videojs-http-streaming version

what version of videojs-http-streaming does this occur with?

videojs version

what version of videojs does this occur with?
videojs 7.8.2, 7.8.4, 7.10.2

Browsers

what browsers are affected? please include browser and version for each

Chrome

Platforms

what platforms are affected? please include operating system and version or device and version for each

macOS Catalina

Other Plugins

are any other videojs plugins being used on the page? If so, please list them with version below.
*

Other JavaScript

are you using any other javascript libraries or frameworks on the page? if so please list them below.
*

Gregory-Gerard · 2020-11-25T22:43:47Z

Maybe related to my own issue? #999

Using an old version of video.js (7.2.3) "resolved" this issue in my case.

gkatsev · 2020-11-30T22:35:09Z

This issue (and @Gregory-Gerard's) potentially goes away with our new experimental ABR. You can try setting the experimentalBufferBasedABR option (or hit the checkbox for it on https://videojs-http-streaming.netlify.app/).
The flag enables a timer that checks whether we need to switch, so if downloads stall or something we don't just stop playback. It also does a better job of deciding whether or not to switch.
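
For anyone who wants to try the flag outside the demo page, here is a minimal sketch of how it might be passed at player setup. It assumes VHS options are read from `html5.vhs` in the video.js options (check the README of the version in use), and `buildPlayerOptions` and the element id `my-player` are hypothetical names used only for illustration.

```javascript
// Builds a video.js options object with the experimental buffer-based ABR on.
// Assumption: VHS reads its options from `html5.vhs`.
function buildPlayerOptions(extraVhsOptions = {}) {
  return {
    html5: {
      vhs: {
        experimentalBufferBasedABR: true,
        // Caller-supplied VHS options are merged in last and can override.
        ...extraVhsOptions
      }
    }
  };
}

// In a page where video.js is loaded (the element id is hypothetical):
// const player = videojs('my-player', buildPlayerOptions());
```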

Gregory-Gerard · 2020-11-30T22:59:20Z

@gkatsev Just tried it and the player still enters an "infinite buffering" mode. I need to click the live UI button to restart my live stream, after waiting 30 sec - 1 min. I provided some additional information with a screencast in #999 if you'd like.

jengel3 · 2020-12-01T22:07:09Z

FWIW, this was occurring for me on 7.10.2, but downgrading to 7.8.4 did resolve the issue.

EdNutting · 2021-08-17T01:58:39Z

@gkatsev I believe I partially understand this issue -- and afaict the ABR option won't actually fix the cause though it may change the behaviour / reproduction.

Background

Users of @clowdr_app have been plagued by HLS streaming problems for a while now, complaining about the player hitting a loading spinner, heating up their laptops and never recovering. Only a page refresh could save them (and sometimes their entire browser crashed - seemed like network overload or OoM). I finally managed to trigger the bug for myself yesterday.

Reproduction

(It's a flaky issue - the reproduction just as much so!)

Using either the latest VideoJS package version or the sample app version (linked by the OP), without the experimental option you mentioned, I can generally trigger the infinite loading spinner by following these steps:

The Apple LL HLS stream: https://ll-hls-test.apple.com/master.m3u8
Start out with dev tools open, Network tab open, no throttling.
Load the stream, start playing at live, and it ideally will pick the 4Mbps quality.
In the dev tools, after a few seconds or any longer time, set the throttling down to somewhere between or at "Good 3G" and "Good 2G".
What we've replicated is a sudden and marked reduction in network quality. This should trigger some buffering and a drop to one of the lower quality feeds in the stream.
Wait. Just keep waiting. Eventually, you're going to hit an infinite loading spinner (or at least, I do, my colleagues do and so do most of our users!) You'll know you've hit it when you see the same segment being endlessly requested in the dev tools Network tab.
If you now go back to No Throttling, you'll see the rate of segment loads shoots up to "too many per second". The underlying bug is a timing bug, so sometimes the stream recovers but often it doesn't.
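
Spotting the "same segment requested endlessly" symptom by eye in the Network tab can be tedious, so here is a rough sketch of a helper that flags it from a list of request URLs. The function name, the threshold, and the file-extension pattern are all assumptions made for illustration; playlist requests are ignored so that manifest refreshes between segment requests do not break the run.

```javascript
// Matches common HLS media segment extensions (an assumption about URL naming).
const SEGMENT_PATTERN = /\.(ts|m4s|mp4|aac)(\?|$)/;

// Returns the first segment URL requested `threshold` times in a row
// (ignoring non-segment requests such as playlists), or null if none is.
function findRepeatedSegment(urls, threshold = 5) {
  let run = 0;
  let previous = null;
  for (const url of urls) {
    if (!SEGMENT_PATTERN.test(url)) {
      continue;
    }
    run = url === previous ? run + 1 : 1;
    previous = url;
    if (run >= threshold) {
      return url;
    }
  }
  return null;
}
```

In a browser console, `performance.getEntriesByType('resource').map((entry) => entry.name)` is one way to collect the URLs, although the resource timing buffer has a limited size and may need enlarging for long sessions.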

(Inept) Diagnosis

...after poking around in a debugger for 5 hours.

As far as I can make out, here's what's happening:

Initially the stream plays fine. The PlaybackWatcher checkCurrentTime_ function calls fixBadSeeks_ which finds no issues because the stream isn't buffering (the network is strong) and everything is playing out just fine.
The network quality drops, buffering occurs.
Now, sometimes (not always, not even often, just sometimes!) a race condition rears its ugly head.
1. The segment finishes loading just before checkCurrentTime_ gets called.
2. checkCurrentTime_'s timeout expires thus meaning it gets called
3. Here's where things get funky. The segment hasn't yet been processed by the HTML video player, so the video player is still attempting to seek to the start of the stream - it reports seeking as true. During checkCurrentTime_, this results in a call to fixBadSeeks_
4. fixBadSeeks_ sees the valid segment, but the segment took a while to load. As a result, the currentTime is now out of range of the segment. The code triggers a seek to the start or end of the new segment (and from what I've seen, it also has to be the only segment).
5. This seek causes a seeking event to be fired by the HTML video element
6. VideoJS picks up the seeking event but the HTML video player still hasn't had a chance to process the segment yet. As a result, the seeking event's call to setCurrentTime unwittingly blows up the loaded segment by calling abort and resetEverything because the HTML video player reports an empty buffer (and setCurrentTime assumes setting a time with an empty buffer must be invalid).
Ok, so checkCurrentTime_ triggered a seek to the start or end of the segment, which in turn destroyed the segment that was just loaded that it was trying to seek to.
The re-loading of the same segment begins (in response to the reset).
Moments later, checkCurrentTime_ comes back around. Maybe the reloaded segment isn't there yet - nothing happens. Eventually, the segment info will be available.
- As before, the player is seeking and fixBadSeeks_ is going to try to fix the out of sync "issue".
- However, the currentTime has now been set by the code - we know it's going to be either the beginning or end of the segment -- except that neither is going to be a valid target!
- Unfortunately, because of the way the conditions are structured, it's going to pick the opposite point to the previous choice of seek. I.e. if it picked "seek to start" the first time, it'll pick "seek to end" the second, and vice versa (remember, we have only a single segment to worry about in this situation so the seekable range.length is 1).
Because the target time has changed, another seeking event is fired and the cycle repeats.
Sometimes, if the browser slows down enough, you can actually see this happening in the UI - the time marker flip flops back and forth from one end of the time bar to the other (and the "live" indicator toggles on and off accordingly).
This situation is irrecoverable for the duration of the buffer depth because the player "gets stuck" trying to reload and seek the same segment again and again. After the buffer depth, typically 60s or so, the player says "ah but that segment is now out of range entirely, move on to the next segment" and so, assuming the race condition doesn't trigger a second time, the stream may recover (if the browser or stream server haven't aborted due to the sheer volume of network requests in the meantime).
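
If you want to confirm the flip-flop without watching the time bar, a small detector over recorded seek targets may help. This is only a diagnostic sketch: `isFlipFlopping`, its parameters, and the tolerance value are hypothetical, and it checks nothing about why the seeks happen.

```javascript
// Returns true when the most recent seek targets alternate strictly between
// two distinct times (A, B, A, B, ...), within `tolerance` seconds.
function isFlipFlopping(targets, minSwitches = 4, tolerance = 0.01) {
  const needed = minSwitches + 1;
  if (targets.length < needed) {
    return false;
  }
  const recent = targets.slice(-needed);
  const [a, b] = recent;
  if (Math.abs(a - b) <= tolerance) {
    return false;
  }
  // Even positions must match the first value, odd positions the second.
  return recent.every((time, index) => {
    const expected = index % 2 === 0 ? a : b;
    return Math.abs(time - expected) <= tolerance;
  });
}

// Possible wiring in a page with a video.js player:
// const targets = [];
// player.on('seeking', () => targets.push(player.currentTime()));
```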

A fix?

...probably not but it's worth a shot. Maybe it'll lend some insight or give you a starting point for debugging further :)

Ooof, that was long! Properly confusing and I'm still not sure I really understand what starts this madness in the first place - it's definitely a timing/race condition (and possibly introduced by some subtle change of browser behaviour in 2020? Who knows!)

Anyway, a fix (though I doubt it's a correct/good fix) is to modify PlaybackWatcher:fixesBadSeeks_ to the following:

Original:

if (this.beforeSeekableWindow_(seekable, currentTime)) {

Modified:

if (seekable.length && seekable.length > 1 && this.beforeSeekableWindow_(seekable, currentTime)) {

This effectively says "if we're trying to buffer a single segment, never try to re-seek to its start. Only seek to its end (i.e. the live point) if needed" (i.e. only the isAfterSeekableRange condition applies).
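
To make the effect of the extra condition concrete, here is a small standalone sketch. `makeRanges` is a hypothetical stand-in for a browser TimeRanges object, and `beforeSeekableWindow` is a deliberately simplified stand-in for the real check (an assumption, not the library's code); only the shape of the guard is taken from the modified line above.

```javascript
// Minimal stand-in for a TimeRanges object, built from [start, end] pairs.
function makeRanges(pairs) {
  return {
    length: pairs.length,
    start: (index) => pairs[index][0],
    end: (index) => pairs[index][1]
  };
}

// Simplified stand-in for the "before the seekable window" check.
// Assumption: the real check also applies a safety margin, omitted here.
function beforeSeekableWindow(seekable, currentTime) {
  return seekable.length > 0 && currentTime < seekable.start(0);
}

// Same shape as the modified condition: never seek to the start when
// there is only one seekable range.
function shouldSeekToStart(seekable, currentTime) {
  return seekable.length > 1 && beforeSeekableWindow(seekable, currentTime);
}
```

Note that `length` on a TimeRanges object counts ranges, not segments, so with this guard the start seek is skipped whenever the seekable window is a single contiguous range, however many segments it spans.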

I guess maybe there is an alternative fix for setCurrentTime that doesn't treat "set time with no buffer must be invalid" but I couldn't figure out how to get it to work nor what the side effects might be. At least the change above appears to be pretty self contained and low impact.

Testing "a fix"

I have tested this change and found that the infinite buffering does not recur and the player functions as expected. My diagnosis/analysis above is probably deeply flawed in some way (after all, it's just based on poking around with a debugger in Firefox for 5 hours - I don't pretend to understand the Video JS code!) but the fix proposed at least seems to do the trick.

Closing remarks

Anyway, I hope this helps gain some insight into (some variant of) this problem and a possible solution!

We would really appreciate a fix soon - @clowdr_app's users are not entirely happy about having to refresh the page every 25s just to get the stream to recover (and the 20s of lost viewing every time). It hits some users consistently and others not at all; and it's nothing to do with bandwidth (even streaming at 360p on a 100Mbps connection with a powerful computer some of our users and I have run into the problem!)

video-archivist-bot · 2021-08-17T01:58:40Z

Hey! We've detected some video files in a comment on this issue. If you'd like to permanently archive these videos and tie them to this project, a maintainer of the project can reply to this issue with the following commands:

for https://ll-hls-test.apple.com/master.m3u8: say @video-archivist-bot save mj7YrA

gkatsev · 2021-08-17T17:53:15Z

@EdNutting Thanks for the super detailed comment. We'll definitely take a look!

heennkkee · 2021-10-04T15:02:44Z

I have also run into this problem. I verified it using the method @EdNutting described and reproduced the issue both with and without the experimentalBufferBasedABR flag.

stale · 2022-01-08T23:56:49Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

EdNutting · 2022-01-09T00:39:33Z

Anti-bot post to stop this being prematurely closed. Just because it hasn't been fixed after so long and users are still suffering doesn't mean it's not an issue. We resorted to significantly increasing the segment duration - sacrificing latency - in order to work around this issue (as per the referenced PR from Aug 2021 above).

stale · 2022-04-17T06:15:42Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

EdNutting · 2022-04-18T15:14:31Z

Anti-bot post to stop this being prematurely closed.

gkatsev · 2022-04-19T19:05:23Z

When this is fixed, we should revisit #999 and #1049 to confirm they were fixed.

adrums86 · 2023-10-30T19:30:28Z

Can anyone confirm if this was fixed with #1433?

EdNutting mentioned this issue Aug 17, 2021

Increase AWS MediaPackage HLS segment duration clowdr-app/clowdr#338

Merged

9 tasks

brandonocasey mentioned this issue Aug 17, 2021

fix: fix seekable by using timestampOffset to convert expired/start to relative time #1187

Closed

stale bot added the outdated label Jan 8, 2022

stale bot removed the outdated label Jan 9, 2022

stale bot added the outdated label Apr 17, 2022

stale bot removed the outdated label Apr 18, 2022

gkatsev added the pinned label Apr 18, 2022

gkatsev mentioned this issue Apr 19, 2022

Player infinite buffering with HLS stream #999

Closed

gkatsev mentioned this issue Apr 19, 2022

Issue when seeking across PTS rollovers. #1049

Closed

302Gerben mentioned this issue Aug 31, 2023

Fix ddos created by requesting infinitely for the same chunks #1423

Closed

2 tasks
