Player stucks with buffer empty #1000

Open
pmendozav opened this issue Nov 17, 2020 · 14 comments

@pmendozav

Description

I have a live stream with several renditions at 60 fps. Here is a brief description of each (id - resolution - approximate segment size):

  • A - 256x144 - 200kb
  • B - 416x234 - 320kb
  • C - 640x360 - 500kb
  • D - 960x540 - 1mb
  • E - 1280x720 - 2mb
  • F - 1280x720 - 3mb

When I play the manifest, the player gets stuck in one of the following situations:

  • It keeps downloading manifests for a single variant, which is sometimes not the lowest one
  • It keeps downloading manifests and segments, but the video does not play
    (the buffer is always empty)

Some observations:

  • Sometimes, while the video is buffering, the player tries to change the rendition but keeps selecting the same one. During the switch, abort() is called on the segmentLoader and the segments are not loaded into the buffer (which stays empty).
  • Using the tool https://videojs-http-streaming.netlify.app/utils/stats/ I sometimes observed that if I force A/B/C, CPU usage is between 10% and 30% and the heap uses approximately 30 MB, but if I force F and wait a few minutes, the memory demand is between 40% and 80% and the heap also grows (without playback).
  • In the case of F (the highest bandwidth), Safari needs only a couple of seconds to start and never gets stuck.

Sources

This is a sample of the live stream I'm using:
https://hs-video-testing.s3.amazonaws.com/master_abc_60.m3u8

Steps to reproduce

  1. Go to https://videojs-http-streaming.netlify.app/utils/stats/
  2. Use https://hs-video-testing.s3.amazonaws.com/master_abc_60.m3u8
  3. Force the player to use F_60.m3u8
  4. Wait
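
For anyone reproducing step 3 outside the stats page, here is a minimal sketch of forcing the highest-bandwidth rendition. It assumes the VHS representations API, where each representation exposes a bandwidth property and an enabled() setter. forceHighestBandwidth is a hypothetical helper, not part of video.js, and the commented-out player lines are an assumption about how it would be wired into a page.

```javascript
// Hypothetical helper: enable only the representation with the highest
// bandwidth and disable every other one. Returns the chosen representation,
// or null when the list is empty.
function forceHighestBandwidth(representations) {
  if (representations.length === 0) {
    return null;
  }

  const highest = representations.reduce((best, rep) =>
    (rep.bandwidth > best.bandwidth ? rep : best));

  representations.forEach((rep) => rep.enabled(rep === highest));
  return highest;
}

// Assumed usage in a page that already has a video.js player (not run here):
// const vhs = player.tech({ IWillNotUseThisInPlugins: true }).vhs;
// forceHighestBandwidth(vhs.representations());
```

Selecting by bandwidth rather than by resolution matters for this stream, because E and F share the same 1280x720 resolution.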

Results

Expected

The player may stall for a couple of seconds to buffer, but it should recover even when the highest rendition is forced.

Error output

No errors/warnings reported in the console

Additional Information

videojs-http-streaming version

what version of videojs-http-streaming does this occur with?

videojs version

what version of videojs does this occur with?
videojs 7.8.2, 7.8.4, 7.10.2

Browsers

what browsers are affected? please include browser and version for each

  • Chrome

Platforms

what platforms are affected? please include operating system and version or device and version for each

  • macOS Catalina

Other Plugins

are any other videojs plugins being used on the page? If so, please list them with version below.
*

Other JavaScript

are you using any other javascript libraries or frameworks on the page? if so please list them below.
*

@Gregory-Gerard

Maybe related to my own issue? #999

Using an old version of video.js (7.2.3) "resolved" this issue in my case.

@gkatsev
Member

gkatsev commented Nov 30, 2020

This issue (and @Gregory-Gerard's) potentially goes away with our new experimental ABR. You can try setting the experimentalBufferBasedABR option (or tick the checkbox for it on https://videojs-http-streaming.netlify.app/).
The flag enables a timer that checks whether we need to switch, so if downloads stall we don't just stop playback. It also does a better job of deciding whether or not to switch.
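
As a rough sketch of how that option might be passed when creating a player: the option name experimentalBufferBasedABR comes from the comment above, while nesting it under html5.vhs, the buildPlayerOptions helper, and the 'my-player' element id are assumptions for illustration.

```javascript
// Hypothetical helper: build video.js player options that opt in to the
// experimental buffer-based ABR. Any extra VHS options passed in are kept.
function buildPlayerOptions(extraVhsOptions = {}) {
  return {
    html5: {
      // Assumption: VHS options are read from html5.vhs.
      vhs: Object.assign({}, extraVhsOptions, {
        experimentalBufferBasedABR: true
      })
    }
  };
}

// Assumed usage (not run here; 'my-player' is a hypothetical element id):
// const player = videojs('my-player', buildPlayerOptions());
```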

@Gregory-Gerard

@gkatsev I just tried it and the player still enters an "infinite buffering" mode. After waiting 30 sec - 1 min, I need to click the live UI button to restart my live stream. I provided some additional information with a screencast in #999 if you'd like.

@jengel3

jengel3 commented Dec 1, 2020

FWIW, this was occurring for me on 7.10.2, but downgrading to 7.8.4 did resolve the issue.

@EdNutting

EdNutting commented Aug 17, 2021

@gkatsev I believe I partially understand this issue -- and afaict the ABR option won't actually fix the cause though it may change the behaviour / reproduction.

Background

Users of @clowdr_app have been plagued by HLS streaming problems for a while now, complaining about the player hitting a loading spinner, heating up their laptops and never recovering. Only a page refresh could save them (and sometimes their entire browser crashed - seemed like network overload or OoM). I finally managed to trigger the bug for myself yesterday.

Reproduction

(It's a flaky issue - the reproduction just as much so!)

Using either the latest VideoJS package version or the sample app version (linked by the OP), without the experimental option you mentioned, I can generally trigger the infinite loading spinner by following these steps:

  1. Use the Apple LL HLS stream: https://ll-hls-test.apple.com/master.m3u8
  2. Start out with dev tools open, Network tab open, no throttling.
  3. Load the stream, start playing at live, and it ideally will pick the 4Mbps quality.
  4. In the dev tools, after a few seconds or any longer time, set the throttling down to somewhere between or at "Good 3G" and "Good 2G".
  5. What we've replicated is a sudden and marked reduction in network quality. This should trigger some buffering and a drop to one of the lower quality feeds in the stream.
  6. Wait. Just keep waiting. Eventually, you're going to hit an infinite loading spinner (or at least, I do, my colleagues do and so do most of our users!) You'll know you've hit it when you see the same segment being endlessly requested in the dev tools Network tab.
  7. If you now go back to No Throttling, you'll see the rate of segment loads shoots up to "too many per second". The underlying bug is a timing bug, so sometimes the stream recovers but often it doesn't.
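
If watching the Network tab by eye is awkward, the "same segment being endlessly requested" symptom from step 6 can be checked with a small script. This is only a sketch: findRepeatedSegments and its threshold are hypothetical, and it assumes playlists can be recognised by a .m3u8 suffix.

```javascript
// Hypothetical diagnostic: given request URLs in the order they were made,
// return every non-playlist URL that was requested `threshold` times or more.
function findRepeatedSegments(urls, threshold = 5) {
  const counts = new Map();

  for (const url of urls) {
    // Live playlists are expected to be re-requested, so skip them.
    if (/\.m3u8(\?|$)/.test(url)) {
      continue;
    }
    counts.set(url, (counts.get(url) || 0) + 1);
  }

  return [...counts]
    .filter(([, count]) => count >= threshold)
    .map(([url]) => url);
}

// Assumed usage in the browser console (not run here):
// const urls = performance.getEntriesByType('resource').map((e) => e.name);
// console.log(findRepeatedSegments(urls));
```

Note that the browser's resource timing buffer is bounded, so on a long-running page the console usage above only sees the entries that are still retained.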

(Inept) Diagnosis

...after poking around in a debugger for 5 hours.

As far as I can make out, here's what's happening:

  1. Initially the stream plays fine. The PlaybackWatcher checkCurrentTime_ function calls fixBadSeeks_ which finds no issues because the stream isn't buffering (the network is strong) and everything is playing out just fine.
  2. The network quality drops, buffering occurs.
  3. Now, sometimes (not always, not even often, just sometimes!) a race condition rears its ugly head.
    1. The segment finishes loading just before checkCurrentTime_ gets called.
    2. checkCurrentTime_'s timeout expires, so it gets called.
    3. Here's where things get funky. The segment hasn't yet been processed by the HTML video player, so the video player is still attempting to seek to the start of the stream - it reports seeking as true. During checkCurrentTime_, this results in a call to fixBadSeeks_
    4. fixBadSeeks_ sees the valid segment, but the segment took a while to load. As a result, the currentTime is now out of range of the segment. The code triggers a seek to the start or end of the new segment (and from what I've seen, it also has to be the only segment).
    5. This seek causes a seeking event to be fired by the HTML video element
    6. VideoJS picks up the seeking event but the HTML video player still hasn't had a chance to process the segment yet. As a result, the seeking event's call to setCurrentTime unwittingly blows up the loaded segment by calling abort and resetEverything because the HTML video player reports an empty buffer (and setCurrentTime assumes setting a time with an empty buffer must be invalid).
  4. Ok, so checkCurrentTime_ triggered a seek to the start or end of the segment, which in turn destroyed the segment that was just loaded that it was trying to seek to.
  5. The re-loading of the same segment begins (in response to the reset).
  6. Moments later, checkCurrentTime_ comes back around. Maybe the reloaded segment isn't there yet - nothing happens. Eventually, the segment info will be available.
    • As before, the player is seeking and fixBadSeeks_ is going to try to fix the out of sync "issue".
    • However, the currentTime has now been set by the code - we know it's going to be either the beginning or end of the segment -- except that neither is going to be a valid target!
    • Unfortunately, because of the way the conditions are structured, it's going to pick the opposite point to the previous choice of seek. I.e. if it picked "seek to start" the first time, it'll pick "seek to end" the second, and vice versa (remember, we have only a single segment to worry about in this situation so the seekable range.length is 1).
  7. Because the target time has changed, another seeking event is fired and the cycle repeats.
  8. Sometimes, if the browser slows down enough, you can actually see this happening in the UI - the time marker flip flops back and forth from one end of the time bar to the other (and the "live" indicator toggles on and off accordingly).
  9. This situation is irrecoverable for the duration of the buffer depth because the player "gets stuck" trying to reload and seek the same segment again and again. After the buffer depth, typically 60s or so, the player says "ah but that segment is now out of range entirely, move on to the next segment" and so, assuming the race condition doesn't trigger a second time, the stream may recover (if the browser or stream server haven't aborted due to the sheer volume of network requests in the meantime).
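
The cycle in steps 4 to 9 can be written down as a toy model. It is not the VHS code: simulateSeekLoop and its parameters are hypothetical, and it assumes the worst case where every check hits the race, picks the opposite end of the single segment, and causes that segment to be requested again.

```javascript
// Toy model of the seek/reset cycle. Every watcher check until the buffer
// depth elapses flips the seek target and throws away the loaded segment.
function simulateSeekLoop({ checkIntervalMs, bufferDepthMs }) {
  const targets = [];
  let segmentRequests = 1; // the initial, legitimate load
  let lastTarget = 'end';  // arbitrary starting point for the model

  for (let t = 0; t < bufferDepthMs; t += checkIntervalMs) {
    // The check picks the opposite point to the previous choice.
    const target = lastTarget === 'start' ? 'end' : 'start';
    targets.push(target);

    // The resulting seeking event resets the loader, so the same
    // segment is requested again.
    segmentRequests += 1;
    lastTarget = target;
  }

  return { segmentRequests, targets };
}

// Example call; both values are assumptions, not measured settings.
const modelRun = simulateSeekLoop({ checkIntervalMs: 250, bufferDepthMs: 60000 });
console.log(modelRun.segmentRequests, modelRun.targets.slice(0, 4));
```

Even in this simplified form, the number of requests grows linearly with the buffer depth divided by the check interval, which fits the flood of identical requests seen in the Network tab.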

A fix?

...probably not but it's worth a shot. Maybe it'll lend some insight or give you a starting point for debugging further :)

Ooof, that was long! Properly confusing and I'm still not sure I really understand what starts this madness in the first place - it's definitely a timing/race condition (and possibly introduced by some subtle change of browser behaviour in 2020? Who knows!)

Anyway, a fix (though I doubt it's a correct/good fix) is to modify PlaybackWatcher:fixesBadSeeks_ to the following:

Original:

if (this.beforeSeekableWindow_(seekable, currentTime)) {

Modified:

if (seekable.length && seekable.length > 1 && this.beforeSeekableWindow_(seekable, currentTime)) {

This effectively says "if we're trying to buffer a single segment, never try to re-seek to its start. Only seek to its end (i.e. the live point) if needed" (i.e. only the isAfterSeekableRange condition applies).
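
To make the effect of the extra condition concrete, here is a self-contained sketch. makeRanges, beforeSeekableWindow and shouldSeekToStart are hypothetical stand-ins. In particular, the "before the window" test here is a simplification and the real check in VHS may differ.

```javascript
// Minimal stand-in for a TimeRanges-like object (hypothetical helper).
function makeRanges(pairs) {
  return {
    length: pairs.length,
    start: (i) => pairs[i][0],
    end: (i) => pairs[i][1]
  };
}

// Simplified "current time is before the seekable window" test.
function beforeSeekableWindow(seekable, currentTime) {
  return seekable.length > 0 && currentTime < seekable.start(0);
}

// The proposed guard: only seek back to the window start when there is
// more than one seekable range.
function shouldSeekToStart(seekable, currentTime) {
  return seekable.length > 1 && beforeSeekableWindow(seekable, currentTime);
}

const single = makeRanges([[10, 16]]);
const several = makeRanges([[10, 16], [20, 26]]);
console.log(shouldSeekToStart(single, 5));  // guard blocks the seek
console.log(shouldSeekToStart(several, 5)); // guard allows the seek
```

The `seekable.length &&` part of the modified condition is redundant once `seekable.length > 1` is checked, so the sketch drops it.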

I guess there may be an alternative fix in setCurrentTime that doesn't treat setting a time with no buffer as invalid, but I couldn't figure out how to get it to work nor what the side effects might be. At least the change above appears to be pretty self-contained and low impact.

Testing "a fix"

I have tested this change and found that the infinite buffering does not recur and the player functions as expected. My diagnosis/analysis above is probably deeply flawed in some way (after all, it's just based on poking around with a debugger in Firefox for 5 hours - I don't pretend to understand the Video.js code!) but the proposed fix at least seems to do the trick.

Closing remarks

Anyway, I hope this helps gain some insight into (some variant of) this problem and a possible solution!

We would really appreciate a fix soon - @clowdr_app's users are not entirely happy about having to refresh the page every 25s just to get the stream to recover (and the 20s of lost viewing every time). It hits some users consistently and others not at all; and it's nothing to do with bandwidth (even streaming at 360p on a 100Mbps connection with a powerful computer some of our users and I have run into the problem!)

@gkatsev
Member

gkatsev commented Aug 17, 2021

@EdNutting Thanks for the super detailed comment. We'll definitely take a look!

@heennkkee

I have also run into this problem. I verified it using the method @EdNutting described and reproduced the issue both with and without the experimentalBufferBasedABR flag.

@stale

stale bot commented Jan 8, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the outdated label Jan 8, 2022
@EdNutting

Anti-bot post to stop this being prematurely closed. Just because it hasn't been fixed after so long and users are still suffering doesn't mean it's not an issue. We resorted to significantly increasing the segment duration - sacrificing latency - in order to work around this issue (as per the referenced PR from Aug 2021 above).

@stale stale bot removed the outdated label Jan 9, 2022
@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the outdated label Apr 17, 2022
@EdNutting

Anti-bot post to stop this being prematurely closed.

@gkatsev
Member

gkatsev commented Apr 19, 2022

When this is fixed, we should revisit #999 and #1049 to confirm they were fixed.

@adrums86
Contributor

Can anyone confirm if this was fixed with #1433?
