Crash in videoroom with about 100 viewers after about 1 h #2034

Closed
lorenzobob0 opened this issue Mar 30, 2020 · 16 comments

@lorenzobob0

Hello!

Thank you for your excellent work!

I am using Janus to broadcast audio and video from a live event to multiple viewers (1 to many).
Last weekend I was testing it in production with a little less than 100 viewers, and I noticed that after about 1 to 1.5 hours the Janus server crashed. This happened again multiple times, after about the same amount of time. When the number of viewers decreased (to about 50), it did not crash.
I suspect a memory leak.
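
One way to back up a leak suspicion with numbers is to sample the resident memory of the running process over time. The following is a minimal sketch, assuming a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line; the helper names and sampling parameters are hypothetical and not part of Janus.

```python
import time


def parse_vmrss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS line found (kernel threads do not have one)")


def read_rss_kib(pid):
    """Read the current resident set size of `pid` in KiB (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        return parse_vmrss_kib(status.read())


def sample_rss(pid, samples=3, interval_s=60.0):
    """Collect `samples` RSS readings, `interval_s` seconds apart."""
    readings = []
    for i in range(samples):
        readings.append(read_rss_kib(pid))
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


def growth_kib(readings):
    """Difference between the last and the first reading, in KiB."""
    return readings[-1] - readings[0]
```

Steady growth while the number of viewers stays constant would be consistent with a leak; growth that tracks the number of viewers would not be, on its own.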

The stack trace of the debug is here:
https://pastebin.com/RFzfeXk9

Unfortunately the binary was not built with debug symbols.

Please let me know if I can help you find the problem.

Lorenzo

@lminiero
Member

Nothing we can do with that trace. Please collect more useful info first, and make sure you're on master (there have been fixes last week).

@groupboard
Contributor

Check ulimit -a and ulimit -n. By default the limits are quite low on Linux, and Janus will crash when it hits one.

@lminiero
Member

lminiero commented Apr 8, 2020

Any update on this? Without debugging information there's nothing we can do. I'll interpret lack of feedback as an implicit confirmation this is fixed and I'll simply close.

@lorenzobob0
Author

lorenzobob0 commented Apr 8, 2020 via email

@lminiero
Member

lminiero commented Apr 8, 2020

Please make sure to test whatever master is in three weeks, as chances are good we'll have applied fixes in the meantime. As a side note, for many viewers the VideoRoom may not be the best option, especially if it's one to many: it may be better to RTP-forward to the Streaming plugin, and use helper threads there.
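
To illustrate the RTP-forward suggestion, here is a minimal sketch of the request body that asks the VideoRoom to relay a publisher's media to another address. It assumes the field names of the 0.x VideoRoom `rtp_forward` request (`room`, `publisher_id`, `host`, `audio_port`, `video_port`); these should be checked against the plugin documentation for the version in use, as they may differ. The helper name and all values are placeholders, and the Streaming plugin mountpoint that would receive the packets is not shown.

```python
import json


def build_rtp_forward_request(room, publisher_id, host, audio_port, video_port):
    """Build the body of a VideoRoom `rtp_forward` request.

    Field names are an assumption based on the 0.x VideoRoom API.
    """
    return {
        "request": "rtp_forward",
        "room": room,
        "publisher_id": publisher_id,
        "host": host,
        "audio_port": audio_port,
        "video_port": video_port,
    }


if __name__ == "__main__":
    # Placeholder values only; none of them come from this issue.
    body = build_rtp_forward_request(1234, 5678, "127.0.0.1", 5002, 5004)
    print(json.dumps(body, indent=2))
```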

@lorenzobob0
Author

lorenzobob0 commented Apr 9, 2020 via email

@wmajerski

Hi guys, I've been experiencing exactly the same issue for several weeks now using the most recent Janus version (tried v0.9.0, v0.9.1 and v0.9.2). Janus crashes as soon as the number of video room participants gets more than 100-120 (hard to say the exact number). I will try to collect more debug info on the next occasion.

groupboard: is there any ulimit value that needs to be bumped other than the open files limit (ulimit -n), which is indeed quite low by default (1024 on Debian)? I increased it to 10k during the initial setup, so at least in my case it's not the issue.

@groupboard
Contributor

The only ulimits I had problems with were ulimit -u and ulimit -n. Both were quite low on CentOS.
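
A small sketch for inspecting those two limits programmatically, using Python's standard `resource` module. The function names are hypothetical. Note that this reports the limits of the Python process itself, which are inherited from the shell that started it; a Janus instance started by a service manager may run with different limits, which can be read from `/proc/<pid>/limits`.

```python
import resource


def limit_report():
    """Return {label: (soft, hard)} for the two limits discussed above."""
    names = {
        "open files (ulimit -n)": resource.RLIMIT_NOFILE,
        "max user processes (ulimit -u)": resource.RLIMIT_NPROC,
    }
    return {label: resource.getrlimit(res) for label, res in names.items()}


def fmt(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for label, (soft, hard) in limit_report().items():
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
```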

@lminiero
Member

"Janus crashes" is in no way helpful to us. Please provide what I already asked Lorenzo for.

@lminiero
Member

lminiero commented Apr 21, 2020

One week without info? I don't have a magic wand. Please provide feedback if you need us to fix it, or I'll just close and amen. I won't accept the «I'm busy» card, or I'll have to assume you don't value my time enough. Thanks.

@lorenzobob0
Author

lorenzobob0 commented Apr 21, 2020 via email

@lminiero
Member

Just merged a fix in #2093 for an issue reported in #2087, that was likely related to the problem you're experiencing. As such, I'll close. If in two weeks it's still an issue, we can discuss reopening.

@wmajerski

Hello guys, sorry for the delayed reply, but only today did we have an event with such a large number of participants: 140 at its peak. There were several crashes during the event, but the logs show nothing more than a simple 'Killed' line:

https://pastebin.com/ATn3GCAe
https://pastebin.com/nRJQGfT8
https://pastebin.com/5U8Ne8EP
https://pastebin.com/jjV723ir

Using the dmesg command, I was able to find out why the process was killed by the kernel (notice the last two lines), and it was due to the memory leak:
https://pastebin.com/kF8ps7cG
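
For reference, a minimal sketch of how such OOM-killer entries could be picked out of saved `dmesg` output. It assumes the kernel's usual `Killed process <pid> (<name>) ... anon-rss:<n>kB` wording, which can vary between kernel versions; the function name is hypothetical, and the sample lines in the usage part are illustrative and not copied from the pastebin above.

```python
import re

# Matches the "Killed process" line the kernel logs after an OOM kill.
# The anon-rss field is optional because not every kernel prints it.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r"(?:.*?anon-rss:(?P<anon_rss_kb>\d+)kB)?"
)


def find_oom_kills(dmesg_text):
    """Return one dict per 'Killed process' line found in `dmesg_text`."""
    kills = []
    for line in dmesg_text.splitlines():
        match = KILLED_RE.search(line)
        if match:
            rss = match["anon_rss_kb"]
            kills.append({
                "pid": int(match["pid"]),
                "name": match["name"],
                "anon_rss_kb": int(rss) if rss else None,
            })
    return kills


if __name__ == "__main__":
    # Illustrative sample, not real output from this issue.
    sample = (
        "[12345.678901] Out of memory: Kill process 4321 (janus) score 912 or sacrifice child\n"
        "[12345.678950] Killed process 4321 (janus) total-vm:9000000kB, "
        "anon-rss:7500000kB, file-rss:0kB, shmem-rss:0kB\n"
    )
    for kill in find_oom_kills(sample):
        print(kill)
```

An OOM kill shows that the machine ran out of memory; on its own it does not distinguish a leak from legitimately high usage.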

In the logs I didn't see any AddressSanitizer information unless I shut down Janus manually while it was working; then I got some extra information:
https://pastebin.com/QTTPYyJ7

I've also got a GDB backtrace but honestly, it doesn't tell me much:
https://pastebin.com/2Q9bUGaL

I'm using the most recent version, compiled this morning from the master branch. The OS is Ubuntu 18. It's also worth mentioning that most of our events involve fewer than 100 participants (mostly around 60-80) and there are no issues or crashes reported at all: everything runs smoothly. Any help or advice will be highly appreciated. There is going to be a similarly crowded event in the upcoming days, so I will surely be able to provide more debug information.

Thank you.

@lorenzobob0
Author

Hello,

As promised I am providing more information about the issue.

I checked out the project on Friday and I built it with debug and AddressSanitizer.

Janus crashes randomly with about 100 participants.

I am adding two consecutive crash logs.
I hope this helps. Thank you for your great project.

https://pastebin.com/wurzP0h0

https://pastebin.com/XYhNtXFV

@alexamirante
Member

@lorenzobob0 it doesn't look like you're on master, as the line numbers don't match. It looks like you're somewhere between v0.9.2 and v0.9.3. From your second pastebin I see your server was being scanned by bots, so it likely received an unsupported HTTP request that made it crash. This has been fixed in 436cccc.

@elhadede

thanks everyone
