janus_videoroom_hangup_subscriber segfaults with lots of publishers / subscribers leaving at the same time #2087

Closed
cb22 opened this issue Apr 19, 2020 · 6 comments · Fixed by #2093

Comments

@cb22
Contributor

cb22 commented Apr 19, 2020

I'm not sure if this is related to #2034, but it looks like there's quite a specific setup required to trigger this bug and I can't see any information there linking the two besides both involving the videoroom plugin.

The setup:

  • Latest janus master
  • 8 event loop threads
  • private_ids on (mentioned due to the stacktrace)
  • Lots of active rooms, with publishers / subscribers leaving due to being killed (not leaving gracefully)

Janus segfaults when a large number of participants leave the videoroom (stacktrace). My testing setup here consists of ~100x Google Cloud VM instances, each running 3x Chrome instances per videoroom, connected to two Janus servers.

I can seemingly replicate this crash consistently. Interestingly, the crash doesn't seem to happen if I disable the fixed number of event loop threads, nor if the publishers / subscribers send leave requests. Those two points might just be red herrings caused by an underlying race condition, though.

The specific line mentioned in the stacktrace is here

Some speculation (I'm not familiar with GLib!): perhaps there's a TOCTOU race between the check for s->room != NULL and the later use of s->room->private_ids in g_hash_table_lookup(). Anyhow, I'm going to do some more investigation, but I'm happy to provide any more info or run tests as required!
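To make the suspected check-then-use window concrete, here is a minimal single-threaded sketch. The `room` and `subscriber` structs, the `private_ids_count` field and the `other_thread` hook are hypothetical stand-ins, not the actual Janus structures; the hook only simulates a concurrent hangup running between the check and the use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins (NOT the real Janus types). */
typedef struct room { int private_ids_count; } room;
typedef struct subscriber { room *room; } subscriber;

/* Simulates what a second, concurrent hangup might do. */
typedef void (*interleave_fn)(subscriber *s);

static void clear_room(subscriber *s) {
    s->room = NULL;
}

/* Returns the count on success, -1 if the room was already gone at the
 * check, and -2 if it disappeared between the check and the use. */
int check_then_use(subscriber *s, interleave_fn other_thread) {
    if (s->room == NULL)              /* time of check */
        return -1;
    if (other_thread != NULL)         /* the window where another thread may run */
        other_thread(s);
    if (s->room == NULL)              /* the unguarded code would dereference NULL here */
        return -2;
    return s->room->private_ids_count; /* time of use */
}
```

In the real plugin there is no second check, so the `-2` case would be the NULL dereference (or a use-after-free, if the room was freed rather than just cleared).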

@lminiero
Copy link
Member

I had to google TOCTOU to see what it meant... 🤭
A libasan dump might help, in this case, as that would tell if that pointer was freed somewhere or just set to NULL.

@cb22
Contributor Author

cb22 commented Apr 20, 2020

Managed to recreate it with a simpler setup than before: 1 publisher and 8 subscribers in the same room will trigger it fairly reliably, as long as the subscribers all leave at the same time (killall -9 chromium does the job quite well if they're all on the same machine).

I've got a libasan dump (here) but it's not going to be too useful for others since I've added a few debugging statements and extra code to narrow down the issue (such as an extra check on s->room != NULL).

It looks like what happens is somehow janus_videoroom_hangup_subscriber gets called with the same s twice in succession (verified by adding in some extra logging). The first runs successfully, but when the second runs, the state has been cleaned up, which is why trying to unlock the mutex, or do a lookup on s->room->private_ids fails.

If I add a simple if guard after janus_mutex_lock, it seems to fix the bug (see here). I can no longer reproduce it either in my local testing or on the larger testing setup with ~100 peers, even after multiple runs. This does feel like a bit of a hack though, so I'd appreciate any feedback!
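The guard described above could look roughly like the following sketch. It uses a plain pthread mutex and hypothetical `room` / `subscriber` structs in place of the Janus types, and the function name `hangup_subscriber` is illustrative only; the actual patch is the one linked in the comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins (NOT the real Janus types). */
typedef struct room { int subscribers; } room;
typedef struct subscriber {
    pthread_mutex_t mutex;
    room *room;
} subscriber;

/* Returns true if this call performed the cleanup, false if an earlier
 * call had already done it. */
bool hangup_subscriber(subscriber *s) {
    pthread_mutex_lock(&s->mutex);
    if (s->room == NULL) {
        /* Second invocation for the same subscriber: nothing left to do. */
        pthread_mutex_unlock(&s->mutex);
        return false;
    }
    s->room->subscribers--;   /* stand-in for the real cleanup work */
    s->room = NULL;           /* mark the state as torn down */
    pthread_mutex_unlock(&s->mutex);
    return true;
}
```

Calling the function twice with the same subscriber makes the second call a no-op. The guard only helps while the subscriber object and its mutex are still alive, which is part of why it can feel fragile.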

@lminiero
Member

It still might be a bit frail, though. A potentially better solution might be to assign the s->room property to a dedicated variable, e.g.:

janus_videoroom_room *room = s->room;

incref that if it's not NULL, use that for the checks and the work, and then decref it when done. This way, it doesn't really matter whether s->room itself is NULL or not, since we only use the room info (private IDs) to update the publisher. If that makes sense I can prepare a PR later.
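A minimal sketch of that local-reference pattern follows, assuming a simple atomic reference counter. The `room_ref` / `room_unref` helpers, the `destroyed` flag and the struct layouts are hypothetical and only stand in for whatever reference counting the plugin actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins (NOT the real Janus types). */
typedef struct room {
    atomic_int refs;
    int destroyed;          /* set instead of freeing, to keep the sketch testable */
    int private_ids_count;
} room;
typedef struct subscriber { room *room; } subscriber;

static void room_ref(room *r) {
    atomic_fetch_add(&r->refs, 1);
}

static void room_unref(room *r) {
    /* atomic_fetch_sub returns the previous value: 1 means we held the last reference. */
    if (atomic_fetch_sub(&r->refs, 1) == 1)
        r->destroyed = 1;   /* real code would free the room here */
}

/* Returns the private ID count, or -1 if the subscriber had no room. */
int hangup_with_local_ref(subscriber *s) {
    room *r = s->room;      /* read s->room exactly once */
    if (r == NULL)
        return -1;
    room_ref(r);            /* keep the room alive while we work on it */
    /* From here on only `r` is used, so s->room may become NULL concurrently. */
    int count = r->private_ids_count;
    room_unref(r);          /* drop our temporary reference when done */
    return count;
}
```

One caveat on this sketch: reading `s->room` and taking the reference are two separate steps, so real code still needs a lock (or an equivalent guarantee) around them to ensure the room cannot be freed in between.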

@lminiero
Member

@cb22 can you let me know if the PR above helps?

@cb22
Contributor Author

cb22 commented Apr 22, 2020

@lminiero your solution sounds much better to me indeed. I've given it some testing, and I can't reproduce the crash anymore - so LGTM 👍

@lminiero
Member

Thanks! I'll merge then 👍

voicenter added a commit to voicenter/janus-gateway that referenced this issue Apr 24, 2020
* More checks when hanging up VideoRoom subscriber (see meetecho#2087) (meetecho#2093)
