a deadlock when janus_videoroom_destroy_session #1982

Closed
mivuDing opened this issue Mar 9, 2020 · 16 comments

Comments

@mivuDing

mivuDing commented Mar 9, 2020

Hi, I found a possible deadlock when "publisher A" enters janus_videoroom_destroy_session while "subscriber B" subscribes to "publisher A" at the same time. The detailed steps are shown in the attached image.
[Attached image: videoroom_deadlock]
How can this be solved? I want to unlock sessions_mutex before destroying the publisher. Is that OK?
[Attached image: modify]
Could you help me with this? Thanks very much!

@lminiero
Member

lminiero commented Mar 9, 2020

Good catch, that's indeed a nasty deadlock that could happen under rare circumstances, due to the fact that the mutexes may be locked in different order there. Your proposed fix sounds reasonable to me, since the main purpose of sessions_mutex is protecting the sessions hashtable, and other resources should be protected by other locks after that. If you found an easy way to replicate the issue, you can check if you notice any regression there.

Of course, a pull request with this fix would be welcome, especially since it would give you proper credit for the patch! If you want me to do the commit instead, please let me know.
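
To illustrate the lock-ordering problem and the proposed fix discussed above, here is a minimal, self-contained sketch using plain pthreads. It is not the VideoRoom plugin's code: `publisher`, `session_table`, `publisher_init`, `add_session`, `destroy_session` and `subscribe_to` are hypothetical stand-ins, and the assumption that the subscriber path takes a per-publisher lock before `sessions_mutex` is only a guess at the conflicting order. The point it shows is that the destroy path drops `sessions_mutex` right after touching the table, so it never holds both locks at once.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SESSIONS 4

/* Hypothetical stand-in for a publisher with its own lock (assumption). */
typedef struct publisher {
    pthread_mutex_t lock;
    bool destroyed;
} publisher;

/* Protects only the table below, mirroring the stated purpose of sessions_mutex. */
static pthread_mutex_t sessions_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Toy replacement for the sessions hashtable. */
static publisher *session_table[MAX_SESSIONS];

void publisher_init(publisher *p) {
    pthread_mutex_init(&p->lock, NULL);
    p->destroyed = false;
}

int add_session(size_t slot, publisher *p) {
    if (slot >= MAX_SESSIONS)
        return -1;
    pthread_mutex_lock(&sessions_mutex);
    session_table[slot] = p;
    pthread_mutex_unlock(&sessions_mutex);
    return 0;
}

/* Destroy path: remove the entry under sessions_mutex, release it,
 * and only then take the publisher lock. The two locks are never nested here. */
int destroy_session(size_t slot) {
    publisher *p = NULL;
    pthread_mutex_lock(&sessions_mutex);
    if (slot < MAX_SESSIONS) {
        p = session_table[slot];
        session_table[slot] = NULL;
    }
    pthread_mutex_unlock(&sessions_mutex);
    if (p == NULL)
        return -1;
    pthread_mutex_lock(&p->lock);
    p->destroyed = true;
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Subscriber path (assumed order): publisher lock first, then sessions_mutex.
 * This nesting is safe only because destroy_session never nests the other way. */
int subscribe_to(publisher *p, size_t slot) {
    int result = -1;
    pthread_mutex_lock(&p->lock);
    if (!p->destroyed) {
        pthread_mutex_lock(&sessions_mutex);
        if (slot < MAX_SESSIONS && session_table[slot] == p)
            result = 0;
        pthread_mutex_unlock(&sessions_mutex);
    }
    pthread_mutex_unlock(&p->lock);
    return result;
}
```

If `destroy_session` instead kept `sessions_mutex` held while locking the publisher, a concurrent `subscribe_to` holding the publisher lock and waiting for `sessions_mutex` would leave each thread waiting for the other.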

@mivuDing
Author

mivuDing commented Mar 9, 2020

Thanks for your reply and suggestions! It is very hard to replicate the issue! The Janus server has been running for nearly a month, and the deadlock has only occurred twice.

I will apply my proposed fix and observe it for a while. I will report back on whether it takes effect and has no side effects.

Of course, if you come up with a better idea, I would be very glad to hear it. Thank you very much!

@mp16
Contributor

mp16 commented Mar 9, 2020

This could explain the problem I described in #1974 (comment) a few days ago, where Janus keeps accepting connections but no more data is sent.

Since I run Janus under gdb and hit this bug a few minutes ago, you gave me the idea to dump all threads: https://pastebin.com/xTgfkRjA (built from commit 9386afb)

47 threads are stuck in janus_videoroom_hangup_media:

janus_mutex_lock(&sessions_mutex);

6 threads are stuck in janus_videoroom_slow_link:

janus_mutex_lock(&sessions_mutex);

1 thread is stuck in janus_videoroom_handler:

janus_mutex_lock(&sessions_mutex);

1 thread is stuck in janus_videoroom_create_session:

janus_mutex_lock(&sessions_mutex);

I think it's related to this issue because it's also in the videoroom plugin, and related to the session mutex.

So a fix is more than welcome: I hit the issue on Friday and again today, and it is starting to get annoying. I hope you can push one soon.

@lminiero
Member

lminiero commented Mar 9, 2020

@mp16 a fix has already been suggested by @mivuDing (check his last image), so you can try to apply it manually in your environment to check if you notice any issues/regressions.

mp16 pushed a commit to mp16/janus-gateway that referenced this issue Mar 9, 2020
@mp16
Contributor

mp16 commented Mar 9, 2020

I have seen that image, but since I don't really know the code, it's better for me to have a diff. So here is one: mp16@c41fa14

I added an "else" to avoid unlocking the mutex twice. Is that OK?

@lminiero
Member

lminiero commented Mar 9, 2020

No, that's not correct. I'll prepare a PR.

@mp16
Contributor

mp16 commented Mar 9, 2020

Sorry, I misread it. I think I get it this time: https://github.com/meetecho/janus-gateway/compare/master...mp16:fix_1982?expand=1

@mp16
Contributor

mp16 commented Mar 9, 2020

Thanks.

@lminiero
Member

lminiero commented Mar 9, 2020

I created a PR with the fix, so please test that to see if it works as expected for you.

@mp16
Contributor

mp16 commented Mar 9, 2020

I will, thanks.

@mivuDing
Author

@mp16 I think the same fix may also be needed in janus_videoroom_setup_media and janus_videoroom_hangup_media; please see the attachment below. I hope it works for you.
[Attachment: videoroom_update.zip]

@lminiero Could you check the patch in the attachment? Is it OK? Thanks very much!

@lminiero
Member

@mivuDing sorry but I won't open a zip file. Please share a diff, or open a pull request with the proposed changes.

mp16 pushed a commit to mp16/janus-gateway that referenced this issue Mar 10, 2020
@mp16
Contributor

mp16 commented Mar 10, 2020

Deadlock again this morning (https://pastebin.com/EJK6Fas9), after about 12 hours of execution.

Stuck again in janus_videoroom_hangup_media, janus_videoroom_slow_link and janus_videoroom_hangup_subscriber.

@mivuDing I agree, I think there are deadlocks in other functions too. I have created a pull request using your zip file. I will try your fix after the review by @lminiero.

@mivuDing
Author

@mp16 Thanks for the PR. Whether or not the fix works for you, please let me know.
@lminiero Please help review the PR that mp16 opened. Thanks a lot!

@lminiero
Member

@mivuDing as @atoppi said, this looks fine, but we'll need some more testing before we can merge. About that, please sign our CLA, since even if someone else created the PR for you, the contribution was actually yours, and there are credits to you in the commit.

@mivuDing
Author

@lminiero I have signed the CLA, thanks very much!

voicenter added a commit to voicenter/janus-gateway that referenced this issue Apr 24, 2020
* Updated link to project in resources (docs)

* Add exception var to catch stmt to fix rollup (meetecho#1848)

* Fixed typo

* fix nullptr dereference in streaming plugin (meetecho#1855)

* VP9 SVC fixes (meetecho#1849)

* Fixed SIP hangup not sending CANCEL, when inviting (fixes meetecho#1856)

* Use strtol more, and add checks when atoi is used (meetecho#1852)

* Fixed broken code in AudioBridge

* Fixed regression when setting up DataChannels

* Fix RTP fuzzing target according to recent VP9 changes.

* Fixed rare race condition in HTTP plugin that could cause leak (fixes meetecho#1665)

* add missing closing curly bracket (meetecho#1859)

* Don't scan libnice version if it wasn't retrieved (fixes meetecho#1858)

* Fixed wrong clock rate being used for RTP header updates when using G.722

* Feature/ignore unreachable ice server (meetecho#1854)

* Keep track of clock rates associated to payload types, for RTCP

* Don't send RTCP SR if outgoing media has been disabled via SDP update

* Bumped version in postprocessing tool as well

* Fixes to RTSP latching procedure (fixes meetecho#1536, replaces meetecho#1851) (meetecho#1866)

* New functionality to add custom Contact URI params to SIP REGISTER (meetecho#1874)

* Reduced verbosity of some lines in the SIP plugin

* Reduced default twcc_period value from 1s to 200ms

* SIP plugin: custom (non-standard) headers on incoming events (requests) (meetecho#1873)

* Bumped to version 0.8.0

* Gzip compression utility in the core (and sample event handler) (meetecho#1846)

* New category of plugins for modular logging (meetecho#1814)

* Fixed linking error for post-processing tools after recent changes

* Remove option to enable rtx (now always supported, when negotiated) (meetecho#1877)

* Updated documentation to include some info on the new logger modules

* Avoid gzip functions when fuzzing in OSS and add zlib dependency when fuzzing locally.

* Fixed exception to GPL code (see meetecho#713)

* Fixed wrong default folder for loggers

* Added link to new video on Simulcast and SVC to docs

* Add CHANGELOG.md file into the project (meetecho#1885)

* Fix RTSP SETUP when url includes query string parameters (fixes meetecho#1869) (meetecho#1875)

* Added changelog (and info on tagged versions) to documentation

* [Suggestion] Started the refactoring of the janus.js (meetecho#1830)

* Make sure libcurl is available before using CURL_AT_LEAST_VERSION (fixes meetecho#1887)

* Fixed small typos in demos

* Fixed obsolete value for TWCC period default in docs/hints

* Make sure the installed libcurl knows about CURL_AT_LEAST_VERSION

* Fixed variable shadowing

* Added fwrite checks in record.c (warnings only)

* Updated changelog (v0.8.0)

* Bumped to version 0.8.1

* Remove SIPre plugin from the repo (meetecho#1894)

* Binary data support in data channels (meetecho#1878)

* Fixed typo in SIP plugin

* Allow RTCP ports to be picked randomly using 0, in Streaming plugin

* Check if rtcp port is > 0 before creating a RTCP socket.

* Revert "Check if rtcp port is > 0 before creating a RTCP socket."

This reverts commit a0b7dbf.

* Check if rtcp port is > 0 before creating a RTCP socket, in Videoroom plugin.

* Add in mountpoint/forwarder create response the allocated RTCP ports.

* he 'referred_by' field currently holds the SIP URI value copied from the (meetecho#1896)

* Fixed warnings introduced in meetecho#1896

* Fixed leak in SIP plugin (fixes meetecho#1897)

* Fixed occasional memory leak in Streaming plugin (fixes meetecho#1900)

* Fix out of bounds array access for last_spatial_layer (meetecho#1906)

* startup: only close the logger directory if it was opened (meetecho#1903)

* Only close the event handlers directory if it was opened (see meetecho#1903)

* fixed typo (meetecho#1916)

* Move loggers cleanup to end of logger thread (fixes meetecho#1904)

* Fixed late initialization of janus.js constructor callbacks (fixes meetecho#1912)

* Added reference to Snap repo in resources (docs)

* Fixed warnings when building DTLS bio code

* Don't keep TextRoom plugin loaded if data channels were not compiled

* Updated year in demos and docs

* Use sendBeacon instead of sync XHR in onbeforeunload (fixes meetecho#1902) (meetecho#1918)

* Fixed occasional buffer overflow error when post-processing H.264 recordings

* Increase buffer when post-processing VP8/VP9 recordings too (see previous commit)

* Updated Changelog

* Bumped to version 0.8.2

* Fix a possible race condition when joining as a subscriber and destroying the session. (meetecho#1911)

* More verbose output on postprocessing output error

* Fixed reference to deprecated configuration file

* Added check on AudioBridge instance in setup_media (fixes meetecho#1923)

* Added missing check on SDP attribute value existence

* Add new configuration property to add protected folders not to save to (meetecho#1919)

* Fixed undefined reference when building postprocessor utilities

* Better parsing of RTSP messages (see meetecho#1922) (meetecho#1925)

* Fixed undefined reference when building fuzzers

* Add missing mutex unlocks in videoroom message handler.

* Add math library when fuzzing locally.

* Add audio skew compensation to janus-pp-rec. (meetecho#1870)

* Updated man file for janus-pp-rec

* Remove odd respond to automatically responded OPTIONS request (meetecho#1930)

* Fix g_async_queue usage (meetecho#1929)

* typo (meetecho#1934)

AudioBridge documentation typo in request mute|unmute

* Fixed broken links in docs (plugins list)

* Removed deprecated warning in screensharing demo

* Removed deprecated text from screensharing demo

* Fixed helpers not being able to send SUBSCRIBE requests in SIP plugin

* Small tweaks after static analysis

* Added Coverity badge

* Janus Travis CI integration (meetecho#1932)

* Updated Changelog (0.8.2)

* Bumped to version 0.9.0

* Refactoring of core-plugin callbacks and RTP extensions termination (meetecho#1884)

* Support for transport-wide CC on outgoing streams (meetecho#1889)

* Dynamically update NACK queue size depending on RTT (meetecho#1867)

* Fixed broken RTP fuzzer

* Fixed typo when adding audio attribute to SDP

* Fixed RTCP parsing issue found by OSS-fuzz

* Fix volume-related functions in janus.js (meetecho#1935)

* Fixed leak when parsing broken TWCC RTCP message (Credit to OSS-Fuzz)

* Add travis_retry to git clone commands.

* Fixed occasional segfault when parsing TWCC RTCP message (Credit to OSS-Fuzz)

* Add OSS-Fuzz badge.

* Fixed regression on video bitrates when using monodirectional PeerConnections

* Update janus_audiobridge.c (meetecho#1938)

The target of participant should also acknowledge the latest mute/unmute status which has been made by administrator.

* Travis libnice clang flags (meetecho#1941)

Do not check cast-alignment errors when compiling libnice with clang.

* Fixed occasional error messages on console when trying to add RTP extensions

* Update debugging section in Janus documentation.

* Optimized parsing of TWCC RTCP message (Credit to OSS-Fuzz)

* Renamed corpora file

* Avoid RTP header memory misalignment in rtx packets (meetecho#1943)

* We should allow to have ICE-TCP enabled without ICE Lite. Recent versions of libnice allow this combination and gather tcp passive candidates etc. in this setup. (meetecho#1946)

* conf: transports: document events option (meetecho#1952)

* Updated Changelog (0.9.0)

* Bumped to version 0.9.1

* Configurable global prefix for log lines (meetecho#1940)

* add missing callbacks.error check (meetecho#1959)

* janus_sip: add missing check for NULL (meetecho#1963)

Fixes meetecho#1962

* Remove Sofia reference from the title of the SIP demo

* rtp: drop dead code in rtp_header_update callers (meetecho#1964)

* Subtype for some event, and better docs for event handlers (fixes meetecho#1953) (meetecho#1957)

* Added link to new event handlers documentation to the doc main page

* Removed unused variables

* Added license badge to the README

* Small tweaks to demo intro text

* Detect H264 key frames with smaller SPS units (meetecho#1965)

Reduces the H264 keyframe length check from 16 to 6 bytes.
6 bytes seems to be the lower bound of any possibly valid SPS NAL unit,
based on Section 7.3 of the H264 specification.

For reference, we have been observing Chrome 80 producing SPS units
of 12 bytes or less.

* Support for strings as unique IDs in AudioBridge, VideoRoom, TextRoom (meetecho#1880)

* If glib is too old, generate uuid manually when needed (see meetecho#1880)

* Fixed errors creating VideoRoom when strings are used (see meetecho#1880)

* Remove duplicated codecs when answering SIP call (meetecho#1966)

* Fixed a couple of JSON attributes in VideoRoom when strings are used (see meetecho#1880)

* Make sure a publisher exists when asking for a VideoRoom subscriber renegotiation (fixes meetecho#1970)

* Added errno info when socket operations fail in Streaming plugin

* Fixed typos in TextRoom

* Support for strings as unique mountpoint IDs in Streaming plugin (meetecho#1969)

* fix meetecho#1967 (meetecho#1968)

Fixed error callback not being invoked when an HTTP error happens trying to attach to a plugin

* Added checks on nice_address_set_from_string (fixes meetecho#1973)

* Fixed broken method signature in Streaming plugin when not using libcurl

* Remove /root from the list of protected folders. Make comment text more clear.

* Valgrind fixes for sockaddr structs (meetecho#1976)

Avoid use of uninitialized members

* Hide libcurl from pkg-config when testing travis-ci with LIBCURL = NO.

* Fixed leak when creating Streaming mountpoint dynamically

* Reduced log level to info when logger and event handlers are not found (meetecho#1980)

* Always use base SSRC when recording VideoRoom simulcast participant

* Removed wrong comment

* Fixed broken DTMF in SIP demo

* Add UI to SIP demo to remove helpers, when created

* Fixed occasional missing referred-by info in SIP demo

* Reply to incoming REFER with 202 right away, not 100, in SIP plugin

* Added more checks on nice_address_set_from_string (fixes meetecho#1973) (meetecho#1981)

* Several enhancements to SIP demo

* Fixed abort at server shutdown after using SIP transfers

* Fixed typo in SIP demo code

* Updated Changelog (0.9.1)

* Bumped to version 0.9.2

* Make prebuffering in AudioBridge configurable (meetecho#1975)

* Add G.711 support to the AudioBridge plugin (meetecho#1979)

* Added maximum value for AudioBridge prebuffering property

* Converted HTTP transport plugin to single thread (meetecho#1173)

* Added -f to rm in html Makefile.am (fixes meetecho#1985)

* Small fixes for TypeScript declaration file (meetecho#1986)

Based on the current RTCConfiguration spec (https://w3c.github.io/webrtc-pc/#dom-rtcconfiguration), iceServers does not expect an array of strings.
Updating to type provided by TypeScript's lib.dom.d.ts

* ice: ensure that stream is non-NULL (meetecho#1987)

This fixes a crash on later stream checks (e.g., transport_wide_cc et al).

* Fixed typo in querylogger_parameters (copy/paste error) (meetecho#1989)

* Fixed double unlock when listing private rooms in AudioBridge (meetecho#1988)

* Make sure the session still has a reference when cleaning up HTTP requests

* Fixes to leaks and race conditions in VoiceMail plugin (meetecho#1993)

* Several fixes to session management in VideoCall plugin (meetecho#1994)

* update dtls ciphers (meetecho#1995)

* Implement ECDSA Certificate generation (meetecho#1997)

* Small tweaks to meetecho#1997 (renamed, moved and documented RSA property in janus.jcfg)

* Fix rare race condition when claiming sessions (meetecho#1990)

* Fix occasional deadlock in VideoRoom (2) (credits to @mivuDing, fixes meetecho#1982) (meetecho#1984)

* Added option to enforce validation on DTLS certificates (meetecho#1992)

Made DTLS ciphers configurable as well

* Fixed typo when renegotiating audio in janus.js (fixes meetecho#2002)

* Added option to ignore mDNS candidates (meetecho#1998)

* Fixed deadlock when using claim on HTTP transport (fixes meetecho#2000)

* Support for RTSP 'Content-Base' header in Streaming plugin (meetecho#1999)

* Added link to FOSDEM 2020 talk on RTP forwarders to the docs

* Fixed small leak in SIP plugin when holding calls

* Added called URI to 'incomingcall' and 'missed_call' events in SIP plugin

* Add repos for openSUSE and SUSE (meetecho#2009)

* Use user_id_str for kicked, leaving, and unpublished events, if enabled. (meetecho#2010)

Co-authored-by: Michael Shiel <mshiel@icehealthsystems.com>

* http_transport: add NULL checks (meetecho#2012)

Refs meetecho#2005

* Update media direction in SIP plugin if remote address is 0.0.0.0 ('hold' fix) (meetecho#2013)

* Prepare RTCP Sender Reports by considering the last RTP timestamp sent. (meetecho#2007)

* Track pending nack cleanup tasks and cancel them when freeing a stream. (meetecho#2014)

* Fixed typo in janus.js error code (fixes meetecho#2018)

* Reverted change on janus.js (see meetecho#2018)

* Resolve mDNS candidates asynchronously with GResolver (see meetecho#1998) (meetecho#2004)

* Reference count janus_request instances (meetecho#2020)

Added better management of refcount on HTTP session when using it too, and refcount support to janus_http_msg as well

* Updates to mutex unlocking in textroom and videoroom plugins (meetecho#2026)

* Updated Changelog (0.9.2)

* Bumped to version 0.9.3

* Add Python aiortc-based functional testing. (meetecho#1971)

* test_aiortc: cleanup (meetecho#2027)

* Fixed missing refcount init for Admin API (fixes meetecho#2029)

* Bumping back to 0.9.2 to re-tag

* Updated changelog for 0.9.2

* Bumped to version 0.9.3 (again)

* janus_http: return earlier if request is NULL (meetecho#2031)

* Fixed janus-pp-rec build warnings when using ffmpeg >= 4.x

* Fixed VideoRoom destroy not working when using strings

* Fixed av_register_all deprecation check in post-processor

* plugins: drop tautology (meetecho#2041)

gateway is always set before initialized, so the latter is always true.

* Don't set ICE credentials when parsing remote credentials (meetecho#2046)

* Detect libsrtp(2) using pkg-config (fixes meetecho#2019) (meetecho#2033)

* Added support for static Opus files to Streaming plugin (meetecho#2040)

* Added support for generic metadata to Streaming mountpoints

* Fixed printout of metadata in Streaming demo

* Added notes on building libsrtp (see meetecho#2024)

* Add configurable DSCP ToS for PeerConnections (meetecho#2055)

* Always add remote candidates from the libnice loop (see meetecho#2045) (meetecho#2048)

* Fixed Streaming destroy not working when using strings

* Use refcount for Streaming plugin helper threads (meetecho#2039)

* Added option to disable building AES-GCM support (see meetecho#2024 and meetecho#2054)

* Fixed typo

* Fixed outdated info in VideoRoom docs

* Fixed syntax error in sample Streaming plugin configuration file

* Support for additional constraints on screenshare media (meetecho#2043)

* refactoring-clean up (const-var, semicolons, ===, etc.) (meetecho#2044)

* Reference subscriber when handling related messages (see meetecho#2045) (meetecho#2061)

* Added option to configure time needed to detect a missing simulcast substream (meetecho#2063)

* Reverted isTrickleEnabled check in janus.js (fixes meetecho#2064)

* Don't show warnings for rtx RTCP packets

* Made libnice warning clearer, and upped suggested version (fixes meetecho#2069)

* Add missing info to videoroom "list" response (meetecho#2068)

* Use custom GSource to handle HTTP request timeouts (see meetecho#2062 and meetecho#2066) (meetecho#2075)

* Define the libnice version string as extern in version.h (fixes gcc10 error)

* Fixed AudioBridge create API not working properly when using string IDs

* Fixed a few typos in AudioBridge errors

* Fix copy-paste error in Streaming plugin docs

* Fix libasan use after free in janus_videoroom_handler when events are enabled (meetecho#2091)

* Added project to resources in the docs

* Return mountpoint IP addresses, if a bind interface/IP was provided

* Swap RR/SR Report Blocks if the first block contains rtx data. (meetecho#2089)

* Add support for playback of audio files in AudioBridge (meetecho#2088)

* Updated Changelog (0.9.3)

* Bumped to version 0.9.4

* Fixed returned address when adding multicast Streaming mountpoints

* More checks when hanging up VideoRoom subscriber (see meetecho#2087) (meetecho#2093)

* Added new docker image to the resources in the docs

* Updated AudioBridge documentation with new playback feature

* Don't wait forever for candidates when half-trickling

* Add some missing static declarations to HTTP and WS transports.

Co-authored-by: Lorenzo Miniero <lminiero@gmail.com>
Co-authored-by: Agustin Polo <poloagustin@gmail.com>
Co-authored-by: Yongje Lee <yongje.lee@hpcnt.com>
Co-authored-by: Alessandro Toppi <atoppi@meetecho.com>
Co-authored-by: Sebastian Schmid <sebastian.j.kummer@gmail.com>
Co-authored-by: Imer Husejnovic <imer90@gmail.com>
Co-authored-by: Oscar <oscar.vadillog@gmail.com>
Co-authored-by: Irek <34670509+pawnnail@users.noreply.github.com>
Co-authored-by: Tristan Matthews <tmatth@videolan.org>
Co-authored-by: Jon Rafkind <jon@rafkind.com>
Co-authored-by: kuekerino <20779891+kuekerino@users.noreply.github.com>
Co-authored-by: Yurii Cherniavskyi <yurii.cherniavskyi@gmail.com>
Co-authored-by: Meirza Arson <klanjabrik@gmail.com>
Co-authored-by: Groupboard <davidj@groupboard.com>
Co-authored-by: Cameron Lucas <clucas@clucas.info>
Co-authored-by: hxl-dy <hexulei@dyinnovations.com>
Co-authored-by: Alessandro Amirante <alex@meetecho.com>
Co-authored-by: mp16 <51138229+mp16@users.noreply.github.com>
Co-authored-by: Paul Zhang <pszhang92@gmail.com>
Co-authored-by: Philipp Hancke <fippo@goodadvice.pages.de>
Co-authored-by: Sean DuBois <sean@pion.ly>
Co-authored-by: Ancor Gonzalez Sosa <ancor@suse.de>
Co-authored-by: Michael Shiel <michaelshiel@users.noreply.github.com>
Co-authored-by: Michael Shiel <mshiel@icehealthsystems.com>
Co-authored-by: agclark81 <agclark@technolutions.com>
Co-authored-by: Alex Pavlov <alien.pavlov@gmail.com>
Co-authored-by: alexamirante <alexamirante@users.noreply.github.com>
Co-authored-by: Federico Lorenzi <florenzi@gmail.com>