Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock ordering issue in video conference bridge #3183

Merged
merged 18 commits into from
Jul 21, 2022
Merged

Conversation

nanangizz
Copy link
Member

Sample scenario

  • Video conference (VC) worker thread: put/get_frame() publishes media event (synchronously), currently put/get_frame() is called while holding the VC mutex and app receiving the event invokes a SIP dialog API which tries to acquire the SIP dialog mutex.
  • Other thread, app receives a SIP dialog callback (note: such callback is usually invoked while holding dialog mutex), it invokes a VC API which tries to acquire VC mutex.

The scenario involves non-uniform lock ordering between VC and SIP dialog mutexes which can cause deadlock.

Proposed solution

The proposed solution is to avoid holding VC mutex while calling put/get_frame() in VC worker thread. To allow this, a couple major modifications are implemented in this PR:

  1. VC internal states (such as port connection, rendering states) must not be changed when put/get_frame() is called, so some VC API operations (such as connect/disconnect ports, remove port, update port) must be queued or done asynchronously, the real operation will be performed in the VC worker thread context.
  2. As remove port operation will be done asynchronously, media ports (especially video ports for now) need to have a reference counter (or group lock) to avoid premature destroy.

Other fixes

This PR also contains a few minor other fixes such as:

  • in pj_grp_lock_create_w_handler(): memory pool used for adding handler, updated to using group lock's memory pool itself, previously using app pool, was getting a premature destroy issue because of this (i.e: VC port added using PJSUA vidwin pool, the vidwin may get destroyed before the VC port, group lock crashed when iterating handlers).
  • in PJMEDIA stream, somehow transport is accessed before the attach happens and it is still NULL, or it is accessed after set to NULL (in destructor), so put NULL transport check in some RTP/RTCP sending.
  • in VC, fill buffer with black is performed when buffer is initially allocated and sink buffer after sink format changed, was every time render state is updated.
  • add/modify logs in a few places including in app.

Copy link
Member

@sauwming sauwming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you merge with the latest master so I can do some local testing?

pjmedia/src/pjmedia/port.c Show resolved Hide resolved
pjmedia/include/pjmedia/port.h Outdated Show resolved Hide resolved
@nanangizz
Copy link
Member Author

Also, can you merge with the latest master so I can do some local testing?

Done, hopefully I didn't do it wrong, the traces above look a bit strange :)

@nanangizz nanangizz merged commit 6ff18b4 into master Jul 21, 2022
@nanangizz nanangizz deleted the glock-for-port branch July 21, 2022 08:10
wosrediinanatour pushed a commit to wosrediinanatour/pjproject that referenced this pull request Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants