Skip to content
This repository has been archived by the owner on Jun 30, 2022. It is now read-only.

Fix the deadlock between TimerExpireCheck() and PfcpXactDelete() #42

Merged

Conversation

shugo-h
Copy link
Contributor

@shugo-h shugo-h commented Nov 4, 2021

  • Move TimerBlk::expireFunc call out of the mutex critical section.
  • Add a expiration check of PfcpXact::timerHolding in PfcpXactTimeout().

- Move TimerBlk::expireFunc call out of the  mutex critical section.
- Add a expiration check of PfcpXact::timerHolding in PfcpXactTimeout().
@@ -664,7 +664,7 @@ Status PfcpXactTimeout(uint32_t index, uint32_t event, uint8_t *type) {
PfcpXactDelete(xact);
return STATUS_ERROR;
}
} else if (event == globalHoldingEvent) {
} else if (event == globalHoldingEvent && TimerIsExpired(xact->timerHolding)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PfcpXactTimeout() is used to handle any timeout event, that is sent by UPF when the timer expired.
So, I think maybe you don't need to check this timer is expired or not again?
Tks.

Copy link
Contributor Author

@shugo-h shugo-h Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for commenting.

I think better that timer expiration should be checked again here.

The xact->timerHolding during processing PfcpXactTimeout() is possible to have been updated by just previous UPF_EVENT_N4_MESSAGE event, for instance, receiving a retransmitted PFCP Session Establishment Request from SMF.
Such xact shall be deleted in the next UPF_EVENT_N4_T3_HOLDING event processing even though its timerHolding has already been updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shugo-h
Thanks for your explanation.
Do you have any suggestions on how to reappear the deadlock mentioned in issue 268?
now I have several ideas but I‘m hard to pick the better one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this?;

  1. Make many PDU sessions (e.g. 30 sessions or more) established at almost same moment. This will cause almost simultaneous PfcpXact timer expirations.
  2. Aim at the expiration time of the PfcpXact timer started in step 1, and make all PDU sessions released.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shugo-h
That looks great!
I would try the method you provided for validation.

Thanks a lot.

Copy link
Contributor Author

@shugo-h shugo-h Nov 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step 2 may not be necessary.
Step 1 will cause many UPF_EVENT_N4_T3_HOLDING events almost simultaneously with holding mutex lock.
I think a PfcpXactTimeout() raised by one of those events may fall into the deadlock in PfcpXactDelete().

But it's better also to do step 2 in order to make event queue full.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants