Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Deadlock in ygot/goyang patches #266

Closed
slicking opened this issue Jun 25, 2024 · 0 comments
Closed

Potential Deadlock in ygot/goyang patches #266

slicking opened this issue Jun 25, 2024 · 0 comments
Assignees

Comments

@slicking
Copy link

The sonic-mgmt-common repo maintains patches to the ygot and goyang libraries, among other things these patches add performance enhancements which introduce a mutex protected cache/map here to the entry struct. See ChildSchema and ChildSchemaMutex.

However, the ygot library will perform a copy on an Entry struct during an Unmarshal operation. If this copy is performed while a go routine has locked the ChildSchemaMutex the copy of the Entry will have the mutex in a locked state and no go routine will ever unlock it. If the go routine performing the Unmarshal attempts to access the cache on the copied Entry it will then wait forever.

Our patches should be updated to account for this behavior of the ygot library and remove the potential for deadlocking a go routine.

A scenario where this has been seen to happen is the following:
Go routine A is created by the gnmi server to handle a subscription:

switch mode {
case gnmipb.SubscriptionList_STREAM:
c.stop = make(chan struct{}, 1)
c.w.Add(1)
go dc.StreamRun(c.q, c.stop, &c.w, c.subscribe)
case gnmipb.SubscriptionList_POLL:
c.polled = make(chan struct{}, 1)
c.polled <- struct{}{}
c.w.Add(1)
go dc.PollRun(c.q, c.polled, &c.w, c.subscribe)
case gnmipb.SubscriptionList_ONCE:
c.once = make(chan struct{}, 1)
c.once <- struct{}{}
c.w.Add(1)
go dc.OnceRun(c.q, c.once, &c.w, c.subscribe)

Go routine A will enter the translib code and call:
commonApp.processGet() --> GetAndXlateFromDB() --> dbDataToYangJsonCreate() --> yangDataFill()
The yangDataFill() function will eventually enter the ygot library calling ytypes.Unmarshal and perform the Entry struct copy here:
https://github.com/openconfig/ygot/blob/v0.7.1/ytypes/list.go#L527

Go routine B is created by the gnmi server to handle another subscription:

err = c.Run(stream)

Go routine B will create a new Data Client with NewTranslClient()
The NewTranslClient() will then call transutil.PopulateClientPaths
PopulateClientPaths then calls ConvertToURI() which creates a new PathValidator() and calls Validate.
The Validate function will eventually call into validateListKeyValues() which enters the ygot library with a call to ytypes.GetOrCreateNode
This eventually results in a call to retrieveNodeContainer which gets the schema for a child which then uses the ChildSchemaMutex and ChildSchema cache added by our patch.

slicking added a commit to slicking/sonic-mgmt-common that referenced this issue Jun 26, 2024
Prevent deadlocking on the ChildSchemaMutex by ensuring no other go routine
is holding the lock while the Entry struct is copied.
slicking added a commit to slicking/sonic-mgmt-common that referenced this issue Jul 29, 2024
Our patches to the goyang/ygot library add performance optimizations by
including a mutex protected cache (map) in the Entry struct.  The ygot
library makes copies of Entry objects during unmarshal operations which
is not safe given our changes.  This change moves the cache and mutex to
a separate struct and adds a pointer to the new struct into the Entry
struct.  Now copies of Entry structs will result in both copies sharing
the same cache and avoid issues around copying mutexes and maps.
slicking added a commit to slicking/sonic-mgmt-common that referenced this issue Jul 29, 2024
Our patches to the goyang/ygot library add performance optimizations by
including a mutex protected cache (map) in the Entry struct.  The ygot
library makes copies of Entry objects during unmarshal operations which
is not safe given our changes.  This change moves the cache and mutex to
a separate struct and adds a pointer to the new struct into the Entry
struct.  Now copies of Entry structs will result in both copies sharing
the same cache and avoid issues around copying mutexes and maps.
mssonicbld added a commit to sonic-net/sonic-buildimage that referenced this issue Aug 7, 2024
…omatically (#19845)

#### Why I did it
src/sonic-mgmt-common
```
* 966adc0 - (HEAD -> master, origin/master, origin/HEAD) Fix sonic-net/sonic-gnmi#266 (#143) (4 hours ago) [Steve Licking]
```
#### How I did it
#### How to verify it
#### Description for the changelog
matiAlfaro pushed a commit to Marvell-switching/sonic-buildimage that referenced this issue Aug 21, 2024
…omatically (sonic-net#19845)

#### Why I did it
src/sonic-mgmt-common
```
* 966adc0 - (HEAD -> master, origin/master, origin/HEAD) Fix sonic-net/sonic-gnmi#266 (sonic-net#143) (4 hours ago) [Steve Licking]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants