-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential Deadlock in ygot/goyang patches #266
Comments
slicking
added a commit
to slicking/sonic-mgmt-common
that referenced
this issue
Jun 26, 2024
Prevent deadlocking on the ChildSchemaMutex by ensuring no other go routine is holding the lock while the Entry struct is copied.
slicking
added a commit
to slicking/sonic-mgmt-common
that referenced
this issue
Jul 29, 2024
Our patches to the goyang/ygot library add performance optimizations by including a mutex protected cache (map) in the Entry struct. The ygot library makes copies of Entry objects during unmarshal operations which is not safe given our changes. This change moves the cache and mutex to a separate struct and adds a pointer to the new struct into the Entry struct. Now copies of Entry structs will result in both copies sharing the same cache and avoid issues around copying mutexes and maps.
slicking
added a commit
to slicking/sonic-mgmt-common
that referenced
this issue
Jul 29, 2024
Our patches to the goyang/ygot library add performance optimizations by including a mutex protected cache (map) in the Entry struct. The ygot library makes copies of Entry objects during unmarshal operations which is not safe given our changes. This change moves the cache and mutex to a separate struct and adds a pointer to the new struct into the Entry struct. Now copies of Entry structs will result in both copies sharing the same cache and avoid issues around copying mutexes and maps.
mssonicbld
added a commit
to sonic-net/sonic-buildimage
that referenced
this issue
Aug 7, 2024
…omatically (#19845) #### Why I did it src/sonic-mgmt-common ``` * 966adc0 - (HEAD -> master, origin/master, origin/HEAD) Fix sonic-net/sonic-gnmi#266 (#143) (4 hours ago) [Steve Licking] ``` #### How I did it #### How to verify it #### Description for the changelog
matiAlfaro
pushed a commit
to Marvell-switching/sonic-buildimage
that referenced
this issue
Aug 21, 2024
…omatically (sonic-net#19845) #### Why I did it src/sonic-mgmt-common ``` * 966adc0 - (HEAD -> master, origin/master, origin/HEAD) Fix sonic-net/sonic-gnmi#266 (sonic-net#143) (4 hours ago) [Steve Licking] ``` #### How I did it #### How to verify it #### Description for the changelog
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The sonic-mgmt-common repo maintains patches to the ygot and goyang libraries, among other things these patches add performance enhancements which introduce a mutex protected cache/map here to the entry struct. See
ChildSchema
andChildSchemaMutex
.However, the ygot library will perform a copy on an
Entry
struct during anUnmarshal
operation. If this copy is performed while a go routine has locked theChildSchemaMutex
the copy of theEntry
will have the mutex in a locked state and no go routine will ever unlock it. If the go routine performing theUnmarshal
attempts to access the cache on the copiedEntry
it will then wait forever.Our patches should be updated to account for this behavior of the ygot library and remove the potential for deadlocking a go routine.
A scenario where this has been seen to happen is the following:
Go routine A is created by the gnmi server to handle a subscription:
sonic-gnmi/gnmi_server/client_subscribe.go
Lines 185 to 199 in a610784
Go routine A will enter the translib code and call:
commonApp.processGet() --> GetAndXlateFromDB() --> dbDataToYangJsonCreate() --> yangDataFill()
The yangDataFill() function will eventually enter the ygot library calling ytypes.Unmarshal and perform the Entry struct copy here:
https://github.com/openconfig/ygot/blob/v0.7.1/ytypes/list.go#L527
Go routine B is created by the gnmi server to handle another subscription:
sonic-gnmi/gnmi_server/server.go
Line 290 in a610784
Go routine B will create a new Data Client with NewTranslClient()
The NewTranslClient() will then call transutil.PopulateClientPaths
PopulateClientPaths then calls ConvertToURI() which creates a new PathValidator() and calls Validate.
The Validate function will eventually call into validateListKeyValues() which enters the ygot library with a call to ytypes.GetOrCreateNode
This eventually results in a call to retrieveNodeContainer which gets the schema for a child which then uses the
ChildSchemaMutex
andChildSchema
cache added by our patch.The text was updated successfully, but these errors were encountered: