Improve perf and scalability of Regex's cache #542

Merged: 1 commit merged into dotnet:master from the alternateregexcaching branch on Dec 6, 2019

Conversation

@stephentoub (Member)

`Regex` maintains a cache used for the static methods on `Regex`, e.g. `Regex.IsMatch`. The cache is implemented as an LRU cache, which maintains a linked list and a dictionary of the cached instances. The linked list maintains the order in which the cached instances were last accessed, making it cheap to expunge older items from the cache. However, that comes at a significant cost: unless the item is the very first one in the linked list, all reads on the cache require taking a global lock, because the linked list needs to be mutated to move the found node to the beginning. That lock has both throughput and scalability implications.
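For context, here is a minimal sketch of that classic LRU read path (illustrative only, not the actual System.Text.RegularExpressions source; the key type is a stand-in for the real composite key): even a cache hit has to mutate the list, so every read contends on the lock.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

internal static class LruCacheSketch
{
    private static readonly object s_lock = new object();
    private static readonly Dictionary<string, LinkedListNode<Regex>> s_map =
        new Dictionary<string, LinkedListNode<Regex>>();
    private static readonly LinkedList<Regex> s_list = new LinkedList<Regex>();

    public static Regex? Lookup(string key)
    {
        // Even a pure read must take the global lock, because a hit has to
        // move the node to the head of the list to record "most recently used".
        lock (s_lock)
        {
            if (s_map.TryGetValue(key, out LinkedListNode<Regex>? node))
            {
                s_list.Remove(node);
                s_list.AddFirst(node);
                return node.Value;
            }
            return null;
        }
    }
}
```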

This PR changes the cache from using a `Dictionary<>` and a linked list to instead using a `ConcurrentDictionary<>` and a `List<>`. Rather than making all accesses more expensive in order to make drops less expensive, it makes all reads much cheaper and more scalable, at the expense of making drops more expensive. Since dropping from the cache means we're already paying the expensive cost of creating/parsing/compiling/etc. a new Regex instance, this is a better trade-off, especially since any frequent dropping suggests the consuming app or library needs to revisit its Regex strategy, either using `Regex.CacheSize` to increase the cache size appropriately, or doing its own caching (e.g. creating the Regex instance it needs and storing it into a field for all future use).
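As a usage note (not part of this change), those two mitigations look like this; the class name and pattern are made up for illustration, but `Regex.CacheSize` (default 15) and reusing a constructed `Regex` instance are both real, documented options:

```csharp
using System.Text.RegularExpressions;

internal static class RegexConsumers
{
    // Option 1: if an app legitimately pushes many distinct patterns through
    // the static APIs (Regex.IsMatch, Regex.Match, ...), raise the cache size.
    static RegexConsumers() => Regex.CacheSize = 64;

    // Option 2 (usually preferable): construct the Regex once, store it in a
    // field, and reuse the instance, bypassing the static-method cache entirely.
    private static readonly Regex s_identifier =
        new Regex(@"^[A-Za-z_][A-Za-z0-9_]*$", RegexOptions.Compiled);

    public static bool IsIdentifier(string value) => s_identifier.IsMatch(value);
}
```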

The new scheme uses a `ConcurrentDictionary<Key,Node>`, a `List<Node>`, and a fast-path field storing the most recently used Regex instance (just as the existing implementation did). On lookups, if the fast-path field has the matching value, it's just returned. Otherwise, the dictionary is consulted, and if the item is found, the fast-path field is updated. No locking at all is employed, and only a few volatile read/writes are used to update a "last access stamp" that's used to indicate importance if/when items do need to be expunged. On additions, we do still take a global lock and add to the cache. If this puts us over our cache size, we pick an item from the list and remove it. If the list is small, we just examine all of the items looking for the oldest. If the list is larger, we examine a random subset of it; we may not get rid of the absolute oldest item, but it'll be old enough.
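A simplified sketch of that read path and eviction policy follows. Field names, the key type (a string standing in for pattern + options + culture), the sampling threshold, and the use of `Interlocked` for the stamp are illustrative; the actual implementation is in the PR diff.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading;

internal static class RegexCacheSketch
{
    private sealed class Node
    {
        public Node(string key, Regex regex) { Key = key; Regex = regex; }
        public readonly string Key;
        public readonly Regex Regex;
        public volatile int LastAccessStamp;   // recency hint for eviction
    }

    private static volatile Node? s_lastAccessed;                      // fast path
    private static readonly ConcurrentDictionary<string, Node> s_cache =
        new ConcurrentDictionary<string, Node>();
    private static readonly List<Node> s_cacheList = new List<Node>(); // guarded by s_lock
    private static readonly object s_lock = new object();
    private static int s_stamp;

    public static Regex? Lookup(string key)
    {
        Node? last = s_lastAccessed;
        if (last != null && last.Key == key)
        {
            return last.Regex;                 // hit on the most recent entry: no lock
        }

        if (s_cache.TryGetValue(key, out Node? node))
        {
            node.LastAccessStamp = Interlocked.Increment(ref s_stamp);
            s_lastAccessed = node;             // still no lock on the read path
            return node.Regex;
        }

        return null;                           // caller builds the Regex and calls Add
    }

    public static void Add(string key, Regex regex)
    {
        lock (s_lock)                          // additions are rare and already expensive
        {
            if (!s_cache.ContainsKey(key))
            {
                var node = new Node(key, regex);
                node.LastAccessStamp = Interlocked.Increment(ref s_stamp);
                s_cache[key] = node;
                s_cacheList.Add(node);
                s_lastAccessed = node;

                if (s_cacheList.Count > Regex.CacheSize)
                {
                    Evict();
                }
            }
        }
    }

    private static void Evict()
    {
        // Small list: scan everything for the oldest entry. Larger list: scan a
        // random sample; the victim may not be the absolute oldest, just old enough.
        int count = s_cacheList.Count;
        int samples = Math.Min(count, 30);
        Random? random = samples == count ? null : new Random();

        int victim = 0;
        int oldest = int.MaxValue;
        for (int i = 0; i < samples; i++)
        {
            int index = random?.Next(count) ?? i;
            if (s_cacheList[index].LastAccessStamp < oldest)
            {
                oldest = s_cacheList[index].LastAccessStamp;
                victim = index;
            }
        }

        s_cache.TryRemove(s_cacheList[victim].Key, out _);
        s_cacheList.RemoveAt(victim);
    }
}
```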

cc: @danmosemsft, @eerhardt, @ViktorHofer

Results from running master (old) vs this PR (new) on the RegexCache* tests from the dotnet/performance repo:

| Method | Toolchain | total | unique | cacheSize | Mean | Ratio | Gen 0 |
|---|---|---:|---:|---:|---:|---:|---:|
| IsMatch | new | 40000 | 7 | 0 | 55.792 ms | 0.96 | 24750.0000 |
| IsMatch | old | 40000 | 7 | 0 | 57.454 ms | 1.00 | 25000.0000 |
| IsMatch_Multithreading | new | 40000 | 7 | 0 | 31.867 ms | 0.98 | 25000.0000 |
| IsMatch_Multithreading | old | 40000 | 7 | 0 | 33.059 ms | 1.00 | 25000.0000 |
| IsMatch | new | 40000 | 1600 | 15 | 112.867 ms | 0.64 | 38000.0000 |
| IsMatch | old | 40000 | 1600 | 15 | 176.183 ms | 1.00 | 39000.0000 |
| IsMatch_Multithreading | new | 40000 | 1600 | 15 | 54.960 ms | 0.50 | 38000.0000 |
| IsMatch_Multithreading | old | 40000 | 1600 | 15 | 109.076 ms | 1.00 | 39000.0000 |
| IsMatch | new | 40000 | 1600 | 800 | 87.606 ms | 0.51 | 20000.0000 |
| IsMatch | old | 40000 | 1600 | 800 | 174.088 ms | 1.00 | 22000.0000 |
| IsMatch_Multithreading | new | 40000 | 1600 | 800 | 50.726 ms | 0.41 | 20000.0000 |
| IsMatch_Multithreading | old | 40000 | 1600 | 800 | 123.640 ms | 1.00 | 22000.0000 |
| IsMatch | new | 40000 | 1600 | 3200 | 13.444 ms | 0.94 | - |
| IsMatch | old | 40000 | 1600 | 3200 | 14.247 ms | 1.00 | - |
| IsMatch_Multithreading | new | 40000 | 1600 | 3200 | 5.500 ms | 0.42 | - |
| IsMatch_Multithreading | old | 40000 | 1600 | 3200 | 13.180 ms | 1.00 | - |
| IsMatch | new | 400000 | 1 | 15 | 41.607 ms | 1.00 | - |
| IsMatch | old | 400000 | 1 | 15 | 41.512 ms | 1.00 | - |
| IsMatch_Multithreading | new | 400000 | 1 | 15 | 40.066 ms | 0.90 | 18000.0000 |
| IsMatch_Multithreading | old | 400000 | 1 | 15 | 44.558 ms | 1.00 | 33500.0000 |
| IsMatch | new | 400000 | 7 | 15 | 66.953 ms | 0.93 | - |
| IsMatch | old | 400000 | 7 | 15 | 71.789 ms | 1.00 | - |
| IsMatch_Multithreading | new | 400000 | 7 | 15 | 46.878 ms | 0.52 | 12000.0000 |
| IsMatch_Multithreading | old | 400000 | 7 | 15 | 90.335 ms | 1.00 | 9000.0000 |

@stephentoub (Member, Author)

The CI failures here are strange; lots of EventSource tests failing with, e.g.

    BasicEventSourceTests.TestsWrite.Test_Write_T_ETW [FAIL]
      Assert.Equal() Failure
                � (pos 0)
      Expected: 
      Actual:   System.Collections.Concurrent.ConcurrentCúúú
                � (pos 0)
      Stack Trace:
        /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestUtilities.cs(50,0): at BasicEventSourceTests.TestUtilities.CheckNoEventSourcesRunning(String message)
        /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.cs(456,0): at BasicEventSourceTests.TestsWrite.Test_Write_T(Listener listener)
        /_/src/libraries/System.Diagnostics.Tracing/tests/BasicEventSourceTest/TestsWrite.Etw.cs(29,0): at BasicEventSourceTests.TestsWrite.Test_Write_T_ETW()

My assumption is that a) there was some kind of change in coreclr recently that is now causing a discrepancy with the tests, and b) this is now showing up after @safern's live/live change went in last night, but I'm not sure why I don't see similar failures on other PRs, nor why the "Actual" string above looks corrupted ("System.Collections.Concurrent.ConcurrentCúúú"). Regardless, I put up #565 to add this EventSource to the test's exempted list. @noahfalk, ideas?

@safern (Member) commented Dec 5, 2019

> but I'm not sure why I don't see similar failures on other PRs.

Does this repro locally with and without your change?

@stephentoub stephentoub closed this Dec 5, 2019
@stephentoub stephentoub reopened this Dec 5, 2019
@eerhardt (Member) left a comment


This looks good to me. Just some clarifying questions to help me understand.

@stephentoub (Member, Author) commented Dec 6, 2019

> Does this repro locally with and without your change?

No. And CI passed now.

@stephentoub stephentoub merged commit d49fc9e into dotnet:master Dec 6, 2019
@stephentoub stephentoub deleted the alternateregexcaching branch December 6, 2019 14:48
@noahfalk (Member) commented Jan 3, 2020

@stephentoub - sorry for the very late reply; I've been on vacation all December, and GitHub doesn't have a nice out-of-office feature. I could imagine that your usage of `ConcurrentDictionary` caused the ConcurrentCollectionsEventSource to get lazily created in a bunch of tests that previously never initialized it, which in turn caused it to get flagged by the test code that asserts no unexpected EventSources have been created. Adding it to the exclusion list of known BCL EventSources was the right move. As for why the string was showing up corrupted, that I can't explain. I think it's much more likely to be some issue relating to xunit or the console display, given that the string comparison you added at line 39 would only work if eventSource.Name contained the expected string data at that point.

@stephentoub (Member, Author)

Thanks, Noah.

@danmoseley (Member)

Maybe úúú is meant to be an ellipsis with some special period, and the console codepage is corrupting it.

@stephentoub stephentoub mentioned this pull request Jan 7, 2020
@stephentoub stephentoub added the tenet-performance (Performance related issue) label Jan 12, 2020
@stephentoub stephentoub added this to the 5.0 milestone Jan 12, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020