Shared storage cache #4989

ghoshkaj · 2018-04-01T23:34:24Z

Issue

Use a shared storage cache to compare against performance of the threadlocal cache in #4876. This PR is to test out which is faster and keep the implementation that is faster. Related to cache considerations detailed here.

Tasklist

Requirements / Relations

#4876

ghoshkaj · 2018-04-02T05:42:52Z

@oxidase I didn't make the cache a singleton since all the accesses to the cache are through the SearchEngine. Instead I got rid of the boost::thread_specific_ptr and used a std::unique_ptr instead to try and simulate a singleton. However, I'm not sure that it is threadsafe for over here when we create a new cache for a new dataset (timestamp).

ghoshkaj · 2018-04-02T20:07:21Z

After talking with @danpat, I converted the cache pointer into a cache object: f39e37a. But tests were still failing non-determiistically.

Then I talked to @miccolis and went with putting a lock in front of the call to clear the function from within the SearchEngine: 01e6562. The tests have stopped failing, but this probably makes it slower.

Some conversation came up around whether to make this shared storage cache be inter-thread shared storage versus inter-process shared storage. This implementation is of inter-thread shared storage as of now. Decisions to be made after performance measurements between threadlocal vs inter-thread shared storage. Hopefully we don't have to venture into inter-process shared storage 😬 🤞

TheMarex · 2018-04-03T15:12:09Z

One consideration for using a shared storage is that there can be multiple datasets (usually two) in flight at the same time. This is not a problem for thread-local caching, since a single thread will either only use the new or the old data. For a shared cache we need to keep track of the dataset used to compute every entry in the cache, that means we need to extend the cache line to include the timestamp.

The full timestamp used by OSRM is a 32bit integer but for the cache it is really only important to be able to differentiate between the datasets still in use. Typically we limit the number of datasets to 2 so in theory 1 bit of information would be enough. For the sake of generality I would suggest we use 1 byte (== 255 possible datasets) to start and optimize this later.

The cache design would need to be changed in the following way:

Each cache entry gets an additional value generation of one unsigned char (== 1 byte)
At the beginning of every query we obtain the generation for a given timestamp from the cache.
When inserting a value into the cache the user needs to pass in the generation with the value.
When retrieving a value from the cache the user needs to pass in the generation as well.
The generation would be used to compute the hash for the key in unordered_map.

ghoshkaj · 2018-04-04T10:39:44Z

@TheMarex the steps above are really clever and adding unsigned char generation as part of the hash would have been so cool, except that with the vendored version of the LRU cache that I'm using, I don't have access to the hashing function anymore, so I can' t add the generation to be part of the unique key signature. Do you have a suggestion for a way around this problem? For now, I've just added the full timestamp as part of the key in the line, but that take up an additional 8 bytes to the cache.

TheMarex · 2018-04-04T11:05:39Z

LRU cache that I'm using, I don't have access to the hashing function anymore

You can do it the same way you currently handle the full timestamp value by making it part of the key. The generation logic can live in the interface that you build around the LRU cache. Seems like boost automatically generates a hash function for a std::tuple, so extending the actual has function is obsolete.

ghoshkaj · 2018-04-04T11:05:47Z

~~@TheMarex Ooo, actually I could just hash it myself and pass the hash to the library as the key!! Going to do that!~~ posted at the same time. Going to move forward with your suggestion.

ghoshkaj · 2018-04-09T03:36:18Z

branch-gitsha \ dataset, matrix-size, numReqs	usca-25x25-10k reqs (ms)	usca-25x25-10k reqs (ms)	usca-25x25-10k reqs (ms)	usca-25x25-10k reqs (ms)	usca-100x100-5k reqs (ms)	usca-100x100-5k reqs (ms)	usca-100x100-5k reqs (ms)	usca-100x100-5k reqs (ms)	berlin-latest -5x5 - 20k reqs (ms)	berlin-latest -5x5 - 20k reqs (ms)	berlin-latest -5x5 - 20k reqs (ms)
	2 threads	4 threads	8 threads	16 threads	2 threads	4 threads	8 threads	16 threads	2 threads	4 threads	8 threads	16 threads
master	20.556	20.807	20.750	20.755	97.250	97.154	97.017	96.763
threadlocal-cache (250 mb)
shared-storage (250mb)
threadlocal-cache (500 mb)	31.592	31.833	32.497	33.065	226.447	226.923	431.501	225.378
shared-storage (500 mb)
threadlocal-cache (1000 mb)
shared-storage (1000 mb)

Running osrm-runner with this command, adjusting lengh of request and number of requests as appropriate:

./scripts/osrm-runner.js -n 10000 -b -122.65,37.26,-119.25,39.83  -p '/table/v1/driving/{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{};{}' -f '$.durations[0]'

For threads, using the - t parameter on ./osrm-routed

copy dummy cache over implement retrievePackedPathFromSearchSpace calculate packed_path_from_source_to_middle debugging the retrievePackedPathFromSearchSpace function implementation adding in packed_path_from_source_to_middle cache is partway working unpack path and get duration that way the computeDurationForEdge method comment out cache clean up the code move vector creation and allocation to outside of loop hack to not return vectors on facade.GetUncompressedForwardDurations and facade.GetUncompressedReverseDurations clean up hack add exclude_index to cache key clearing cache with timestamp rebase against vectors->range pr swapped out unordered_map cache with a boost_lru implementation calculation for cache size cleaned up comment about cache size calculations unit tests cache uses unsigned char for exclude index clean up cache and unit tests

500 mb threadlocal 2 t

copy dummy cache over implement retrievePackedPathFromSearchSpace calculate packed_path_from_source_to_middle debugging the retrievePackedPathFromSearchSpace function implementation adding in packed_path_from_source_to_middle cache is partway working unpack path and get duration that way the computeDurationForEdge method comment out cache clean up the code move vector creation and allocation to outside of loop hack to not return vectors on facade.GetUncompressedForwardDurations and facade.GetUncompressedReverseDurations clean up hack add exclude_index to cache key clearing cache with timestamp rebase against vectors->range pr swapped out unordered_map cache with a boost_lru implementation calculation for cache size cleaned up comment about cache size calculations unit tests cache uses unsigned char for exclude index clean up cache and unit tests

shared lock for reads and unique lock for writes declare cache as object and not pointer in serach engine data to simulate singleton declaration put a lock infront of the clear function to make it threadsafe remove clear function from cache because cache will never get dropped unit tests unit tests and timestamp as part of key cache generations hash the key 500 mb 1000 mb 250 mb rebase against implement-cache

ghoshkaj requested a review from oxidase April 2, 2018 04:32

ghoshkaj self-assigned this Apr 2, 2018

ghoshkaj added the Work In Progress label Apr 2, 2018

ghoshkaj force-pushed the shared-storage-cache branch from 65e3f74 to 01e6562 Compare April 2, 2018 23:51

ghoshkaj changed the base branch from master to compute-annotations-for-table-at-runtime April 3, 2018 00:52

ghoshkaj changed the base branch from compute-annotations-for-table-at-runtime to master April 3, 2018 00:59

ghoshkaj force-pushed the shared-storage-cache branch 2 times, most recently from 7e28a9d to 71892a5 Compare April 5, 2018 12:40

ghoshkaj changed the base branch from master to implement-cache April 5, 2018 17:47

ghoshkaj force-pushed the shared-storage-cache branch from 923e6c0 to eda5aa4 Compare April 6, 2018 14:37

ghoshkaj force-pushed the implement-cache branch 5 times, most recently from 3bfdaf3 to 729a399 Compare April 8, 2018 23:40

ghoshkaj force-pushed the shared-storage-cache branch 2 times, most recently from fac25d1 to 97cd4c7 Compare April 9, 2018 03:30

ghoshkaj force-pushed the implement-cache branch 2 times, most recently from c7983bd to 0470a4e Compare April 9, 2018 13:15

ghoshkaj force-pushed the shared-storage-cache branch from 97cd4c7 to 951daed Compare April 9, 2018 13:16

ghoshkaj added 3 commits April 20, 2018 08:00

pass in a hashed key to the threadlocal cache

0fb7068

500 mb threadlocal 2 t

fixes and a rebase

e4a4a8d

ghoshkaj force-pushed the implement-cache branch from 0470a4e to e4a4a8d Compare April 20, 2018 12:51

ghoshkaj added 11 commits April 20, 2018 08:52

500mb-2threads

b2fbc95

500mb-4threads

2829603

500mb-8threads

3e117af

500mb-16threads

0ba69e3

1gb-16threads

0901dd8

1gb-8threads

3220bd3

1gb-4threads

f0fcdc2

1gb-2threads

e59fdac

ghoshkaj force-pushed the shared-storage-cache branch from 951daed to c5acd6e Compare April 20, 2018 15:15

ghoshkaj added 3 commits April 20, 2018 11:20

250mb

2aed130

500mb

8fa55a1

1gb

60269da

ghoshkaj force-pushed the implement-cache branch from e59fdac to 4e2f6b7 Compare April 20, 2018 15:26

ghoshkaj force-pushed the implement-cache branch 3 times, most recently from 26122e3 to 62d7d08 Compare May 7, 2018 15:36

TheMarex mentioned this pull request May 14, 2018

Return distance for table queries #1353

Closed

ghoshkaj closed this Apr 18, 2020

DennisOSRM deleted the shared-storage-cache branch November 6, 2022 14:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shared storage cache #4989

Shared storage cache #4989

ghoshkaj commented Apr 1, 2018 •

edited

Loading

ghoshkaj commented Apr 2, 2018

ghoshkaj commented Apr 2, 2018

TheMarex commented Apr 3, 2018

ghoshkaj commented Apr 4, 2018

TheMarex commented Apr 4, 2018

ghoshkaj commented Apr 4, 2018 •

edited

Loading

ghoshkaj commented Apr 9, 2018

Shared storage cache #4989

Shared storage cache #4989

Conversation

ghoshkaj commented Apr 1, 2018 • edited Loading

Issue

Tasklist

Requirements / Relations

ghoshkaj commented Apr 2, 2018

ghoshkaj commented Apr 2, 2018

TheMarex commented Apr 3, 2018

ghoshkaj commented Apr 4, 2018

TheMarex commented Apr 4, 2018

ghoshkaj commented Apr 4, 2018 • edited Loading

ghoshkaj commented Apr 9, 2018

ghoshkaj commented Apr 1, 2018 •

edited

Loading

ghoshkaj commented Apr 4, 2018 •

edited

Loading