Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

[core] Evict cached resources and tiles equally by access time #7770

Merged
merged 1 commit into from
Jan 19, 2017

Conversation

ericrwolfe
Copy link
Contributor

Fixes #7755.

@mention-bot
Copy link

@ericrwolfe, thanks for your PR! By analyzing this pull request, we identified @jfirebaugh, @friedbunny and @mollymerp to be potential reviewers.

@friedbunny friedbunny requested a review from jfirebaugh January 18, 2017 17:48
@ericrwolfe ericrwolfe added the Core The cross-platform C++ core, aka mbgl label Jan 18, 2017
Statement accessedStmt = getStatement(
"SELECT max(accessed) "
"FROM ( "
" SELECT accessed "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure the inner query is parsed as suggested by the indentation, and not "(select from resources [unordered and unlimited]) UNION ALL (select from tiles ... ordery by ... limit 50)"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -793,17 +793,42 @@ bool OfflineDatabase::evict(uint64_t neededFreeSize) {
// size, and because pages can get fragmented on the database.
while (usedSize() + neededFreeSize + pageSize > maximumCacheSize) {
// clang-format off
Statement accessedStmt = getStatement(
"SELECT max(accessed) "
Copy link
Contributor

@jfirebaugh jfirebaugh Jan 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query plan for this:

2|0|0|SCAN TABLE resources USING COVERING INDEX resources_accessed
2|1|1|SEARCH TABLE region_resources USING COVERING INDEX region_resources_resource_id (resource_id=?)
3|0|0|SCAN TABLE tiles USING COVERING INDEX tiles_accessed
3|1|1|SEARCH TABLE region_tiles USING COVERING INDEX region_tiles_tile_id (tile_id=?)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SEARCH SUBQUERY 1

This concerns me a bit: it's table scanning both resources and tiles. The scan is done on indexes that are ordered appropriately for the ORDER BY ... LIMIT clause, and the hope is that these scans will be run in parallel, and both aborted when the limit is reached. But the query plan doesn't really make clear whether that's the case or not. It would be bad if the tables both had to be scanned in full.

Can you do some benchmarking over databases of a few different sizes (empty, just ambient use, large offline region), and see how query time scales empirically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar concern, since as part of the union it seem it would need to load the entirety of each table before sorting.

The below query adds complexity but leverages the indexes within subqueries before the union:

SELECT max(accessed) 
FROM ( 
  SELECT *
  FROM (
    SELECT accessed 
    FROM resources 
    LEFT JOIN region_resources 
    ON resource_id = resources.id 
    WHERE resource_id IS NULL 
    ORDER BY accessed ASC LIMIT 50 
  )
  UNION ALL
  SELECT * 
  FROM ( 
    SELECT accessed 
    FROM tiles 
    LEFT JOIN region_tiles 
    ON tile_id = tiles.id 
    WHERE tile_id IS NULL 
    ORDER BY accessed ASC LIMIT 50 
  )
  ORDER BY accessed ASC LIMIT 50
);

And has the following query plan:

3|0|0|SCAN TABLE resources USING COVERING INDEX resources_accessed
3|1|1|SEARCH TABLE region_resources USING COVERING INDEX region_resources_resource_id (resource_id=?)
2|0|0|SCAN SUBQUERY 3
2|0|0|USE TEMP B-TREE FOR ORDER BY
5|0|0|SCAN TABLE tiles USING COVERING INDEX tiles_accessed
5|1|1|SEARCH TABLE region_tiles USING COVERING INDEX region_tiles_tile_id (tile_id=?)
4|0|0|SCAN SUBQUERY 5
4|0|0|USE TEMP B-TREE FOR ORDER BY
1|0|0|COMPOUND SUBQUERIES 2 AND 4 (UNION ALL)
0|0|0|SEARCH SUBQUERY 1

I've used sqlite's built in .timer ON to attempt to profile the queries, and the differences seem negligible for a full cache (50mb database, 1010 tiles, 9 resources) but no offline regions stored.
Query in commit: Run Time: real 0.000 user 0.000155 sys 0.000076
Query in comment: Run Time: real 0.000 user 0.000207 sys 0.000120

I'm not sure of a better way to profile sqlite here, and not sure if those query results are cached. Let me know if you have any ideas, couldn't find much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems promising. It's pretty important that we don't regress eviction performance for larger databases too, so can you test both approaches with a large offline region as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing on a database with SF downloaded for offline: 153mb, 3300 tiles, 1284 resources

Query in commit, run 3x:

Run Time: real 0.014 user 0.002650 sys 0.002637
Run Time: real 0.002 user 0.001782 sys 0.000058
Run Time: real 0.002 user 0.001967 sys 0.000073

Query in comment:

Run Time: real 0.028 user 0.003179 sys 0.003182
Run Time: real 0.002 user 0.001882 sys 0.000116
Run Time: real 0.002 user 0.001802 sys 0.000082

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Let's go with the simpler query then.

@ericrwolfe ericrwolfe merged commit a2ceeb4 into master Jan 19, 2017
@1ec5
Copy link
Contributor

1ec5 commented Jan 19, 2017

@ericrwolfe, can you update the Android, iOS, and macOS changelogs to reflect this change?

Edit: Happening in #7731.

1ec5 added a commit that referenced this pull request Jan 19, 2017

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Updated changelogs to mention #7446, #7356, #7465, #7616, #7445, #7444, #7526, #7586, #7574, and #7770. Also corrected the blurb about #7711.
@jfirebaugh jfirebaugh deleted the erw-cache-eviction branch May 11, 2017 19:50
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Core The cross-platform C++ core, aka mbgl offline
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants