
Allow to prewarm the cache for searchable snapshot shards #55322

Merged — tlrx merged 11 commits into elastic:master on Apr 24, 2020

Conversation

@tlrx (Member) commented Apr 16, 2020:

This pull request adds a way to prewarm the cache for searchable snapshot shard files.

It relies on a new index setting named index.store.snapshot.cache.load.eagerly (defaults to false) that can be passed when mounting a snapshot as an index. This setting is detected during the pre-recovery step, before the snapshot files are exposed to the other components of the system. The prewarmCache() method of the SearchableSnapshotDirectory instance is executed, which builds the list of all parts of snapshot files that need to be prefetched into the cache (excluding the files whose content is stored in the metadata hash and the ones explicitly excluded by the excluded_file_types setting).
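
As a rough illustration, the selection step could look like the following minimal sketch (imports elided); isExcludedByFileType() and enqueuePrefetch() are hypothetical helpers, while indexFiles(), metadata().hashEqualsContents(), physicalName() and numberOfParts() come from the existing snapshot file model:

    // Hypothetical sketch of the selection performed by prewarmCache()
    for (BlobStoreIndexShardSnapshot.FileInfo file : snapshot.indexFiles()) {
        if (file.metadata().hashEqualsContents()) {
            continue; // file content is fully stored in the metadata hash, nothing to fetch
        }
        if (isExcludedByFileType(file.physicalName())) {
            continue; // excluded through the excluded_file_types setting
        }
        for (int part = 0; part < file.numberOfParts(); part++) {
            enqueuePrefetch(file, part); // each part is a unit of warming work
        }
    }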

The parts are then prefetched into the cache in parallel using the SNAPSHOT thread pool. If a snapshot file is composed of multiple parts (or chunks), the parts can potentially be downloaded and written to the cache concurrently. The implementation relies on a new prefetchPart() method added to the CachedBlobContainerIndexInput class. This method fetches a complete part of a file (or the whole file if the snapshot file is composed of a single part) in order to write it to the cache. This is possible because CacheFile has been modified to work with configurable cache range sizes depending on the IOContext the IndexInput has been opened with.
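
As a minimal sketch, one warming task could look as follows, assuming the names above; this version warms a single file's parts sequentially on one SNAPSHOT worker (parts of the same file may also be warmed concurrently), and error handling and stats recording are elided:

    // Hypothetical warming task for one file, submitted to the SNAPSHOT pool
    threadPool.executor(ThreadPool.Names.SNAPSHOT).execute(() -> {
        try (IndexInput input = openInput(file, CachedBlobContainerIndexInput.CACHE_WARMING_CONTEXT)) {
            for (int part = 0; part < file.numberOfParts(); part++) {
                // clone() gives each part read its own state; clones of an
                // IndexInput do not need to be closed themselves
                final IndexInput clone = input.clone();
                ((CachedBlobContainerIndexInput) clone).prefetchPart(part);
            }
        } catch (IOException e) {
            // a failed prefetch is not fatal: the range is fetched lazily on first read
        }
    });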

When the IndexInput is opened with the specific CACHE_WARMING_CONTEXT context, the file is cached on disk using large ranges of bytes aligned on the beginning and the end of each part (or chunk) of the file. With any other context, the file is cached on disk using the normal cache range size defined through the range_size setting. This implementation allows the existing cache eviction mechanism to be reused if something goes wrong when reading or writing a part. It also simplifies the logic if the recovering shard is closed while prewarming the cache.
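
The range computation this implies is plain alignment arithmetic; the sketch below uses illustrative names, where the effective rangeSize would be the part size under CACHE_WARMING_CONTEXT and the range_size setting otherwise:

    // Align a position down to the start of its range, and compute the range end
    // clamped to the file length
    static long rangeStart(long position, long rangeSize) {
        return (position / rangeSize) * rangeSize;
    }

    static long rangeEnd(long position, long rangeSize, long fileLength) {
        return Math.min(rangeStart(position, rangeSize) + rangeSize, fileLength);
    }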

@tlrx added the >enhancement, :Distributed Coordination/Snapshot/Restore, v8.0.0 and v7.8.0 labels Apr 16, 2020
@tlrx tlrx requested review from ywelsch and DaveCTurner April 16, 2020 15:57

@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

@@ -115,8 +135,10 @@ public SearchableSnapshotDirectory(
        this.cacheDir = Objects.requireNonNull(cacheDir);
        this.closed = new AtomicBoolean(false);
        this.useCache = SNAPSHOT_CACHE_ENABLED_SETTING.get(indexSettings);
        this.loadCacheEagerly = useCache ? SNAPSHOT_CACHE_LOAD_EAGERLY_SETTING.get(indexSettings) : false;

tlrx (Member, Author) commented:

Better name suggestions are welcome

        boolean alreadyLoaded = this.loaded;
        if (alreadyLoaded == false) {
            synchronized (this) {
                alreadyLoaded = this.loaded;
                if (alreadyLoaded == false) {
                    this.blobContainer = blobContainerSupplier.get();
                    this.snapshot = snapshotSupplier.get();
                    if (loadCacheEagerly) {
                        prewarmCache();

tlrx (Member, Author) commented:

This method blocks until the cache is fully prewarmed. It must be done before loaded is set to true so that other components of the system do not trigger caching on this directory's files.

Contributor commented:

Is that strictly necessary? I would prefer to initiate the prewarming here, but at the same time allow the shard routing to move to started state as quickly as possible.

tlrx (Member, Author) commented:

This is not strictly necessary but a misunderstanding on my part. We discussed this and I updated the PR so that cache warming now runs concurrently with the recovery.


final BlobStoreIndexShardSnapshot.FileInfo fileInfo = fileInfo(name);
private IndexInput openInput(final BlobStoreIndexShardSnapshot.FileInfo fileInfo, final IOContext context) {

tlrx (Member, Author) commented:

Splitting this method into two allows opening an IndexInput even if the snapshot is not marked as loaded yet.

Contributor commented:

I think this is no longer necessary? The private openInput method is only called in one place.

tlrx (Member, Author) commented:

Indeed - I pushed c8a1c6b to remove the method.

        );
        final long startTimeInNanos = statsCurrentTimeNanosSupplier.getAsLong();
        try {
            final IndexInput input = openInput(file, CachedBlobContainerIndexInput.CACHE_WARMING_CONTEXT);

tlrx (Member, Author) commented:

This method uses an IndexInput with a specific IOContext to prewarm the cache for the given Lucene file. The IndexInput will be cloned for each part to write in cache later and closed once all parts are processed.

@@ -61,12 +61,11 @@ protected void closeInternal() {
     @Nullable // if evicted, or there are no listeners
     private volatile FileChannel channel;

-    public CacheFile(String description, long length, Path file, int rangeSize) {
+    public CacheFile(String description, long length, Path file) {

tlrx (Member, Author) commented:

We included the rangeSize in the CacheFile to compute the range to fetch given a specific position, but we were never asserting that the fetched ranges really matched the size.

@@ -144,6 +208,7 @@ private void writeCacheFile(FileChannel fc, long start, long end) throws IOExcep
         final long length = end - start;
         final byte[] copyBuffer = new byte[Math.toIntExact(Math.min(COPY_BUFFER_SIZE, length))];
         logger.trace(() -> new ParameterizedMessage("writing range [{}-{}] to cache file [{}]", start, end, cacheFileReference));
+        assert assertRangeOfBytesAlignment(start, end);

tlrx (Member, Author) commented:

This asserts the size of the ranges written to the cache, depending on the IOContext.

tlrx (Member, Author) commented:

We can't really assert the size of the ranges now that warming runs concurrently. This has been removed.

@tlrx changed the title from "Allow to prewarm that cache for searchable snapshot shards" to "Allow to prewarm the cache for searchable snapshot shards" Apr 17, 2020
    @Override
    protected void doRun() throws Exception {
        CheckedRunnable<Exception> loader;
        while (isOpen && (loader = queue.poll(0L, TimeUnit.MILLISECONDS)) != null) {

Contributor commented:

I see that this is taking the same approach as we use for uploading snapshot files (BlobStoreRepository). I would prefer not to hold onto workers for such a long time, as it can block the snapshot thread pool for a long time (cc: @original-brownbear).
In both cases (also the one in BlobStoreRepository), I would prefer for the worker to process one file, then enqueue another task to the thread pool to pick up the next piece of work. This allows other operations to make progress as well, instead of waiting for a long time in the snapshot queue.

tlrx (Member, Author) commented:

I agree, that makes sense Yannick.

Member commented:

I think for the uploading side the downside of

> This allows other operations to make progress as well

is that it causes the index commits to be held on to for a suboptimally long time. That's why the approach of fully monopolizing the pool was consciously chosen for uploads there.

Contributor commented:

That's something to be better controlled at the SnapshotShardsService level then, though. It could limit the number of shards to be snapshotted concurrently by lazily enqueuing there.

@tlrx (Member, Author) commented Apr 21, 2020:

Please hold off on reviews - Yannick and I discussed several points on this PR yesterday and I'll address them.

@tlrx tlrx added the WIP label Apr 21, 2020

@tlrx (Member, Author) commented Apr 23, 2020:

I've updated this PR so that cache warming is no longer blocking and now runs concurrently with the shard recovery. Allowing concurrent reads of different chunk sizes required removing the assertions on the number and the length of gaps to be written to the cache (which I think is OK, as David planned to improve this). I also introduced a dedicated thread pool for cache warming, as suggested by Yannick. This thread pool is sized larger than the default snapshot thread pool.

I quickly ran some benchmarks and compared the results to regular full restores. Depending on the snapshot to restore, this change runs from 10% to 50% faster than a regular restore. This makes sense now that contention has been reduced in #55662 and a custom thread pool is used for warming.
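
For reference, a dedicated pool can be registered by a plugin roughly as sketched below; the pool name matches the searchable_snapshots pool discussed in this PR, but the scaling sizes and keep-alive shown here are assumptions, not the committed values:

    // Hypothetical executor registration in the plugin (imports elided)
    @Override
    public List<ExecutorBuilder<?>> getExecutorBuilders(Settings settings) {
        return Collections.singletonList(
            new ScalingExecutorBuilder(
                "searchable_snapshots",
                0,                                             // core pool size
                2 * EsExecutors.allocatedProcessors(settings), // max pool size
                TimeValue.timeValueSeconds(30)                 // keep-alive
            )
        );
    }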

@tlrx (Member, Author) commented Apr 24, 2020:

ML-related failure.

@elasticmachine run elasticsearch-ci/2

@ywelsch ywelsch self-requested a review April 24, 2020 08:11
@tlrx tlrx removed the WIP label Apr 24, 2020

@tlrx (Member, Author) commented Apr 24, 2020:

@DaveCTurner sorry for the delay. This is ready for review.

@DaveCTurner (Contributor) left a review:

Cunning use of an IOContext, makes sense to me. I left some small comments but nothing major.

@@ -99,6 +103,11 @@
         true,
         Setting.Property.IndexScope
     );
+    public static final Setting<Boolean> SNAPSHOT_CACHE_LOAD_EAGERLY_SETTING = Setting.boolSetting(
+        "index.store.snapshot.cache.load.eagerly",

Contributor commented:

Suggest keeping the terminology consistent around "warming", how about index.store.snapshot.cache.prewarm.enabled?

tlrx (Member, Author) commented:

Much better name, thanks. I pushed 4ee96af.
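
The renamed setting presumably mirrors the original definition; the following is an assumed shape based on the excerpt above, not the exact committed code:

    // Assumed definition after the rename in 4ee96af; defaults to false per the PR description
    public static final Setting<Boolean> SNAPSHOT_CACHE_PREWARM_ENABLED_SETTING = Setting.boolSetting(
        "index.store.snapshot.cache.prewarm.enabled",
        false,
        Setting.Property.IndexScope
    );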

@@ -129,7 +129,7 @@ protected InputStream openSlice(long slice) throws IOException {
         }
     }

-    protected final boolean assertCurrentThreadMayAccessBlobStore() {
+    protected boolean assertCurrentThreadMayAccessBlobStore() {

Contributor commented:

I think we can relax the assertion here to permit the searchable_snapshots threadpool to access the repo, rather than overloading it only in CachedBlobContainerIndexInput.

tlrx (Member, Author) commented:

Ok. I pushed 0469c7b
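
The relaxed assertion could look roughly like this; the exact set of permitted pools and the thread-name check are assumptions based on the suggestion above:

    // Hypothetical relaxed assertion permitting the searchable_snapshots pool
    protected boolean assertCurrentThreadMayAccessBlobStore() {
        final String threadName = Thread.currentThread().getName();
        assert threadName.contains("[snapshot]") || threadName.contains("[searchable_snapshots]")
            : "current thread [" + threadName + "] may not access the blob store";
        return true;
    }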



@tlrx tlrx requested a review from DaveCTurner April 24, 2020 14:11

@tlrx (Member, Author) commented Apr 24, 2020:

Thanks @DaveCTurner, I've applied your feedback. This is ready for another round.

@DaveCTurner (Contributor) left a review:

Two more tiny comments regarding the threadpool, but otherwise LGTM. Great work @tlrx.

@tlrx tlrx merged commit bd40d06 into elastic:master Apr 24, 2020
@tlrx tlrx deleted the load-cache-eagerly branch April 24, 2020 15:47

@tlrx (Member, Author) commented Apr 24, 2020:

Thanks David

tlrx added a commit that referenced this pull request May 5, 2020:

Today the cache prewarming introduced in #55322 works by enqueuing all the file parts to warm in the searchable_snapshots thread pool at once. In order to make this fairer among concurrent warmings, this commit starts workers that concurrently poll file parts to warm from a queue, warm the part, and then immediately schedule another warming execution. This should leave more room for concurrent shard warming to sneak in and be executed.

Relates #55322
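
A minimal sketch of that worker pattern, with assumed queue and executor wiring:

    // Hypothetical "poll one part, warm it, re-schedule" worker; a failed part is
    // tolerated because it will simply be fetched lazily on first read
    void warmNext(BlockingQueue<CheckedRunnable<Exception>> queue, Executor executor) {
        final CheckedRunnable<Exception> task = queue.poll();
        if (task == null) {
            return; // nothing left to warm
        }
        executor.execute(() -> {
            try {
                task.run(); // warms a single file part
            } catch (Exception e) {
                // log and continue
            }
            warmNext(queue, executor); // let other shards' warming tasks interleave
        });
    }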
tlrx added a commit to tlrx/elasticsearch that referenced this pull request May 5, 2020, with the same commit message (relates elastic#55322).
tlrx added a commit that referenced this pull request May 5, 2020, with the same commit message (relates #55322).