
[BUG] Too many get snapshot calls causing generic threadpool to be busy completely #1788

Closed
shwetathareja opened this issue Dec 22, 2021 · 12 comments · Fixed by #8377
Labels: bug (Something isn't working), distributed framework

Comments

@shwetathareja
Member

shwetathareja commented Dec 22, 2021

Describe the bug
Too many get snapshot calls against a custom repository with more than 70k snapshots exhausted the generic threadpool completely. The threads were stuck in an indefinite wait state, which looks like a deadlock.

Observed this behavior in 7.10 and 7.1, but the stack traces were different.

In 7.10, the code that fetches the repository data sits behind a blocking Future.get call, while repositoriesService.getRepositoryData dispatches its internal work onto the same generic threadpool; with the pool exhausted, this caused the deadlock:
threadPool.generic().execute(ActionRunnable.wrap(listener, this::doGetRepositoryData));
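
A minimal, self-contained sketch of that self-deadlock pattern (hypothetical names; a tiny 2-thread pool stands in for the generic threadpool): every worker blocks on a future whose completing task is queued on the same, already-exhausted pool, so nothing ever completes.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class GenericPoolSelfDeadlockSketch {
    // Stand-in for the generic threadpool, shrunk to 2 threads so the hang is immediate.
    static final ExecutorService GENERIC = Executors.newFixedThreadPool(2);

    // Mimics getRepositoryData: the actual work is dispatched onto the same pool the caller occupies.
    static CompletableFuture<String> getRepositoryData() {
        CompletableFuture<String> future = new CompletableFuture<>();
        GENERIC.execute(() -> future.complete("repository-data"));
        return future;
    }

    public static void main(String[] args) {
        // Fill every worker with a caller that blocks on the future; no worker is left
        // to run the completing task, so all of them wait forever (WAITING on parking,
        // as in the 7.10 thread dump below).
        for (int i = 0; i < 2; i++) {
            GENERIC.execute(() -> getRepositoryData().join());
        }
    }
}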

This issue would exist in OpenSearch as well; I haven't tried an explicit repro yet.

In 7.1, though, there was no deadlock, but the get snapshot calls were taking really long to finish, keeping the whole threadpool busy.

Expected behavior
The TransportGetSnapshotsAction shouldn't block waiting on
repositoriesService.getRepositoryData and should move to async processing instead.

Repository.getRepositoryData was made async in 0acba44, and the blocking call was
removed from the snapshot status API (TransportSnapshotsStatusAction) in 1cde4a6.

Due to this, pending tasks were stuck for hours on the master node.

7.10

All 128 threads of the generic threadpool were busy waiting:

"elasticsearch[dd229155bf9aa56a13617a858a541448][generic][T#234]" #6225 daemon prio=5 os_prio=0 cpu=308056.43ms elapsed=274126.86s tid=0x00007fd52424b000 nid=0x32ed waiting on condition  [0x00007fd33a257000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
	- parking to wait for  <0x00000004f9b75ec0> (a org.elasticsearch.common.util.concurrent.BaseFuture$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.6/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.6/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.6/AbstractQueuedSynchronizer.java:1039)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.6/AbstractQueuedSynchronizer.java:1345)
	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:259)
	at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:87)
	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:56)
	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:37)
	at org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:33)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:114)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:67)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:100)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:173)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$$Lambda$3646/0x00000008018b2040.accept(Unknown Source)
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:752)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)

   Locked ownable synchronizers:
	- <0x00000004739d8ce0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

7.1

All 128 threads were either waiting for an S3 connection or running in the conscrypt library code:

"elasticsearch[e7e76d2dd2e0ed6731c77440058decee][generic][T#367]" #1145 daemon prio=5 os_prio=0 cpu=703963.76ms elapsed=48932.59s tid=0x00007f89f4294000 nid=0x5297 waiting on condition  [0x00007f894f7cb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.6/Native Method)
	- parking to wait for  <0x00000005ccf7c690> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkUntil(java.base@11.0.6/LockSupport.java:275)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(java.base@11.0.6/AbstractQueuedSynchronizer.java:2166)
	at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:377)
	at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
	at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:245)
	- locked <0x000000058546b608> (a org.apache.http.pool.AbstractConnPool$2)
	at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:304)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:280)
	at jdk.internal.reflect.GeneratedMethodAccessor191.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.6/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@11.0.6/Method.java:566)
	at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
	at com.amazonaws.http.conn.$Proxy112.get(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1323)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5054)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5000)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1486)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1341)
	at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$readBlob$1(S3BlobContainer.java:98)
	at org.elasticsearch.repositories.s3.S3BlobContainer$$Lambda$3012/0x0000000801697c40.run(Unknown Source)
	at java.security.AccessController.doPrivileged(java.base@11.0.6/Native Method)
	at org.elasticsearch.repositories.s3.SocketAccess.doPrivileged(SocketAccess.java:42)
	at org.elasticsearch.repositories.s3.S3BlobContainer.readBlob(S3BlobContainer.java:98)
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101)
	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:93)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:718)
	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:239)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:135)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:54)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:208)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:760)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
"elasticsearch[e7e76d2dd2e0ed6731c77440058decee][generic][T#368]" #1147 daemon prio=5 os_prio=0 cpu=702831.23ms elapsed=48912.59s tid=0x00007f89f4131800 nid=0x52ae runnable  [0x00007f8972ce9000]
   java.lang.Thread.State: RUNNABLE
	at org.conscrypt.NativeCrypto.SSL_read(Native Method)
	at org.conscrypt.NativeSsl.read(NativeSsl.java:409)
	at org.conscrypt.ConscryptFileDescriptorSocket$SSLInputStream.read(ConscryptFileDescriptorSocket.java:548)
	- locked <0x00000005a1a28020> (a java.lang.Object)
	at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
	at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
	at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1323)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1139)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:796)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:764)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:738)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:698)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:680)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:544)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:524)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5054)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5000)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1486)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1341)
	at org.elasticsearch.repositories.s3.S3BlobContainer.lambda$readBlob$1(S3BlobContainer.java:98)
	at org.elasticsearch.repositories.s3.S3BlobContainer$$Lambda$3012/0x0000000801697c40.run(Unknown Source)
	at java.security.AccessController.doPrivileged(java.base@11.0.6/Native Method)
	at org.elasticsearch.repositories.s3.SocketAccess.doPrivileged(SocketAccess.java:42)
	at org.elasticsearch.repositories.s3.S3BlobContainer.readBlob(S3BlobContainer.java:98)
	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101)
	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:93)
	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotInfo(BlobStoreRepository.java:718)
	at org.elasticsearch.snapshots.SnapshotsService.snapshots(SnapshotsService.java:239)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:135)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:54)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:208)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:760)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.6/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.6/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.6/Thread.java:834)
@shwetathareja shwetathareja added bug Something isn't working untriaged labels Dec 22, 2021
@shwetathareja
Member Author

Also, we should support timing out and cancelling get snapshot calls if they take longer than the configured timeout.

@shwetathareja
Member Author

During the issue, GET Snapshot API calls were taking ~1.5 hours

@xuezhou25
Contributor

Is anyone currently working on this issue? If not, I can take a look.

@shwetathareja
Member Author

@xuezhou25: feel free to take a stab at it.

@shwetathareja
Member Author

@xuezhou25

  1. We should look into making the code not block on the response from getRepositoryData here:
    repositoryData = PlainActionFuture.get(fut -> repositoriesService.getRepositoryData(repository, fut));
    and pass a listener to it instead of this fut object (a minimal sketch of this listener-based shape is included after this list).

You can check PR 1cde4a6 for how this call was switched to an async listener in the TransportSnapshotsStatusAction class. Code ref:

repositoriesService.getRepositoryData(repositoryName, repositoryDataListener);

  2. The above change wouldn't address the problem of getSnapshots taking really long; for this we should look into 2 ways:
    a. Timing out the request
    b. Cancelling the long running request to free up the threads.
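
A minimal, self-contained sketch of point 1 (not the actual OpenSearch classes; the ActionListener interface here only mimics the real one, and the names are illustrative): instead of parking the worker in PlainActionFuture.get, the downstream listener is handed to getRepositoryData and the response is built in its callback, so the generic thread is released immediately.

import java.util.function.Consumer;

interface ActionListener<T> {
    void onResponse(T response);
    void onFailure(Exception e);
}

class AsyncGetSnapshotsSketch {
    // Before: repositoryData = PlainActionFuture.get(fut -> getRepositoryData(repository, fut)) blocks the worker.
    // After: pass a listener through and continue in the callback.
    static void masterOperation(Consumer<ActionListener<String>> getRepositoryData,
                                ActionListener<String> listener) {
        getRepositoryData.accept(new ActionListener<String>() {
            @Override
            public void onResponse(String repositoryData) {
                // continue resolving snapshots asynchronously, then answer the caller
                listener.onResponse("snapshots resolved from " + repositoryData);
            }

            @Override
            public void onFailure(Exception e) {
                listener.onFailure(e);
            }
        });
    }
}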

@piyushdaftary
Contributor

Using the get snapshots API can be expensive, as the API requires a read from the repository for each index/shard in each snapshot.

The main problem I see with a long running get snapshots call is that it blocks threads of the GENERIC threadpool, which in turn impacts other functionality of the OpenSearch cluster.

In my opinion, to address the issue we should explore the following 3 options:

  1. Limit the number of threads the get snapshots API can use, so that it doesn't block other modules of OpenSearch using the GENERIC threadpool
  2. Make the get snapshots API use the SNAPSHOT threadpool instead of GENERIC
  3. Maintain a metadata file at the repository level with all snapshots' meta information, so that the get snapshots operation is just a single file read

Timeout/cancellation can be a good option in some cases, but it won't serve those who want to get snapshot info from a really large repository without impacting other functionality.

@shwetathareja @xuezhou25 : WDYT ?

@shwetathareja
Member Author

@piyushdaftary: Right now there is a deadlock in the code due to the blocking call. Even if you move to a different "snapshot" threadpool, the blocking wait would still be there, and if the snapshot threadpool is exhausted those threads would remain waiting. This bug has to be fixed either way.

Regarding having a different threadpool for get snapshot: it is not a critical operation, hence I didn't suggest a separate threadpool for it. The current "snapshot" threadpool is used by RepositoryService, and I don't think it is a good idea to club the get snapshot API with that, as it can affect actual snapshots being taken.

Also, the "generic" threadpool caters to most of the APIs, and creating a threadpool per API would not be a good decision.

@anasalkouz
Member

@xuezhou25 are you still actively working on this issue?

@Bukhtawar
Collaborator

Bukhtawar commented Apr 30, 2023

All cluster PUT settings calls are getting stuck due to this. The publication of the cluster state relies on this thread pool, so it's critical we address this sooner.

Source  |  Priority  |  TimeInQueue 
put-mapping [team-owl-2023.04.25/ZJVmPnKfRq2xJG0NaTdYbA] | HIGH | 1.5d 
cluster_update_settings | IMMEDIATE | 16.9h 
cluster_update_settings | IMMEDIATE | 16.9h 
cluster_update_settings | IMMEDIATE | 16.9h 
cluster_update_settings | IMMEDIATE | 16.9h 
cluster_update_settings | IMMEDIATE | 16.8h 
cluster_update_settings | IMMEDIATE | 16.8h 
cluster_update_settings | IMMEDIATE | 16.7h 
cluster_update_settings | IMMEDIATE | 16.7h 
cluster_update_settings | IMMEDIATE | 16.7h 
cluster_update_settings | IMMEDIATE | 16.7h 
cluster_update_settings | IMMEDIATE | 16.6h 
cluster_update_settings | IMMEDIATE | 16.6h 
cluster_update_settings | IMMEDIATE | 16.6h 
cluster_update_settings | IMMEDIATE | 16.5h 
cluster_update_settings | IMMEDIATE | 16.5h 
cluster_update_settings | IMMEDIATE | 16.5h 
cluster_update_settings | IMMEDIATE | 16.4h 
cluster_update_settings | IMMEDIATE | 16.4h 
cluster_update_settings | IMMEDIATE | 16.4h 
cluster_update_settings | IMMEDIATE | 16.3h 
cluster_update_settings | IMMEDIATE | 16.3h 
cluster_update_settings | IMMEDIATE | 16.3h 
cluster_update_settings | IMMEDIATE | 16.2h 
cluster_update_settings | IMMEDIATE | 16.2h 
cluster_update_settings | IMMEDIATE | 16.2h 
cluster_update_settings | IMMEDIATE | 16.2h 
cluster_update_settings | IMMEDIATE | 16.1h 
cluster_update_settings | IMMEDIATE | 16.1h

The problem with this state is that it can stall all write traffic if there is a dynamic mapping update.

"elasticsearch[d2462efc415af6277ab11128322f721d][generic][T#764]" #154517 daemon prio=5 os_prio=0 cpu=229285.84ms elapsed=244877.47s tid=0x00007f1424727340 nid=0x28e waiting on condition  [0x00007f12f4996000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.17/Native Method)
	- parking to wait for  <0x0000000540505870> (a org.elasticsearch.common.util.concurrent.BaseFuture$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.17/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.17/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.17/AbstractQueuedSynchronizer.java:1039)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.17/AbstractQueuedSynchronizer.java:1345)
	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:259)
	at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:87)
	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:56)
	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:37)
	at org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:33)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:114)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:67)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:100)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:173)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$$Lambda$3847/0x0000000801915040.accept(Unknown Source)
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:752)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.17/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.17/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.17/Thread.java:829)

   Locked ownable synchronizers:
	- <0x0000000404a1e888> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"elasticsearch[d2462efc415af6277ab11128322f721d][generic][T#769]" #154522 daemon prio=5 os_prio=0 cpu=226656.17ms elapsed=244877.47s tid=0x00007f13dc4c8290 nid=0x291 waiting on condition  [0x00007f12afbfa000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.17/Native Method)
	- parking to wait for  <0x000000054050ff18> (a org.elasticsearch.common.util.concurrent.BaseFuture$Sync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11.0.17/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11.0.17/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(java.base@11.0.17/AbstractQueuedSynchronizer.java:1039)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(java.base@11.0.17/AbstractQueuedSynchronizer.java:1345)
	at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:259)
	at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:87)
	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:56)
	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:37)
	at org.elasticsearch.action.support.PlainActionFuture.get(PlainActionFuture.java:33)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:114)
	at org.elasticsearch.action.admin.cluster.snapshots.get.TransportGetSnapshotsAction.masterOperation(TransportGetSnapshotsAction.java:67)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:100)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.lambda$doStart$3(TransportMasterNodeAction.java:173)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$$Lambda$3847/0x0000000801915040.accept(Unknown Source)
	at org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:752)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.17/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.17/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.17/Thread.java:829)

   Locked ownable synchronizers:
	- <0x0000000404a1e8e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

@amkhar
Contributor

amkhar commented Jun 1, 2023

Starting points to explore:
Code paths taking too much CPU/memory.

getRepositoryData, and inside it RepositoryData.snapshotsFromXContent, plus the cacheRepositoryData call:

cacheRepositoryData(
    BytesReference.bytes(loaded.snapshotsToXContent(XContentFactory.jsonBuilder(), Version.CURRENT)),
    genToLoad
);

Note: this analysis is based on a setup with a 100 node cluster, 4K indices, and 400 snapshots.

Attaching flame graph for the same.

cached-code-steady-state-alloc-26-c1.txt
cached-code-steady-state-cpu-26-c1.txt

@indrajohn7
Contributor

Had an initial deep dive on this issue; below are a few findings:

  1. This code path: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java#L1610 iterates over the index-N snapshot objects in the blob repo to retrieve the latest generation ID of the snapshot. This keeps happening when reading the latest generation throws an exception during I/O and the loop continues. If there are incremental updates to the latest snapshot repo generation, and the repo is deleted or corrupted in an incremental fashion, the while loop repeats indefinitely and the GENERIC threadpool stays stuck (see the illustrative sketch after this list).

  2. Moving this to a listener based async call: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java#L143.
    • Currently the above code path is blocking the GENERIC threadpool.

  3. Load the repository data: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java#L1638
    • The snapshot XContent parsing takes most of the CPU samples.

  4. Cache the repository data: https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java#L1642

  5. Can the doGetRepositoryData call move to the SNAPSHOT threadpool instead of the GENERIC one?
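
An illustrative, self-contained sketch of the unbounded retry described in point 1 (this is not the actual BlobStoreRepository code; the helper names are made up): with no retry limit, a repository whose latest index-N generation keeps failing to read pins the worker thread indefinitely.

import java.io.IOException;

class IndexNRetrySketch {
    // Hypothetical stand-in for reading the index-N blob at a given generation; always fails here.
    static String readIndexN(long generation) throws IOException {
        throw new IOException("blob missing or corrupted for generation " + generation);
    }

    // Hypothetical stand-in for re-resolving the latest generation; an incrementally
    // updated or corrupted repo keeps producing a new generation to try.
    static long resolveLatestGeneration(long previous) {
        return previous + 1;
    }

    static String getRepositoryData(long latestKnownGeneration) {
        long genToLoad = latestKnownGeneration;
        while (true) {                              // no upper bound on retries
            try {
                return readIndexN(genToLoad);
            } catch (IOException e) {
                genToLoad = resolveLatestGeneration(genToLoad);
                // loop continues; the GENERIC worker running this is never released
            }
        }
    }
}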

@indrajohn7
Contributor

indrajohn7 commented Jun 9, 2023

Reproduction steps:

  1. A cluster with 3 dedicated master nodes and 2 data nodes.
  2. 500K shards across 1000 indices.
  3. Too many concurrent get snapshot calls.
  4. The flamegraph suggests that 27% of CPU samples are in the get_snapshot code path.
  5. 16% of samples are from BlobStoreRepository.doGetRepositoryData(): https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/repositories/blobstore/BlobStoreRepository.java#L1598
  6. 11% of samples are from TransportGetSnapshotsAction.snapshots(): https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java#L180
  • There are ~5-10k pending tasks in the queue at the same time, while snapshot and index creation are processing in parallel.
  • Sample pending tasks:
    11709069 7.1m NORMAL update snapshot state
    11709070 7.1m NORMAL update snapshot state
    11709071 7.1m NORMAL update snapshot state
    11709072 7.1m NORMAL update snapshot state
    11709073 7.1m NORMAL update snapshot state
    11706573 7.1m NORMAL update snapshot state
    11705322 7.1m NORMAL update snapshot state
    11706574 7.1m NORMAL update snapshot state
