Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent ReplayMerges fail #1340

Closed
jrsala-auguration opened this issue Jul 14, 2022 · 3 comments
Closed

Concurrent ReplayMerges fail #1340

jrsala-auguration opened this issue Jul 14, 2022 · 3 comments

Comments

@jrsala-auguration
Copy link
Contributor

jrsala-auguration commented Jul 14, 2022

Function ReplayMerge#pollForResponse is called whenever the ReplayMerge awaits a control response from the archive, namely when it tries to start the replay or when it requests the current recording position. It returns true if a response to the currently active request has been received, false otherwise.

To do so, it polls the archive.controlResponsePoller() and, once that has received a full control response, checks the correlationId of that response. If the correlationId is not the expected activeCorrelationId, it simply moves on and returns false.

This prevents using several ReplayMerges in an interleaved fashion in a single duty cycle, because control responses intended for one ReplayMerge might be discarded by another when it calls pollForResponse. As a consequence, only one of the ReplayMerges will ever finish the ReplayMerges often will never finish. Please do tell me if I've misunderstood anything so far.

It seems to me that this very issue is why the more "naïve" parts of the AeronArchive's interface use a global lock, making the operations slower because synchronous but correct. For a ReplayMerge which could be a long operation however that is not possible.

I have circumvented this issue of stolen control responses by modifying ReplayMerge to instead use a ControlResponseDispatcher that I wrote that stores Runnable callbacks associated to correlation IDs, wraps the original ControlResponsePoller and calls the appropriate callback (if any) when a control response is received. However I think this mechanism should be natively integrated to the ControlResponsePoller. Eager to read your thoughts on this.

mjpt777 added a commit that referenced this issue Jul 15, 2022
…at it should have its own Archive client. Issue #1340.
@mjpt777
Copy link
Contributor

mjpt777 commented Jul 15, 2022

ReplayMerge is not meant for concurrent use plus it is often in the hot path when being used. If you need multiple replay merge operations then provide separate AeronArchive sessions. I've updated the Javadoc to call this out.

@mjpt777 mjpt777 closed this as completed Jul 15, 2022
@jrsala-auguration
Copy link
Contributor Author

You added a comment to the javadoc saying "ReplayMerge is not threadsafe and should not be used with a shared AeronArchive client" but I did not intend to refer to thread safety or multi-threaded use. As written in the issue, this is about

using several ReplayMerges in an interleaved fashion in a single duty cycle

My mention of the locking in AeronArchive was confusing though. What I really meant is that the operations in the simplistic AeronArchive interface are fully synchronous (even on a single thread) at the cost of lost opportunity for concurrency, and that's OK because you can always use the lower-level ArchiveProxy. On the other hand, the interface of ReplayMerge looks like it will support concurrent (single-threaded) use by polling several ReplayMerge objects in turn, but it really doesn't because it misuses the ControlResponsePoller.

Having to use several AeronArchive sessions when one would suffice seems unnecessary and counter-intuitive. Speaking of the hot path, an extra AeronArchive might even hurt cache locality due to the presence of an extra ArchiveProxy with all the buffers that come with it.

The ArchiveConductor uses a Long2ObjectHashMap<ControlSession> in the ControlSessionDemuxer. I'm pretty sure an analogue could be used on the client side of the interaction.

@mjpt777
Copy link
Contributor

mjpt777 commented Jul 18, 2022

If you think you can achieve this in a consistent way with the API preserved then feel free to submit a PR. However be aware that few would think they are on the hot path when catching up with a replay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants