Client retries failed jobs after logging out/in without restart #1420
Comments
(I've not tagged this QA because it does not appear to be a recent regression.) |
I should note that I've observed at least two other outcomes when testing this repeatedly:
Which is to say that you may have to follow the STR repeatedly to see this exact result, and there may be multiple possible race condition outcomes resulting from an unclean state at the time of (re-)login. This may also help explain the inconsistent results reported by Ro on #1415 (comment) |
Environment: servers @ 2.1.0
I can confirm that the STR result in replies being sent upon reconnection; here are more details.
Update: I just tested this by sending replies every 5 seconds for 30-35 seconds while disconnected from the network, then logging out of the client. When reconnecting the network, I was not yet signed in, but I watched all the failed replies get sent.
I did not observe the mixed UI state (failed indicator + signed-in user colour), or the 'session expired' error, but I also think this whole area needs more testing, so I'm going to pause on trying to repro for now and we can look at all the edge cases in more detail and more methodically. |
#1429 is also high priority, and both will need to be added in the next release. |
A couple of observations/notes that may be of help:
|
@rocodes, thanks for this simpler reproduction, which checks out for me too. I'll dig into the job-queue logic tomorrow. |
Based on discussion last week with @creviera and @rocodes, and the above steps to reproduce, I can confirm that this behavior is a side effect, or rather a non-effect, of the following: securedrop-client/securedrop_client/logic.py, lines 776 to 777 in 503e5d1
We'd asked if it's appropriate for logout to be handled synchronously, outside of the
--- a/tests/test_queue.py
+++ b/tests/test_queue.py
@@ -8,6 +8,7 @@ from sdclientapi import RequestTimeoutError, ServerConnectionError
 from securedrop_client.api_jobs.base import ApiInaccessibleError, PauseQueueJob
 from securedrop_client.api_jobs.downloads import FileDownloadJob, MessageDownloadJob
+from securedrop_client.api_jobs.uploads import SendReplyJob
 from securedrop_client.queue import ApiJobQueue, RunnableQueue
 from tests import factory
@@ -442,6 +443,20 @@ def test_ApiJobQueue_stop_stops_queue_threads(mocker):
     assert not job_queue.main_thread.isRunning()
     assert not job_queue.download_file_thread.isRunning()
+def test_ApiJobQueue_stop_clears_jobs(mocker):
+    mock_api = mocker.MagicMock()
+    mock_client = mocker.MagicMock()
+    mock_session_maker = mocker.MagicMock()
+
+    job_queue = ApiJobQueue(mock_client, mock_session_maker)
+    job_queue.start(mock_api)
+
+    job = SendReplyJob("mock", "mock", "mock", "mock")
+    job_queue.enqueue(job)
+    assert job_queue.main_queue.queue.qsize() == 1
+
+    job_queue.stop()
+    assert job_queue.main_queue.queue.empty()
Next I'll investigate whether we ever expect |
This test case currently fails by design, as a reproduction of the low-level behavior responsible for bug #1420.
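For readers following along, the behavior that the failing test asks for can be sketched in isolation. This is only an illustration under assumptions: `SketchJobQueue` and `leftover_after_stop` are hypothetical names built on the standard library's `queue.Queue`, not the Client's `ApiJobQueue`, and the `clear` flag exists purely to contrast the two outcomes.

```python
import queue


class SketchJobQueue:
    """Hypothetical stand-in for a job queue; NOT the Client's ApiJobQueue."""

    def __init__(self):
        self.pending = queue.Queue()
        self.running = False

    def start(self):
        self.running = True

    def enqueue(self, job):
        self.pending.put(job)

    def stop(self, clear=True):
        # Stopping alone leaves jobs in place; clearing is a separate step.
        self.running = False
        if clear:
            self._drain()

    def _drain(self):
        # get_nowait() raises queue.Empty once nothing is left to discard.
        while True:
            try:
                self.pending.get_nowait()
            except queue.Empty:
                break


def leftover_after_stop(clear):
    """Return how many jobs survive a stop(), with or without clearing."""
    job_queue = SketchJobQueue()
    job_queue.start()
    job_queue.enqueue("reply-1")
    job_queue.enqueue("reply-2")
    job_queue.stop(clear=clear)
    return job_queue.pending.qsize()
```

With `clear=False` the two queued jobs are still present after `stop()`, which is the precondition for them being sent on a later login; with `clear=True` the queue is empty, which is what the test above asserts.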
Correction: Currently logout is handled asynchronously outside of We might want to consider using a separate
In the future, we might consider adding a feature that asks the users if they want to finish processing jobs before logout, but for now, assume we will not want to finish jobs in the queue or resume them later. |
I sent ^ before a meeting, so now I'll complete my thought here. I agree with your intuition about making the logout operation a synchronous action. The only reason I can see that you might want to add another |
Thanks, @creviera! Out of band yesterday we discussed a three-pronged approach to this bug:
(1) is drafted in |
Per discussion with @creviera, next steps here after #1434:
|
Another thing to consider is firing off a logout |
New findings on the higher-level approach:
However:
I think (2) points to a self-sufficient fix for the original bug here, if we can clear each |
Incremental progress today:
Clarification: Example: After the
|
Nice investigation work! This checks out since securedrop-client/securedrop_client/queue.py Lines 37 to 40 in 503e5d1
I do like your approach with starting out at a higher level, and just clearing the queue on logout. This would get rid of an
👀 |
To summarize recent work and discussions: #1434 is ready for review to fix this behavior specifically. We're separately reevaluating the SecureDrop API's authentication flow in light of the "failed logout" symptom, but we won't address that here. |
In #1434 (comment), @creviera and I have concluded that we're so far unable to fix this bug directly in
That we can't come up with a self-contained patch for this behavior, that works consistently across network-level failure-modes, suggests to us that we've encountered a corner case along the edges of the Client's current handling of network, session, sync, and queue states. We're therefore going to take this bug as our cue (queue ;-) to zoom out, define carefully how we expect the application to behave across these state machines, and then return to code-level work from that understanding. |
@cfm this doesn't fix the underlying issue, but there is a bug where the following should be called outside of the conditional in order to ensure that we update the securedrop-client/securedrop_client/logic.py Lines 773 to 775 in f903cc3
should be:
if self.api is not None:
    self.call_api(self.api.logout, self.on_logout_success, self.on_logout_failure)
self.invalidate_token()
|
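The difference this makes can be shown with a small stand-alone sketch. Everything here is an assumption for illustration: `SketchController` is a hypothetical class that only mirrors the control flow quoted above, `token_valid` is an invented flag standing in for whatever state `invalidate_token` updates, and `call_api` merely records the call instead of running it asynchronously.

```python
from types import SimpleNamespace


class SketchController:
    """Hypothetical controller; only mirrors the control flow quoted above."""

    def __init__(self, api=None):
        self.api = api
        self.token_valid = True  # assumed stand-in for the real token state
        self.calls = []

    def call_api(self, func, on_success, on_failure):
        # Assumption: the real method schedules the call; here we just record it.
        self.calls.append(func)

    def on_logout_success(self, result=None):
        pass

    def on_logout_failure(self, result=None):
        pass

    def invalidate_token(self):
        self.token_valid = False

    def logout(self):
        if self.api is not None:
            self.call_api(self.api.logout, self.on_logout_success, self.on_logout_failure)
        # Outside the conditional: runs whether or not an API object exists.
        self.invalidate_token()


offline = SketchController(api=None)
offline.logout()

online = SketchController(api=SimpleNamespace(logout=lambda: None))
online.logout()
```

In this sketch the token is invalidated in both cases, and the logout call is only attempted when an API object is present.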
FWIW, a simpler version of this STR:
Quick STR
Expected
For the draft reply to remain in a failed status
Actual
The failed reply is resent
Notes
|
I can't seem to find related Figma prototypes that show the retry (not refresh! whoops) button, but I recall Nina doing some work around this. I'll continue the prototype hunt tomorrow. |
@creviera hypothesized today that the |
See #1486 (comment) and its test plan for clarification versus #1457. |
Description
Tested against 2.1.0 server with:
Steps to Reproduce
Expected Behavior
Actual Behavior
Logs
https://gist.github.com/eloquence/b502702021bf1bfd10099fb778ba3859 (0.5.1)