Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky test: ProcessEventInspectionsTest.testFindLastProcessInstance #202

Closed
Tracked by #189
remcowesterhoud opened this issue Feb 22, 2022 · 4 comments · Fixed by #218 or #234
Closed
Tracked by #189

Flaky test: ProcessEventInspectionsTest.testFindLastProcessInstance #202

remcowesterhoud opened this issue Feb 22, 2022 · 4 comments · Fixed by #218 or #234
Assignees

Comments

@remcowesterhoud
Copy link
Contributor

Description

It seems to be a timing issue. This process contains a timer and the test increases the time of the engine: Utilities.increaseTime(engine, Duration.ofDays(1));. Possibly the engine has not yet triggered the timer and started processing before we checked if it's idle. This means we consider the engine to be idle, even though it still has stuff to do (triggering the timer).

java.lang.AssertionError: 

Expecting Optional to contain a value but it was empty.
	at io.camunda.zeebe.process.test.qa.inspections.ProcessEventInspectionsTest.testFindLastProcessInstance(ProcessEventInspectionsTest.java:61)

Occurs with both extensions.

@remcowesterhoud
Copy link
Contributor Author

remcowesterhoud commented Mar 2, 2022

It seems the issue is not fully fixed by #218.

A second theory on why this could go wrong:

  1. We deploy the process
  2. A process gets created
  3. A timer gets created
  4. We increase the engine time

We don't wait for the engine to have reached an idle state between step 3 and 4. Could it be that the engine time is increased before the timer is created, giving the timer a different due date than expected?

The logs should show this. The records show:

[main] ERROR io.camunda.zeebe.process.test.extension.ZeebeProcessTestExtension - ===== Test failed!
[main] INFO io.camunda.zeebe.process.test.filters.logger.IncidentLogger - 
[main] INFO io.camunda.zeebe.process.test.filters.logger.RecordStreamLogger - 
The following records have been recorded during this test:
| COMMAND             DEPLOYMENT                         CREATE                        | (Processes: [timer-start-event-daily.bpmn])
| EVENT               PROCESS                            CREATED                       | (Process: timer-start-event-daily.bpmn)
| EVENT               TIMER                              CREATED                       | (Element id: timer), (Due date: Thu Mar 03 16:41:11 UTC 2022)
| EVENT               DEPLOYMENT                         CREATED                       | (Processes: [timer-start-event-daily.bpmn])
| EVENT               DEPLOYMENT                         FULLY_DISTRIBUTED             |

The workflow ended at: [INFO] Finished at: 2022-03-02T16:41:14Z

So this theory does not seem likely. The day difference makes sense, since we tell the engine to increase the time by a day.

@remcowesterhoud
Copy link
Contributor Author

@pihme I know you love flaky tests 🙃 Do you have any other idea on why this could go wrong?

@pihme
Copy link
Contributor

pihme commented Mar 3, 2022

Not really, no.

I am wondering whether our approach is flawed.

It seems our approach rests on records, but advancing the time produces no records by itself and the timing of when time increase has an effect is not deterministic.

One other approach could be:

engineCtrl.increatseTIme(ofDays(1));
Awaitility.await().until(assertThat(timer).hasBeenTriggered());

If this works better, we could also think about offering an interface like:

engineCtrl.increaseTimeToTriggerTimer(timerId);

Then in that method, we could check the current time, the timeout of the timer, increase the time by the delta, wait for that timer to get triggered, then return. This would also be a more convenient interface for users, because then they don't need to calculate the amount by which the time should be increased.

@remcowesterhoud remcowesterhoud self-assigned this Mar 3, 2022
@remcowesterhoud
Copy link
Contributor Author

I tried your proposed solution a few times using Awaitility, it doesn't seem to resolve the issue either. The test still failed 2/5 times 😞
https://github.com/camunda-cloud/zeebe-process-test/actions/runs/1927898251

This kinda worried me, because that would mean the timer would not have been triggered within 10 seconds. I'm starting to wonder if this could be a Zeebe thing, where it just takes a while before timers are triggered after the actor scheduler time has been increased.

Another thing that bothers me is that it's always this test, even though we have plenty of tests where we increase the time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants