-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: simulated time block on sleep #10551
test: simulated time block on sleep #10551
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…s in FakeUpstream Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…which seems to work for the cache test. Signed-off-by: Joshua Marantz <jmarantz@google.com>
failure in asan on IpVersions/ProxyFilterIntegrationTest.UpstreamTls/IPv4 in //test/extensions/filters/http/dynamic_forward_proxy:proxy_filter_integration_test
|
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…p on 0ms real time for now Signed-off-by: Joshua Marantz <jmarantz@google.com>
/azp run |
Azure Pipelines successfully started running 1 pipeline(s), but failed to run 2 pipeline(s). |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is awesome. A few small comments and I will definitely like @alyssawilk to do a review also if possible. Also, can you try to do a big --runs_per_test
run over all the tests that use simulated time to make sure we don't have any flakes? It should be all the tests you modified I guess.
/wait
* Advances time forward by the specified duration. Timers may be triggered on | ||
* their threads, but unlike advanceTimeWait(), this method does not block | ||
* waiting for them to complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some text to each function as to why a test writer would want to use each one? It may not be immediately clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done: added:
* This function should be used in multi-threaded tests, where other
* threads are running dispatcher loops. Integration tests should usually
* use this variant.
and
* This function should be used in single-threaded tests, in scenarios where
* after time is advanced, the main test thread will run a dispatcher
* loop. Unit tests will often use this variant.
// This real-time polling delay should not be necessary. But without it, | ||
// /test/extensions/filters/http/cache:cache_filter_integration_test fails | ||
// about 40% of the time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- referencing #10568 . Changed.
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Running all tests that referenced SimulatedTime with tsan and runs_per_test=100. Will check back tomorrow. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
All the tests that mention SimulatedTime (which covers more than what was modified in this PR) passed with tsan and --runs_per_test=100. There are still races (where there is still real-time there are races) I believe but they didn't come out in that run. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@alyssawilk can you take another look? thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly trusting Matt on what's called where, so only two remaining nits. The comments are a huge help, thanks.
As said on slack I'd love to see more than 100 runs since flakes generally show up 1:1000 or less.
As a random aside, the only_one_thread naming makes me sad because the function which is supposed to be called by multi-threaded tests checks "only one thread" and I had to spelunk in order to figure out why that was Ok.
const Duration real_time_poll_delay( | ||
std::min(std::chrono::duration_cast<Duration>(std::chrono::milliseconds(50)), duration)); | ||
const MonotonicTime end_time = monotonicTime() + duration; | ||
|
||
while (true) { | ||
bool cont = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better naming please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to timeout_not_reached
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you forget to upload? I'm not seeing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope; I thought I'd try intentionally not uploading until you were fine with my proposed changes, to avoid wasted CI churn :)
if (alarms_.empty()) { | ||
// If no alarms are pending, sleep till the end time. | ||
setMonotonicTimeAndUnlock(end_time); | ||
mutex.unlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you comment on the unusual lock pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment:
// This function runs with the caller-provided mutex held. We need to
// hold this->mutex_ while accessing the timer-queue and blocking on
// callbacks completing. To avoid potential deadlock we must drop
// the caller's mutex before taking ours. We also must care to avoid
// break/continue/return/throw during this non-RAII lock operation.
+1 to change this and make it more clear. I had to look it up also. |
I made some renaming comment changes per suggestion. PTAL just at the comments; won't push till we finish iterating on the wording. I also successfully ran 1000 iters of tsan on 67 tests that mentioned SimulatedTime. That took 19 hours. I have not done anything about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay on tests, thanks for the overnight run!
Also thanks for picking up the rename - I'd totally prefer a follow-up so you're good there.
const Duration real_time_poll_delay( | ||
std::min(std::chrono::duration_cast<Duration>(std::chrono::milliseconds(50)), duration)); | ||
const MonotonicTime end_time = monotonicTime() + duration; | ||
|
||
while (true) { | ||
bool cont = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you forget to upload? I'm not seeing it?
oh bleh, you even said, but I only saw the last email. yeah SGTM. |
Also fwiw this is all I had to say - headed afk to see if I can rest my headache away but I'm fine with Matt merging if he signs off else I'll TAL when I'm feeling better |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
@mattklein123 you want to take a final look (maybe just at the local-var name-change and comment?) |
Make unit tests to use advanceTimeWait in favor of sleep (see envoyproxy/envoy#10551) Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Make unit tests to use advanceTimeWait in favor of sleep (see envoyproxy/envoy#10551) Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Description: SimulatedTimeSystem::sleep() would advance time to trigger firing of timers, but does block on the threads executing those timers completing. That is the behavior does not match real-time sleep(), making it hard to retarget existing integration tests from real to simulated time.
However, there were some unit tests that are not multi-threaded, and used simulated time sleep in fashion where the timer callbacks would run on the same thread, and these need to retain the existing behavior. For these, a new API advanceTime() is now added, which enables callbacks but does not block waiting for them to complete.
Note that this PR does not yet solve all issues with sim-time in integration tests (#10568), as the infrastructure still relies on real-time sleeps in ways that are not fully understood. But it is a step in that direction.
Risk Level: low -- this is is a test/...-only PR.
Testing: /test/... in normal mode, and with tsan.
Docs Changes: n/a
Release Notes: n/a
Fixes: #10567