Add summary diagnostics to spec test gen #2926
Conversation
lgtm 👍
could you share the diagnostics.json result file of a full testgen?
@@ -216,6 +227,28 @@ def output_part(out_kind: str, name: str, fn: Callable[[Path, ], None]):
    if span > TIME_THRESHOLD_TO_PRINT:
        summary_message += f" in {span} seconds"
    print(summary_message)
    diagnostics = {
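The hunk is truncated at the start of the diagnostics dict. A hedged sketch of what the rest of this block plausibly does, using the field names from the diagnostics.json shared further down; the local variable names and the output path are assumptions, not the actual patch:

```python
# Sketch only: keys mirror the diagnostics.json sample below; the variables
# (collected_test_count, test_identifiers, durations, output_dir) and the file
# location are assumptions about the surrounding generator code.
import json

diagnostics = {
    "collected_test_count": collected_test_count,
    "generated_test_count": generated_test_count,
    "skipped_test_count": skipped_test_count,
    "test_identifiers": test_identifiers,
    "durations": [f"{round(d, 2)} seconds" for d in durations],
}
diagnostics_path = Path(output_dir) / "diagnostics.json"
with open(diagnostics_path, "w") as f:
    json.dump(diagnostics, f)
```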
should we have an assertion to verify if generated_test_count + test_identifiers == collected_test_count?
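A minimal sketch of what such a consistency check could look like; the counter names follow the diagnostics fields and are not taken from the actual patch:

```python
# Hypothetical check; counter names mirror the diagnostics.json fields.
# Per the reply below, collected == generated + skipped did NOT hold on a full run,
# so this would need investigation before becoming a hard assertion.
assert collected_test_count == generated_test_count + skipped_test_count, (
    f"count mismatch: collected={collected_test_count}, "
    f"generated={generated_test_count}, skipped={skipped_test_count}"
)
```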
Something like this doesn't hurt, but I'm also not convinced these assertions hold -- in particular, I was considering this for the skipped test count, and essentially collected == generated + skipped did NOT hold... I haven't looked into it beyond that.
{
"collected_test_count": 17909,
"generated_test_count": 17646,
"skipped_test_count": 255,
"durations": [
"0.8 seconds",
"1.39 seconds",
"36.66 seconds",
"43.24 seconds",
"56.74 seconds",
"76.23 seconds",
"83.08 seconds",
"45.07 seconds",
"97.03 seconds",
"113.93 seconds",
"60.32 seconds",
"138.4 seconds",
"167.47 seconds",
"65.53 seconds",
"0.01 seconds",
"0.01 seconds",
"0.01 seconds",
"166.4 seconds",
"128.07 seconds",
"136.8 seconds",
"124.1 seconds",
"123.87 seconds",
"138.34 seconds",
"229.12 seconds",
"431.12 seconds",
"504.37 seconds",
"517.68 seconds",
"411.34 seconds",
"262.53 seconds",
"217.94 seconds",
"343.05 seconds",
"389.64 seconds",
"412.88 seconds",
"492.91 seconds",
"673.31 seconds",
"612.24 seconds",
"901.78 seconds",
"521.62 seconds",
"595.08 seconds",
"956.12 seconds",
"650.26 seconds",
"41.64 seconds",
"601.12 seconds",
"1145.61 seconds",
"81.86 seconds",
"1160.32 seconds",
"90.79 seconds",
"762.62 seconds",
"899.42 seconds",
"898.18 seconds",
"1224.85 seconds",
"1712.58 seconds",
"1199.03 seconds",
"917.17 seconds",
"3987.39 seconds",
"4047.06 seconds",
"1065.26 seconds",
"2003.51 seconds",
"5823.07 seconds",
"1738.97 seconds",
"3333.55 seconds",
"87.37 seconds",
"42.55 seconds",
"62.76 seconds",
"55.19 seconds",
"2.55 seconds",
"1.82 seconds",
"1329.05 seconds",
"1248.06 seconds",
"0.0 seconds",
"15.32 seconds"
]
}
I have omitted the full list of test identifiers; here is a sample of the output:
"mainnet::bellatrix::transition::core::pyspec_tests::transition_with_leaking_pre_fork",
"minimal::altair::transition::core::pyspec_tests::transition_with_one_fourth_slashed_active_validators_pre_fork",
"minimal::bellatrix::transition::core::pyspec_tests::transition_with_one_fourth_slashed_active_validators_pre_fork",
"minimal::altair::transition::core::pyspec_tests::transition_with_attester_slashing_right_after_fork",
run_generator is a bit long of a function 😅 -- we could probably break it up into a number of better-named internal functions to define the steps more clearly, but we don't need to do so now.
lgtm
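For reference, a rough sketch of how run_generator might be decomposed into smaller, better-named steps; every helper name and signature below is hypothetical, not code from this PR:

```python
# Illustrative decomposition only; none of these helpers exist in the repo.

def _collect_test_cases(test_providers):
    """Gather all test cases from the registered providers."""
    return [case for provider in test_providers for case in provider()]

def _generate_test_vectors(test_cases, output_dir):
    """Run each test case, write its outputs, and return per-case outcomes."""
    ...

def _write_diagnostics(output_dir, outcomes):
    """Summarize counts and durations and dump them to diagnostics.json."""
    ...

def run_generator(generator_name, test_providers, output_dir):
    test_cases = _collect_test_cases(test_providers)
    outcomes = _generate_test_vectors(test_cases, output_dir)
    _write_diagnostics(output_dir, outcomes)
```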
Fixes #2919.
Due to the current architecture of the spec test gen, it was most straightforward to simply write the stats for each generator to a local file.
Remaining todo: