Test suite compilation spends a large amount of time in evaluate_obligation
#87012
Comments
How many tests does the testsuite contain? Is it doctests? tests in the Can you isolate the behaviour to a lone test that if you remove it, the time spent in |
Without looking into this for more than 30 seconds, I'd expect this to be the issue.
For context, |
I went ahead and commented each test out in chunks until they were all commented out and saw no meaningful change, which means the slowdown is somewhere in the test harness code rather than the tests themselves. Digging further, I managed to find that it's likely due to a large number of For reference, the trait looks like this:
#[async_trait]
pub trait TestOperation: Debug {
    async fn execute_on_collection(
        &self,
        collection: &Collection<Document>,
        session: Option<&mut ClientSession>,
    ) -> Result<Option<Bson>>;
    async fn execute_on_database(
        &self,
        database: &Database,
        session: Option<&mut ClientSession>,
    ) -> Result<Option<Bson>>;
    async fn execute_on_client(&self, client: &TestClient) -> Result<Option<Bson>>;
    async fn execute_on_session(&self, session: &mut ClientSession) -> Result<Option<Bson>>;
}
And we have maybe 60 implementations of that trait.
self-profile output after commenting out:
So does this seem like a bug in the compiler then? in |
I'm happy to report that converting the edit: per @dtolnay, this is still likely a rustc issue, not an |
So we have this issue over at https://github.com/influxdata/influxdb_iox as well. Here is a (non-minimal) reproducible example:
$ git clone https://github.com/influxdata/influxdb_iox.git
$ cd influxdb_iox
$ git checkout 61ccdbba93b3908ca80ef1ecce8a0e96d89ef1f0
$ RUSTFLAGS="-Zself-profile -Zself-profile-events=default,args" cargo +nightly build
$ cd ..
$ summarize summarize query_tests-<PID>.mm_profdata
I've also followed https://blog.rust-lang.org/inside-rust/2020/02/25/intro-rustc-self-profile.html and created a chrome profile: https://we.tl/t-dqgMlqYWHI (can re-upload somewhere else on request). There you can see the detailed calls to
I'm not really sure how to read this, but could it be that the compiler is quite busy figuring out if a future is |
It would be interesting to get a self-contained example that only uses the async-trait crate, no other dependencies. Is that possible?
I've tried, but was so far not able to reproduce it on a smaller scale. I THINK it's not just about the number of async-trait |
@crepererum yeah, I thought about something like a crate that autogenerates a lot of async-trait impls.
I think I have a somewhat usable example now: https://github.com/crepererum/rust_issue_87012/tree/ee51fa3d07438c0b065df39a961698118a75aad2 It's not perfect since the effect could be larger but the effect is clearly visible, esp. considering that the time |
Hi, I'm investigating this issue further and I'd like to work on fixing it, but I'm not sure where to start on the fix. I've taken @crepererum's example and tried various reductions on it, with the following results:
At main commit https://github.com/crepererum/rust_issue_87012/tree/ee51fa3d07438c0b065df39a961698118a75aad2, running
In the first commit on my
In integer32llc/rust_issue_87012@bf24fd9 I removed the clippy
In integer32llc/rust_issue_87012@cbd1831, I changed the body of all the trait implementation functions to call
I reverted that and tried removing the
I reverted that and tried removing the lifetime annotation bounds: integer32llc/rust_issue_87012@df4daad the problem no longer occurred: 2ms/1%.
I tried removing the body of the
So it appears to me that this problem needs all 3 of:
Additionally, looking at the chrome profiles of cases that exhibit the problem, it looks like the issue is both that In cases when the problem isn't present, I'm not sure what experiments would be helpful to do next. Does it seem like it's the multiple calls to Does anyone have any intuition about what and where the underlying issue might be? |
Personally I'd try working on removing the This will then hopefully help me gain enough knowledge that upon studying the code I can identify potential sources for the slowness. It would also be interesting to try to find out what arguments |
@est31 I think
trait Trait {
    #[must_use]
    fn f<'life0, 'async_trait>(
        &'life0 self,
    ) -> ::core::pin::Pin<
        Box<dyn ::core::future::Future<Output = ()> + ::core::marker::Send + 'async_trait>,
    >
    where
        'life0: 'async_trait,
        Self: 'async_trait;
}
pub struct S0;
impl Trait for S0 {
    fn f<'life0, 'async_trait>(
        &'life0 self,
    ) -> ::core::pin::Pin<
        Box<dyn ::core::future::Future<Output = ()> + ::core::marker::Send + 'async_trait>,
    >
    where
        'life0: 'async_trait,
        Self: 'async_trait,
    {
        Box::pin(async move {
            let __self = self;
            let _: () = {
                helper_crate::entry_point().await;
            };
        })
    }
}
// Repeat this for S1...Sn
There's also a |
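The "repeat this for S1...Sn" step lends itself to generation rather than hand-copying. The following is only a sketch of how such a generator might look, not the script used in either reproduction repository. It emits the unexpanded `#[async_trait]` form and reuses the names `Trait`, `S0`, `f` and `helper_crate::entry_point` from the snippet above; `generate_impls` is a hypothetical helper name, and the generated text assumes the `async-trait` crate and a `helper_crate` dependency are available to whatever crate compiles it.

```rust
use std::fmt::Write;

/// Builds Rust source text that declares `n` unit structs `S0..S{n-1}`,
/// each with an `#[async_trait]` impl of `Trait` that awaits one helper.
/// Hypothetical helper for illustration; it only produces a String.
fn generate_impls(n: usize) -> String {
    let mut src = String::new();
    src.push_str("use async_trait::async_trait;\n\n");
    src.push_str("#[async_trait]\npub trait Trait {\n    async fn f(&self);\n}\n");
    for i in 0..n {
        // Writing into a String cannot fail, so unwrapping is safe here.
        writeln!(src, "\npub struct S{};", i).unwrap();
        writeln!(src, "\n#[async_trait]\nimpl Trait for S{} {{", i).unwrap();
        src.push_str("    async fn f(&self) {\n");
        src.push_str("        helper_crate::entry_point().await;\n");
        src.push_str("    }\n}\n");
    }
    src
}
```

Writing the result of `generate_impls(n)` to a source file for several values of `n` and timing `cargo check` on each gives one data point per `n`.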
@crepererum good point, I haven't checked her repo. I think there are several ways to narrow the problem down further.
First would be to find out the complexity function, that is, what happens if you add M new items to N already existing ones. If M = N, for example, and the run time doubles, you know that it's linear in the N variable. If the run time quadruples, you know it's a quadratic function. If it is 8 times, it's a cubic function, etc. If the run time increases by a lot, it might be exponential. For that, try adding 1 or 2 cases a few times, while measuring it at each step, and check whether the time gets multiplied by a constant each time. The complexity can be a mix of multiple functions, so it might not always be 100% clear, but a trend can show up. Instead of doubling you could do a 5-fold increase, which, if the largest power is 2, gives you a 25-fold increase in time, or a 125-fold increase if the largest power is 3.
Why is the function interesting? Because it tells you something about the structure of the code that causes the slowness. If it's linearly increasing, it's likely not the main cause, unless the number is really large, but instead it's just invoking something moderately slow many times. If it's quadratically increasing, you have nested loops of level 2. If it's cubically increasing, the nesting level is 3. If it's exponentially increasing, it's likely some search problem, that is, something where you have code trying out all combinations of some possible values to try to make them fit.
Why is this information helpful? Because it tells you which code to look for. Furthermore, if the problem can be nicely tuned to become really bad, the offending code paths will show up in the profiles.
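The scaling test described above can be turned into a small calculation. This is a minimal sketch under the assumption that the time is dominated by a single term of the form c·N^k; the function names `estimate_exponent` and `step_ratios` are made up for illustration. The first recovers k from two measurements, the second gives the ratios between consecutive timings when a fixed number of cases is added at each step.

```rust
/// Estimates the exponent k in `time ≈ c * n^k` from two measurements
/// (n1, t1) and (n2, t2). Assumes one power-law term dominates.
fn estimate_exponent(n1: f64, t1: f64, n2: f64, t2: f64) -> f64 {
    (t2 / t1).ln() / (n2 / n1).ln()
}

/// Ratios between consecutive timings, taken while adding the same small
/// number of cases at each step. A roughly constant ratio above 1 points
/// towards exponential growth rather than polynomial growth.
fn step_ratios(times: &[f64]) -> Vec<f64> {
    times.windows(2).map(|w| w[1] / w[0]).collect()
}
```

For example, a 5-fold increase in N that makes the time 25 times larger gives an estimate of about 2 (quadratic), and one that makes it 125 times larger gives about 3 (cubic). Real timings are noisy, so several measurements per size are advisable before reading much into the estimate.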
@est31 Just FYI, I tried to give a summary of what was contained in the repo in my comment where I said:
Also @wesleywiser pointed me to #89831 that changes some of the I will try varying the number of trait implementations, the number of parameters/lifetime constraints, and perhaps the number of async functions that are called in Thank you! |
Ok, I've now written scripts that do 3 experiments:
They all appear to be linear, BUT the number of nested async fn calls has by far the largest multiplier. By the time the script gets to 22 nested async functions, the compiler hits the recursion limit when compiling the If I add
This overflow error can be reproduced by checking out integer32llc/rust_issue_87012@d2a34c0 and running There are a number of nested async performance issues I've found that I'd suspect are related:
but none of those mention Next, I'm going to try to get the arguments |
In chatting with @nnethercote, I made a version that reproduces the problem with only one crate, in case compiling a workspace is difficult to do with profiling tools. That version is here: https://github.com/integer32llc/rust_issue_87012/tree/one-crate and in this comment I was using the code at 9740ea40b19f76f35ef7da836712564ff8adffc3. I started with 69ac533 and added then in
and there are some particularly wordy and deep groups of log messages, one of which starts with:
and continues such that the depth of the vertical pipes wraps around my terminal width... here's ~1800 lines that's incomplete but gives an idea: https://gist.github.com/carols10cents/58a4fcf4ab4e1483ad5b4e4f5c01b0f8 There are messages in there that look troubling to me, like "CACHE MISS" and "rollback", but I don't actually know if that's expected or not. If there are ways to filter these logs or interesting bits in these logs that would be helpful, please let me know and I'm happy to get that info.
Can you see if #92044 improves performance for you?
@Aaron1011 It does fix it!!!! I commented on the PR, thank you so much!!1
By passing in a `&mut Vec` in various places and appending to it, rather than building many tiny `Vec`s and then combining them. This removes about 20% of the allocations that occurred in a `check` build of a test program from rust-lang#87012, making it roughly 2% faster.
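The general pattern that commit message describes can be illustrated in isolation. This is a toy sketch, not rustc's code: the `Node` type and both function names are hypothetical. The first version allocates a fresh `Vec` at every level and copies it into the parent's; the second threads a single `&mut Vec` through the recursion and appends to it.

```rust
/// Toy tree used only to illustrate the allocation pattern.
enum Node {
    Leaf(u32),
    Branch(Vec<Node>),
}

/// Allocation-heavy version: every call builds its own Vec,
/// which the caller then copies into its own result.
fn collect_allocating(node: &Node) -> Vec<u32> {
    match node {
        Node::Leaf(v) => vec![*v],
        Node::Branch(children) => {
            let mut out = Vec::new();
            for child in children {
                out.extend(collect_allocating(child));
            }
            out
        }
    }
}

/// Out-parameter version: all calls append to one shared Vec,
/// so no intermediate Vecs are created.
fn collect_into(node: &Node, out: &mut Vec<u32>) {
    match node {
        Node::Leaf(v) => out.push(*v),
        Node::Branch(children) => {
            for child in children {
                collect_into(child, out);
            }
        }
    }
}
```

Both versions visit the leaves in the same order and produce the same values; the difference is only in how many short-lived allocations happen along the way.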
I originally posted this on the Rust forums but was directed here since the cause could be related to a compiler bug.
I'm looking for some help debugging why the compile times in the `mongodb` crate's test suite are much slower than the crate itself. For reference, a single incremental change can take a minute or longer, even when running just `cargo check --tests`. The crate itself compiles quickly--the issue is only with the tests.
The output of compiling with `-Zself-profile` indicates that nearly the entire time (~1m) is spent in `evaluate_obligation`, but I can't figure out what this implies or how to possibly reduce this.
A few possible culprits that I've investigated so far:
- `#[cfg_attr(...)]` to conditionally select either `#[tokio::test]` or `#[async_std::test]`. Removing all usages of this and defaulting to just `#[tokio::test]` has an insignificant impact on compile times
- `typed-builder` crate. Perhaps using them in the tests leads to long compilation times due to the generated builders having lots of generic parameters.
- `async` and compilation times, though someone posted their profile output and `evaluate_obligation` is not a factor in it.
- `src/test` rather than `tests`, not sure if this has any impact
Anyways, I'm not sure where to go from here, and I was wondering if anyone else has encountered similar issues in the past or knows how I could further debug this to identify the root cause. Thanks!
Here are the top few results from the self-profile output: