rustc never finishes after emitting error diagnostic #116996

Open
djkoloski opened this issue Oct 20, 2023 · 6 comments
Labels
C-bug Category: This is a bug. I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@djkoloski
Contributor

djkoloski commented Oct 20, 2023

This issue was identified in Fuchsia. We were unable to move this issue out of tree for reproduction.

Adding a relatively innocuous HashMap<u32, String> field to an existing struct causes the compiler to hang forever:

pub struct Kernel {
    pub new_map: HashMap<u32, String>,
    /// The Zircon job object that holds the processes running in this kernel.
    pub job: zx::Job,

Right before it hangs, it prints:

error[E0412]: cannot find type `HashMap` in this scope
  --> ../../src/starnix/kernel/task/kernel.rs:61:18
   |
61 |     pub new_map: HashMap<u32, String>,
   |                  ^^^^^^^ not found in this scope
   |
help: consider importing this struct
   |
5  + use std::collections::HashMap;
   |

The trailing pipe seems to suggest that some additional diagnostic information is intended to be printed. Adding use std::collections::HashMap; to the file fixes the error and causes the compilation to finish normally.
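For reference, a minimal sketch of the non-hanging variant described above. The `Kernel` here is a stripped-down stand-in with only the new field (the real struct has many more fields, such as `job`), and `new_kernel` is a hypothetical helper added just for illustration:

```rust
// The import suggested by the E0412 diagnostic.
use std::collections::HashMap;

// Stand-in for the real struct; only the newly added field is kept.
pub struct Kernel {
    pub new_map: HashMap<u32, String>,
}

// Hypothetical constructor, not part of the original code.
pub fn new_kernel() -> Kernel {
    Kernel { new_map: HashMap::new() }
}

fn main() {
    let mut kernel = new_kernel();
    kernel.new_map.insert(1, String::from("init"));
    println!("entries: {}", kernel.new_map.len());
}
```

With the import in place, name resolution succeeds and no error type reaches trait selection, which is consistent with the hang only showing up alongside the E0412 error.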

By attaching to a debug build with gdb, we were able to capture a full call stack. We noticed that the call stack contained many repetitions of this particular set of frames:

#92 rustc_trait_selection::traits::select::{impl#2}::evaluate_candidate::{closure#0} () at compiler/rustc_trait_selection/src/traits/select/mod.rs:1307
#93 rustc_trait_selection::traits::select::SelectionContext::evaluate_candidate (self=0x7f5461f66208, stack=0x7f5461eeda70, candidate=0x7f5461ee9270) at compiler/rustc_trait_selection/src/traits/select/mod.rs:1296
#94 0x00007f546b026c16 in rustc_trait_selection::traits::select::SelectionContext::evaluate_stack (self=0x7f5461f66208, stack=0x7f5461eeda70) at compiler/rustc_trait_selection/src/traits/select/mod.rs:1270
#95 0x00007f546ae96301 in rustc_trait_selection::traits::select::{impl#2}::evaluate_trait_predicate_recursively::{closure#0}::{closure#1} (this=0x7f5461f66208) at compiler/rustc_trait_selection/src/traits/select/mod.rs:1092
#96 rustc_trait_selection::traits::select::{impl#2}::in_task::{closure#0}<rustc_trait_selection::traits::select::{impl#2}::evaluate_trait_predicate_recursively::{closure#0}::{closure_env#1}, core::result::Result<rustc_middle::traits::select::EvaluationResult, rustc_middle::traits::select::OverflowError>> () at compiler/rustc_trait_selection/src/traits/select/mod.rs:1440
#97 rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::DepsType>::with_anon_task<rustc_middle::dep_graph::DepsType, rustc_middle::ty::context::TyCtxt, rustc_trait_selection::traits::select::{impl#2}::in_task::{closure_env#0}<rustc_trait_selection::traits::select::{impl#2}::evaluate_trait_predicate_recursively::{closure#0}::{closure_env#1}, core::result::Result<rustc_middle::traits::select::EvaluationResult, rustc_middle::traits::select::OverflowError>>, core::result::Result<rustc_middle::traits::select::EvaluationResult, rustc_middle::traits::select::OverflowError>> (self=0x7f5461f6a4f8, cx=..., dep_kind=..., op=...) at compiler/rustc_query_system/src/dep_graph/graph.rs:303
#98 0x00007f546b042687 in rustc_trait_selection::traits::select::SelectionContext::in_task<rustc_trait_selection::traits::select::{impl#2}::evaluate_trait_predicate_recursively::{closure#0}::{closure_env#1}, core::result::Result<rustc_middle::traits::select::EvaluationResult, rustc_middle::traits::select::OverflowError>> (self=0x7f5461f66208, op=...) at compiler/rustc_trait_selection/src/traits/select/mod.rs:1440
#99 rustc_trait_selection::traits::select::{impl#2}::evaluate_trait_predicate_recursively::{closure#0} () at compiler/rustc_trait_selection/src/traits/select/mod.rs:1091
#100 rustc_trait_selection::traits::select::SelectionContext::evaluate_trait_predicate_recursively (self=<optimized out>, previous_stack=..., obligation=...) at compiler/rustc_trait_selection/src/traits/select/mod.rs:1046
#101 0x00007f546b040468 in rustc_trait_selection::traits::select::{impl#2}::evaluate_predicate_recursively::{closure#0}::{closure#0} () at compiler/rustc_trait_selection/src/traits/select/mod.rs:686
#102 0x00007f546b03fd09 in stacker::maybe_grow<core::result::Result<rustc_middle::traits::select::EvaluationResult, rustc_middle::traits::select::OverflowError>, rustc_trait_selection::traits::select::{impl#2}::evaluate_predicate_recursively::{closure#0}::{closure_env#0}> (red_zone=102400, stack_size=1048576, callback=...) at /usr/local/google/home/dkoloski/.cargo/registry/src/index.crates.io-6f17d22bba15001f/stacker-0.1.15/src/lib.rs:55
#103 rustc_data_structures::stack::ensure_sufficient_stack<core::result::Result<rustc_middle::traits::select::EvaluationResult, rustc_middle::traits::select::OverflowError>, rustc_trait_selection::traits::select::{impl#2}::evaluate_predicate_recursively::{closure#0}::{closure_env#0}> (f=...) at compiler/rustc_data_structures/src/stack.rs:17
#104 rustc_trait_selection::traits::select::{impl#2}::evaluate_predicate_recursively::{closure#0} () at compiler/rustc_trait_selection/src/traits/select/mod.rs:679
#105 rustc_trait_selection::traits::select::SelectionContext::evaluate_predicate_recursively (self=<optimized out>, previous_stack=..., obligation=...) at compiler/rustc_trait_selection/src/traits/select/mod.rs:658
#106 0x00007f546b03f4bc in rustc_trait_selection::traits::select::SelectionContext::evaluate_predicates_recursively<alloc::vec::into_iter::IntoIter<rustc_infer::traits::Obligation<rustc_middle::ty::Predicate>, alloc::alloc::Global>> (self=0x7f5461f66208, stack=..., predicates=...) at compiler/rustc_trait_selection/src/traits/select/mod.rs:646
#107 0x00007f546ae35d6f in rustc_trait_selection::traits::select::{impl#2}::evaluate_candidate::{closure#0}::{closure#0} (this=0x7f5461f66208) at compiler/rustc_trait_selection/src/traits/select/mod.rs:1312
#108 rustc_trait_selection::traits::select::{impl#2}::evaluation_probe::{closure#0}<rustc_trait_selection::traits::select::{impl#2}::evaluate_candidate::{closure#0}::{closure_env#0}> (snapshot=0x7f5461eee2f0) at compiler/rustc_trait_selection/src/traits/select/mod.rs:612
#109 rustc_infer::infer::InferCtxt::probe<core::result::Result<rustc_middle::traits::select::EvaluationResult, rustc_middle::traits::select::OverflowError>, rustc_trait_selection::traits::select::{impl#2}::evaluation_probe::{closure_env#0}<rustc_trait_selection::traits::select::{impl#2}::evaluate_candidate::{closure#0}::{closure_env#0}>> (self=0x7f5461f662d8, f=...) at compiler/rustc_infer/src/infer/mod.rs:885
#110 0x00007f546b044753 in rustc_trait_selection::traits::select::SelectionContext::evaluation_probe<rustc_trait_selection::traits::select::{impl#2}::evaluate_candidate::{closure#0}::{closure_env#0}> (self=0x7f5461f66208, op=...) at compiler/rustc_trait_selection/src/traits/select/mod.rs:610

In our case, the total number of frames reached about 2500. It looks like this may be hitting some kind of exponential computation edge case.
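To illustrate what an exponential edge case of this kind could look like, here is a toy model. It is not rustc's trait selection and does not claim to be the actual cause here; it only shows how recursively re-evaluating shared sub-goals without a cache blows up. The graph shape and the function names (`diamond_chain`, `visits_naive`, `visits_cached`) are made up for this sketch:

```rust
use std::collections::HashSet;

/// Toy "type graph": node `i` refers twice to node `i + 1`
/// (think of two fields that both mention the same type).
/// The last node is a leaf with no references.
fn diamond_chain(depth: usize) -> Vec<Vec<usize>> {
    let mut graph: Vec<Vec<usize>> = (0..depth).map(|i| vec![i + 1, i + 1]).collect();
    graph.push(Vec::new());
    graph
}

/// Evaluates every reference from scratch: the work doubles per level.
fn visits_naive(graph: &[Vec<usize>], node: usize) -> u64 {
    1 + graph[node]
        .iter()
        .map(|&next| visits_naive(graph, next))
        .sum::<u64>()
}

/// Remembers which nodes were already evaluated: the work is linear.
fn visits_cached(graph: &[Vec<usize>], node: usize, seen: &mut HashSet<usize>) -> u64 {
    if !seen.insert(node) {
        return 0; // already evaluated, nothing to redo
    }
    let mut count = 1;
    for &next in &graph[node] {
        count += visits_cached(graph, next, seen);
    }
    count
}

fn main() {
    let graph = diamond_chain(20);
    let naive = visits_naive(&graph, 0);
    let cached = visits_cached(&graph, 0, &mut HashSet::new());
    println!("naive: {naive} visits, cached: {cached} visits");
}
```

In this toy, a chain of depth `d` costs `2^(d+1) - 1` visits without the cache and `d + 1` with it, so even a modest amount of sharing between types can make the difference between milliseconds and hours.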

Potentially related: #116914

Meta

rustc --version --verbose:

rustc 1.75.0-nightly (9d1e4b787 2023-10-11)
binary: rustc
commit-hash: 9d1e4b7870f0aecb9f53e71f3cca3529b21d677a
commit-date: 2023-10-11
host: x86_64-unknown-linux-gnu
release: 1.75.0-nightly
LLVM version: 17.0.2

This is a custom-built toolchain following our toolchain build instructions.

@djkoloski djkoloski added the C-bug Category: This is a bug. label Oct 20, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 20, 2023
@compiler-errors
Member

Can you share the top and bottom 200 frames (or ideally all of them)? The ones you shared aren't really useful if trying to diagnose exactly how we got here.

@djkoloski
Contributor Author

Here's a gist of all of the frames in the callstack I captured. I added it to the main issue description as well.

@saethlin saethlin added I-hang Issue: The compiler never terminates, due to infinite loops, deadlock, livelock, etc. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Oct 22, 2023
@erickt
Contributor

erickt commented Oct 24, 2023

After letting the compiler run for 30+ minutes, we managed to get a rustc-ice during selection which comes with a very large backtrace.

@erickt
Contributor

erickt commented Oct 25, 2023

I ran our build again, and it produced essentially the same ICE, so the good news is that it's repeatable. The downside is that it compiled for about 7 hours before the ICE occurred :)

@djkoloski
Contributor Author

After some bisecting and code reduction, it looks like this may be related to #87012

@djkoloski djkoloski changed the title from "rustc never finishes after emitting partial error diagnostic" to "rustc never finishes after emitting error diagnostic" Nov 2, 2023
gnoliyil pushed a commit to gnoliyil/fuchsia that referenced this issue Jan 27, 2024
This is an attempt to reduce the amount of namespace globs to see if
it can help out with the trait expansion bug
rust-lang/rust#116996.

Change-Id: I58b6aa8484bab186980b76ccea1fcd34091b60ed
Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/941676
Fuchsia-Auto-Submit: Erick Tryzelaar <etryzelaar@google.com>
Commit-Queue: Auto-Submit <auto-submit@fuchsia-infra.iam.gserviceaccount.com>
Reviewed-by: Kevin Lindkvist <lindkvist@google.com>
@lqd
Member

lqd commented Jun 4, 2024

Example starnix_core reduction:

use std::collections::{BTreeMap, HashMap, HashSet};
use std::sync::{Arc, RwLock, Weak};

pub struct Kernel {
    pub pids: RwLock<PidTable>,
}
pub struct PidTable {
    table: HashMap<u32, PidEntry>,
}
struct PidEntry {
    process_group: Option<Weak<ProcessGroup>>,
}

pub struct ProcessGroupMutableState {
    thread_groups: BTreeMap<u32, Weak<ThreadGroup>>,
}
pub struct ProcessGroup {
    mutable_state: RwLock<ProcessGroupMutableState>,
}
pub struct Task {
    fs: Option<Arc<FsContext>>,
}
pub struct ThreadGroup {
    pub kernel: Arc<Kernel>,
    mutable_state: RwLock<ThreadGroupMutableState>,
}
pub struct ThreadGroupMutableState {
    tasks: BTreeMap<u32, TaskContainer>,
}
pub struct TaskContainer(Arc<Task>);
pub struct FileSystem {
    pub kernel: Weak<Kernel>,
}
struct FsContextState {
    namespace: Arc<Namespace>,
}
pub struct FsContext {
    state: RwLock<FsContextState>,
}

pub struct Namespace {
    root_mount: Arc<Mount>,
}
pub struct Mount {
    fs: Arc<FileSystem>,
    state: RwLock<MountState>,
}
pub struct MountState {
    submounts: HashMap<Arc<ERROR>, Arc<Mount>>,
    peer_group_: Option<(Arc<PeerGroup>, Arc<Mount>)>, // +6s
    upstream_: Option<(Weak<PeerGroup>, Arc<Mount>)>, // +5s
}
struct PeerGroup {
    state: RwLock<PeerGroupState>,
}
struct PeerGroupState {
    downstream: HashSet<Arc<Mount>>,
}
pub trait SocketOps: Send + Sync {} // + Sync = +5s
struct UEventNetlinkSocket {
    kernel: Arc<Kernel>,
}
impl SocketOps for UEventNetlinkSocket {} // load-bearing

fn main() {}

The above is:

  • slow using the old solver, when there's the error
  • fast using the old solver, without the non-existent ERROR type
  • fast using the new solver, with or without the non-existent ERROR type

This behavior started in 1.73:

| Version | Item                | Self time | % of total time | Time   | Item count |
|---------|---------------------|-----------|-----------------|--------|------------|
| 1.70.0  | evaluate_obligation | 646.14µs  | 3.441           | 1.24ms | 186        |
| 1.71.0  | evaluate_obligation | 699.79µs  | 3.830           | 1.37ms | 186        |
| 1.72.0  | evaluate_obligation | 737.36µs  | 3.974           | 1.43ms | 186        |
| 1.73.0  | evaluate_obligation | 5.88s     | 99.738          | 5.88s  | 177        |
| 1.74.0  | evaluate_obligation | 5.86s     | 99.738          | 5.86s  | 177        |
| 1.75.0  | evaluate_obligation | 5.63s     | 99.736          | 5.63s  | 176        |
| 1.76.0  | evaluate_obligation | 5.57s     | 99.739          | 5.57s  | 176        |
| 1.77.0  | evaluate_obligation | 5.59s     | 99.676          | 5.59s  | 167        |
| 1.78.0  | evaluate_obligation | 5.59s     | 99.673          | 5.59s  | 168        |

I'm pretty sure it appeared in nightly-2023-07-29 (commit range), where #113312 sticks out as particularly relevant (though it's a soundness fix), since the slowest evaluations are related to auto traits:

evaluate_obligation, value: TraitPredicate(<UEventNetlinkSocket as std::marker::Sync>, took 3299 ms
evaluate_obligation, value: TraitPredicate(<UEventNetlinkSocket as std::marker::Send>, took 3732 ms
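For context on why those two obligations show up at all, here is a small sketch, assuming a heavily simplified `Kernel` (the `HashMap<u32, u32>` field is a placeholder, not the real `PidTable`). Because `SocketOps` has `Send + Sync` as supertraits, the `impl` forces the compiler to prove both auto traits for `UEventNetlinkSocket`, and auto traits are proven structurally by walking every field type. `assert_send_sync` is a hypothetical helper that makes the same check explicit:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Supertraits: any implementor must also be `Send` and `Sync`.
pub trait SocketOps: Send + Sync {}

// Simplified placeholder; the reduction above has a much deeper type graph.
#[allow(dead_code)]
pub struct Kernel {
    pids: RwLock<HashMap<u32, u32>>,
}

#[allow(dead_code)]
struct UEventNetlinkSocket {
    kernel: Arc<Kernel>,
}

// Accepting this impl requires `UEventNetlinkSocket: Send` and `: Sync`,
// which in turn requires the same of `Arc<Kernel>`, `Kernel`, and so on.
impl SocketOps for UEventNetlinkSocket {}

// Hypothetical helper: compiles only if `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<UEventNetlinkSocket>();
    println!("UEventNetlinkSocket is Send + Sync");
}
```

In the full reduction the same walk has to go through `Kernel`, `PidTable`, `ProcessGroup`, `Mount`, and the rest of the cyclic type graph, which would explain why the `impl SocketOps` line is load-bearing for the slowdown.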

6 participants