Possible memory corruption in cling #15511

Closed
1 task done
pikacic opened this issue May 14, 2024 · 17 comments · Fixed by #15854

@pikacic

pikacic commented May 14, 2024

Check duplicate issues.

  • Checked for duplicates

Description

Since the switch to ROOT 6.30/02 (LCG 105), we have been experiencing segfaults related to dictionaries. The most straightforward reproducer is a single #include of a specific header.

What makes me think there may be memory corruption is that, when I tried to isolate which part of that header triggers the segfault, I noticed that (on a subset of the header) I could make the segfault appear and disappear just by shuffling some class definitions.

I also find it odd that the segfault seems to be related to an atexit function in libCling.so:

===========================================================
#10 0x00007f05fa744a7e in ?? ()
#11 0x00007ffd45bee240 in ?? ()
#12 0x00007f060b10d028 in ?? ()
#13 0x00007ffd45bee270 in ?? ()
#14 0x00007f05fa745920 in ?? ()
#15 0x00007f060b10c1a0 in ?? ()
#16 0x00007ffd45bee260 in ?? ()
#17 0x00007ffd45bee2c0 in ?? ()
#18 0x00007f05fa745b0d in ?? ()
#19 0x000000000204fc10 in ?? ()
#20 0x00000000133b2cc8 in ?? ()
#21 0x00000000133b2cc0 in ?? ()
#22 0x00007f060309016c in (anonymous namespace)::local_cxa_atexit(void (*)(void*), void*, cling::Interpreter*) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#23 0x00007ffd45bee260 in ?? ()
#24 0x00007f060b10d110 in ?? ()
#25 0x00007f060b10d020 in ?? ()
#26 0x00007f0607fe2e7a in ?? () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#27 0x00007f05fa741095 in ?? ()
#28 0x00007f05fa740f20 in ?? ()
#29 0x00007f060471d882 in (anonymous namespace)::GenericLLVMIRPlatformSupport::initialize(llvm::orc::JITDylib&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#30 0x00007f06031155f3 in cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#31 0x00007f0603092698 in cling::Interpreter::executeTransaction(cling::Transaction&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#32 0x00007f0603125b4a in cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#33 0x00007f0603128d98 in cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#34 0x00007f06030933dc in cling::Interpreter::DeclareInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions const&, cling::Transaction**) const () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#35 0x00007f0603095986 in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#36 0x00007f06031781a7 in cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#37 0x00007f0602e677f7 in HandleInterpreterException (metaProcessor=0x308b020, input_line=0x4194ba0 "#line 1 \"ROOT_prompt_0\"\n#include <LoKi/ParticleCuts.h>", compRes=
0x7ffd45beeafc: cling::Interpreter::kSuccess, result=0x7ffd45beeb00) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/metacling/src/TCling.cxx:2436
===========================================================
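For readers wondering why an atexit-style function appears under runStaticInitializersOnce at all: under the Itanium C++ ABI, the initializer of every static-storage object with a non-trivial destructor also registers that destructor through `__cxa_atexit`. An interpreter presumably has to intercept those registrations so it can run them itself, and the `local_cxa_atexit` frame looks like that hook. The sketch below is only a toy model of such a registry. `AtExitRegistry`, `Cut`, `destroyCut` and `destructionOrder` are hypothetical names for illustration, not cling's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a per-interpreter exit-handler list. All names here are
// hypothetical; this is not cling's implementation.
class AtExitRegistry {
public:
    using Callback = void (*)(void*);

    // Same shape as __cxa_atexit(callback, argument, dso_handle),
    // without the DSO handle.
    int add(Callback callback, void* argument) {
        assert(callback != nullptr);
        entries_.push_back({callback, argument});
        return 0;
    }

    // Run the callbacks in reverse registration order, as at process exit.
    void runAll() {
        while (!entries_.empty()) {
            Entry entry = entries_.back();
            entries_.pop_back();
            entry.callback(entry.argument);
        }
    }

private:
    struct Entry {
        Callback callback;
        void* argument;
    };
    std::vector<Entry> entries_;
};

// Stand-in for an object whose destructor must run at shutdown.
struct Cut {
    std::string name;
    std::vector<std::string>* log;
};

// Plays the role of the destructor thunk the compiler would register.
void destroyCut(void* object) {
    auto* cut = static_cast<Cut*>(object);
    cut->log->push_back(cut->name);
}

// Registers two objects in definition order and returns the order in
// which their "destructors" run.
std::vector<std::string> destructionOrder() {
    std::vector<std::string> log;
    AtExitRegistry registry;
    Cut isDown{"ISDOWN", &log};
    Cut isLong{"ISLONG", &log};
    registry.add(&destroyCut, &isDown);
    registry.add(&destroyCut, &isLong);
    registry.runAll();
    return log;
}
```

Calling `destructionOrder()` yields the names in reverse registration order. The point is only that each such global adds one registration call to the static-initialization path, which is where the trace above ends up.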

Reproducer

With the test_env.sh included in test_env.zip on lxplus.cern.ch:

❯ hx test_env.sh
-bash: hx: command not found
marcocle in 🌐 lxplus913 in ~/tmp/root-issue
❯ vim test_env.sh
marcocle in 🌐 lxplus913 in ~/tmp/root-issue took 2m35s
❯ bash
marcocle in 🌐 lxplus913 in ~/tmp/root-issue
❯ . test_env.sh
marcocle in 🌐 lxplus913 in ~/tmp/root-issue
❯ root
   ------------------------------------------------------------------
  | Welcome to ROOT 6.30/04                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Feb 03 2024, 17:20:15                 |
  | From heads/master@tags/v6-30-04                                  |
  | With g++ (GCC) 13.1.0                                            |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

root [0] #include <LoKi/ParticleCuts.h>

 *** Break *** segmentation violation



===========================================================
There was a crash (kSigSegmentationViolation).
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f060a0d89fa in wait4 () from /lib64/libc.so.6
#1  0x00007f060a04b243 in do_system () from /lib64/libc.so.6
#2  0x00007f060ac59eb2 in TUnixSystem::Exec (this=0x1fbd500, shellcmd=0x9457020 "/cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/etc/gdb-backtrace.sh 212891 1>&2") at /build/jenkins/workspace/lcg_release_pip
#3  0x00007f060ac5a753 in TUnixSystem::StackTrace (this=0x1fbd500) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:2411
#4  0x00007f060ac5e16c in TUnixSystem::DispatchSignals (this=0x1fbd500, sig=kSigSegmentationViolation) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:3631
#5  0x00007f060ac560e0 in SigHandler (sig=kSigSegmentationViolation) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:402
#6  0x00007f060ac5e06f in sighandler (sig=11) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:3602
#7  0x00007f060ac47a32 in textinput::TerminalConfigUnix::HandleSignal (this=0x7f060af75d80 <textinput::TerminalConfigUnix::Get()::s>, signum=11) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.
#8  0x00007f060ac47736 in (anonymous namespace)::TerminalConfigUnix__handleSignal (signum=11) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/textinput/src/textinput/TerminalConfigUnix.
#9  <signal handler called>
#10 0x00007f05fa744a7e in ?? ()
#11 0x00007ffd45bee240 in ?? ()
#12 0x00007f060b10d028 in ?? ()
#13 0x00007ffd45bee270 in ?? ()
#14 0x00007f05fa745920 in ?? ()
#15 0x00007f060b10c1a0 in ?? ()
#16 0x00007ffd45bee260 in ?? ()
#17 0x00007ffd45bee2c0 in ?? ()
#18 0x00007f05fa745b0d in ?? ()
#19 0x000000000204fc10 in ?? ()
#20 0x00000000133b2cc8 in ?? ()
#21 0x00000000133b2cc0 in ?? ()
#22 0x00007f060309016c in (anonymous namespace)::local_cxa_atexit(void (*)(void*), void*, cling::Interpreter*) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#23 0x00007ffd45bee260 in ?? ()
#24 0x00007f060b10d110 in ?? ()
#25 0x00007f060b10d020 in ?? ()
#26 0x00007f0607fe2e7a in ?? () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#27 0x00007f05fa741095 in ?? ()
#28 0x00007f05fa740f20 in ?? ()
#29 0x00007f060471d882 in (anonymous namespace)::GenericLLVMIRPlatformSupport::initialize(llvm::orc::JITDylib&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#30 0x00007f06031155f3 in cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#31 0x00007f0603092698 in cling::Interpreter::executeTransaction(cling::Transaction&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#32 0x00007f0603125b4a in cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling
#33 0x00007f0603128d98 in cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#34 0x00007f06030933dc in cling::Interpreter::DeclareInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions const&, cling::Transaction**) const () from /cvmfs/lhcb.cern
#35 0x00007f0603095986 in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6
#36 0x00007f06031781a7 in cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#37 0x00007f0602e677f7 in HandleInterpreterException (metaProcessor=0x308b020, input_line=0x4194ba0 "#line 1 \"ROOT_prompt_0\"\n#include <LoKi/ParticleCuts.h>", compRes=
0x7ffd45beeafc: cling::Interpreter::kSuccess, result=0x7ffd45beeb00) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/metacling/src/TCling.cxx:2436
#38 0x00007f0602e683c4 in TCling::ProcessLine (this=0x20461a0, line=0x412ae70 "#line 1 \"ROOT_prompt_0\"\n#include <LoKi/ParticleCuts.h>", error=0x7ffd45beeedc) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.0
#39 0x00007f060aab78bf in TApplication::ProcessLine (this=0x200fe60, line=0x412ae70 "#line 1 \"ROOT_prompt_0\"\n#include <LoKi/ParticleCuts.h>", sync=false, err=0x7ffd45beeedc) at /build/jenkins/workspace/lcg_release_pipeline/build/proj
#40 0x00007f060b14a763 in TRint::ProcessLineNr (this=0x200fe60, filestem=0x7f060b15c757 "ROOT_prompt_", line=0x419af40 "#include <LoKi/ParticleCuts.h>", error=0x7ffd45beeedc) at /build/jenkins/workspace/lcg_release_pipeline/build/projec
#41 0x00007f060b149fa1 in TRint::HandleTermInput (this=0x200fe60) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/rint/src/TRint.cxx:648
#42 0x00007f060b1477cd in TTermInputHandler::Notify (this=0x413b570) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/rint/src/TRint.cxx:133
#43 0x00007f060b14c187 in TTermInputHandler::ReadNotify (this=0x413b570) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/rint/src/TRint.cxx:125
#44 0x00007f060ac58367 in TUnixSystem::CheckDescriptors (this=0x1fbd500) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:1322
#45 0x00007f060ac577bc in TUnixSystem::DispatchOneEvent (this=0x1fbd500, pendingOnly=false) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:1077
#46 0x00007f060ab4290f in TSystem::InnerLoop (this=0x1fbd500) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/base/src/TSystem.cxx:390
#47 0x00007f060ab426a4 in TSystem::Run (this=0x1fbd500) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/base/src/TSystem.cxx:340
#48 0x00007f060aab8367 in TApplication::Run (this=0x200fe60, retrn=false) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/base/src/TApplication.cxx:1890
#49 0x00007f060b1492e2 in TRint::Run (this=0x200fe60, retrn=false) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/rint/src/TRint.cxx:501
#50 0x0000000000401447 in main (argc=1, argv=0x7ffd45bf1438) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/main/src/rmain.cxx:84
===========================================================


The lines below might hint at the cause of the crash. If you see question
marks as part of the stack trace, try to recompile with debugging information
enabled and export CLING_DEBUG=1 environment variable before running.
You may get help by asking at the ROOT forum https://root.cern/forum
preferably using the command (.forum bug) in the ROOT prompt.
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs or (preferably) using the command (.gh bug) in
the ROOT prompt. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#10 0x00007f05fa744a7e in ?? ()
#11 0x00007ffd45bee240 in ?? ()
#12 0x00007f060b10d028 in ?? ()
#13 0x00007ffd45bee270 in ?? ()
#14 0x00007f05fa745920 in ?? ()
#15 0x00007f060b10c1a0 in ?? ()
#16 0x00007ffd45bee260 in ?? ()
#17 0x00007ffd45bee2c0 in ?? ()
#18 0x00007f05fa745b0d in ?? ()
#19 0x000000000204fc10 in ?? ()
#20 0x00000000133b2cc8 in ?? ()
#21 0x00000000133b2cc0 in ?? ()
#22 0x00007f060309016c in (anonymous namespace)::local_cxa_atexit(void (*)(void*), void*, cling::Interpreter*) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#23 0x00007ffd45bee260 in ?? ()
#24 0x00007f060b10d110 in ?? ()
#25 0x00007f060b10d020 in ?? ()
#26 0x00007f0607fe2e7a in ?? () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#27 0x00007f05fa741095 in ?? ()
#28 0x00007f05fa740f20 in ?? ()
#29 0x00007f060471d882 in (anonymous namespace)::GenericLLVMIRPlatformSupport::initialize(llvm::orc::JITDylib&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#30 0x00007f06031155f3 in cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#31 0x00007f0603092698 in cling::Interpreter::executeTransaction(cling::Transaction&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#32 0x00007f0603125b4a in cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#33 0x00007f0603128d98 in cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#34 0x00007f06030933dc in cling::Interpreter::DeclareInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions const&, cling::Transaction**) const () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#35 0x00007f0603095986 in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#36 0x00007f06031781a7 in cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#37 0x00007f0602e677f7 in HandleInterpreterException (metaProcessor=0x308b020, input_line=0x4194ba0 "#line 1 \"ROOT_prompt_0\"\n#include <LoKi/ParticleCuts.h>", compRes=
0x7ffd45beeafc: cling::Interpreter::kSuccess, result=0x7ffd45beeb00) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/metacling/src/TCling.cxx:2436
===========================================================


Root > .q

ROOT version

❯ which root
/cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/bin/root
   ------------------------------------------------------------------
  | Welcome to ROOT 6.30/04                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Feb 03 2024, 17:20:15                 |
  | From heads/master@tags/v6-30-04                                  |
  | With g++ (GCC) 13.1.0                                            |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

Installation method

LCG builds

Operating system

Linux (EL9)

Additional context

No response

@pikacic pikacic added the bug label May 14, 2024
@hahnjo hahnjo added the experiment Affects an experiment / reported by its software & computing experts label May 15, 2024
@devajithvs
Contributor

A stripped down version of the header that segfaults:

// test.h
#include "LoKi/Particles.h"

using EQUALTO = LoKi::EqualToValue<const LHCb::Particle*>;
const auto TRTYPE = LoKi::Particles::TrackType{};

// Remove ANY of the lines below and the segfault disappears.
const auto ISDOWN = EQUALTO{TRTYPE, LHCb::Track::Types::Downstream};
const auto ISLONG = EQUALTO{TRTYPE, LHCb::Track::Types::Long};
const auto MUONBDT_CATBOOST = LoKi::Particles::MuonMVA2{};
const auto ISMUONPID = LoKi::Particles::IsMuon{};
const auto ISMUONLOOSE = LoKi::Particles::IsMuonLoose{};
const auto ISMUONTIGHT = LoKi::Particles::IsMuonTight{};
const auto ISUP = EQUALTO{TRTYPE, LHCb::Track::Types::Upstream};
const auto KEY = LoKi::Particles::Key{};
const auto M = LoKi::Particles::Mass{};
const auto LV01 = LoKi::Particles::DecayAngle{1};
const auto LV02 = LoKi::Particles::DecayAngle{2};
const auto LV03 = LoKi::Particles::DecayAngle{3};
const auto LV04 = LoKi::Particles::DecayAngle{4};
const auto M0 = LoKi::Particles::Mass{};
const auto M1 = LoKi::Particles::InvariantMass{1};
const auto M12 = LoKi::Particles::InvariantMass{1, 2};
const auto M13 = LoKi::Particles::InvariantMass{1, 3};
const auto M14 = LoKi::Particles::InvariantMass{1, 4};
const auto M2 = LoKi::Particles::InvariantMass{2};
const auto M23 = LoKi::Particles::InvariantMass{2, 3};
const auto M24 = LoKi::Particles::InvariantMass{2, 4};
const auto M34 = LoKi::Particles::InvariantMass{3, 4};
const auto MM = LoKi::Particles::MeasuredMass{};
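
As background on why a bare #include of such a header executes code at all: every namespace-scope `const auto X = SomeType{...};` with a non-trivial constructor is dynamically initialized when the header is processed, in definition order within the translation unit. The self-contained sketch below illustrates this with a hypothetical `Functor` type and `initLog` helper. They stand in for the LoKi functors and are not taken from the LHCb code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the objects below are constructed.
inline std::vector<std::string>& initLog() {
    static std::vector<std::string> log;  // constructed on first use
    return log;
}

// Hypothetical stand-in for a functor type like those in the header above.
struct Functor {
    explicit Functor(const char* name) {
        assert(name != nullptr);
        initLog().push_back(name);
    }
    ~Functor() {}  // user-provided destructor, so an exit handler is needed
};

// Each definition runs its constructor during static initialization,
// in the order written here.
const auto KEY = Functor{"KEY"};
const auto M = Functor{"M"};
const auto MM = Functor{"MM"};
```

By the time `main` runs, `initLog()` holds the three names in definition order. With the real header, each of the 23 definitions likewise contributes one constructor call and one destructor registration during static initialization.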

@ruidexu

ruidexu commented Jun 5, 2024

I ran into a similar situation. I am using LCG_103.

*** Break *** segmentation violation



===========================================================
There was a crash.
This is the entire stack trace of all threads:
===========================================================
#0  0x00007f62c74d89fa in wait4 () from /lib64/libc.so.6
#1  0x00007f62c744b243 in do_system () from /lib64/libc.so.6
#2  0x00007f62c570fb69 in TUnixSystem::StackTrace() () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-opt/lib/libCore.so
#3  0x00007f62c5edf463 in (anonymous namespace)::TExceptionHandlerImp::HandleException(int) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-opt/lib/libcppyy_backend3_9.so
#4  0x00007f62c570f391 in TUnixSystem::DispatchSignals(ESignals) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-opt/lib/libCore.so
#5  <signal handler called>
#6  0x00007f62be7d56fe in ?? ()
#7  0x00007ffce112d920 in ?? ()
#8  0x00007f62be7dc429 in ?? ()
#9  0x00007ffce112d950 in ?? ()
#10 0x00007f62be7d26b0 in ?? ()
#11 0x00007f62bacd3180 in ?? ()
#12 0x00007ffce112d940 in ?? ()
#13 0x00007ffce112d9a0 in ?? ()
#14 0x00007f62be7d985d in ?? ()
#15 0x000000000214fc80 in ?? ()
#16 0x00007f62be7d26b0 in ?? ()
#17 0x000000001d59e690 in ?? ()
#18 0x00007f62bf3940ec in (anonymous namespace)::local_cxa_atexit(void (*)(void*), void*, cling::Interpreter*) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-opt/lib/libCling.so
#19 0x00007ffce112d940 in ?? ()
#20 0x00007f62bacd5778 in ?? ()
#21 0x00007f62bacd5740 in ?? ()
#22 0x0000000016991e30 in ?? ()
#23 0x00007f62be7d298d in ?? ()
#24 0x0000000000000000 in ?? ()
===========================================================


The lines below might hint at the cause of the crash. If you see question
marks as part of the stack trace, try to recompile with debugging information
enabled and export CLING_DEBUG=1 environment variable before running.
You may get help by asking at the ROOT forum https://root.cern/forum
preferably using the command (.forum bug) in the ROOT prompt.
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs or (preferably) using the command (.gh bug) in
the ROOT prompt. Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
#6  0x00007f62be7d56fe in ?? ()
#7  0x00007ffce112d920 in ?? ()
#8  0x00007f62be7dc429 in ?? ()
#9  0x00007ffce112d950 in ?? ()
#10 0x00007f62be7d26b0 in ?? ()
#11 0x00007f62bacd3180 in ?? ()
#12 0x00007ffce112d940 in ?? ()
#13 0x00007ffce112d9a0 in ?? ()
#14 0x00007f62be7d985d in ?? ()
#15 0x000000000214fc80 in ?? ()
#16 0x00007f62be7d26b0 in ?? ()
#17 0x000000001d59e690 in ?? ()
#18 0x00007f62bf3940ec in (anonymous namespace)::local_cxa_atexit(void (*)(void*), void*, cling::Interpreter*) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-opt/lib/libCling.so
#19 0x00007ffce112d940 in ?? ()
#20 0x00007f62bacd5778 in ?? ()
#21 0x00007f62bacd5740 in ?? ()
#22 0x0000000016991e30 in ?? ()
#23 0x00007f62be7d298d in ?? ()
#24 0x0000000000000000 in ?? ()
===========================================================

@vgvassilev
Member

Can you run valgrind using the root suppression file?

@ruidexu

ruidexu commented Jun 10, 2024

Can you run valgrind using the root suppression file?

Hi, here is the result. Do note that I do not know valgrind very well.

I ran this:

valgrind -v --leak-check=full --show-leak-kinds=all --suppressions=/afs/cern.ch/work/r/ruide/valgrind-root.supp ls -l ./run gaudirun.py /afs/cern.ch/work/r/ruide/private/starterkit/ntuple_options.py

The result looks like this:

==930722== HEAP SUMMARY:
==930722==     in use at exit: 20,459 bytes in 11 blocks
==930722==   total heap usage: 754 allocs, 743 frees, 93,819 bytes allocated
==930722== 
==930722== Searching for pointers to 11 not-freed blocks
==930722== Checked 132,624 bytes
==930722== 
==930722== 24 bytes in 1 blocks are still reachable in loss record 1 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x1164DF: ??? (in /usr/bin/ls)
==930722==    by 0x11657C: ??? (in /usr/bin/ls)
==930722==    by 0x1178C7: ??? (in /usr/bin/ls)
==930722==    by 0x10D35A: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 24 bytes in 1 blocks are still reachable in loss record 2 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x11661F: ??? (in /usr/bin/ls)
==930722==    by 0x117A3E: ??? (in /usr/bin/ls)
==930722==    by 0x10D35A: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 48 bytes in 1 blocks are still reachable in loss record 3 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x115ECB: ??? (in /usr/bin/ls)
==930722==    by 0x10E074: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 54 bytes in 2 blocks are still reachable in loss record 4 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x494C12E: strdup (in /usr/lib64/libc.so.6)
==930722==    by 0x48929FF: selinux_raw_to_trans_context (in /usr/lib64/libselinux.so.1)
==930722==    by 0x4892ADB: lgetfilecon (in /usr/lib64/libselinux.so.1)
==930722==    by 0x117464: ??? (in /usr/bin/ls)
==930722==    by 0x10D35A: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 56 bytes in 1 blocks are still reachable in loss record 5 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x115DC8: ??? (in /usr/bin/ls)
==930722==    by 0x115DF9: ??? (in /usr/bin/ls)
==930722==    by 0x10E549: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 56 bytes in 1 blocks are still reachable in loss record 6 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x115DC8: ??? (in /usr/bin/ls)
==930722==    by 0x115DF9: ??? (in /usr/bin/ls)
==930722==    by 0x10D1A2: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 69 bytes in 2 blocks are still reachable in loss record 7 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x11666E: ??? (in /usr/bin/ls)
==930722==    by 0x116CAF: ??? (in /usr/bin/ls)
==930722==    by 0x10D35A: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 128 bytes in 1 blocks are still reachable in loss record 8 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x113989: ??? (in /usr/bin/ls)
==930722==    by 0x10D2A6: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== 20,000 bytes in 1 blocks are still reachable in loss record 9 of 9
==930722==    at 0x484480F: malloc (vg_replace_malloc.c:442)
==930722==    by 0x115DC8: ??? (in /usr/bin/ls)
==930722==    by 0x10D310: ??? (in /usr/bin/ls)
==930722==    by 0x48D958F: (below main) (in /usr/lib64/libc.so.6)
==930722== 
==930722== LEAK SUMMARY:
==930722==    definitely lost: 0 bytes in 0 blocks
==930722==    indirectly lost: 0 bytes in 0 blocks
==930722==      possibly lost: 0 bytes in 0 blocks
==930722==    still reachable: 20,459 bytes in 11 blocks
==930722==         suppressed: 0 bytes in 0 blocks
==930722== 
==930722== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@vgvassilev
Member

I am surprised that valgrind is happy…

@pikacic
Author

pikacic commented Jun 10, 2024

There's a problem in the way the application was invoked: there is a stray ls -l on the command line, which makes valgrind check ls and not gaudirun.py.

@pikacic
Author

pikacic commented Jun 10, 2024

I tried to run Valgrind, but it only spots one small leak, despite the segfault.

But that made me look more closely at the stack trace, and I noticed this line:

#37 0x00007f0602e677f7 in HandleInterpreterException (metaProcessor=0x308b020, input_line=0x4194ba0 "#line 1 \"ROOT_prompt_0\"\n#include <LoKi/ParticleCuts.h>", compRes=
0x7ffd45beeafc: cling::Interpreter::kSuccess, result=0x7ffd45beeb00) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/metacling/src/TCling.cxx:2436

I also tried putting the line #include <LoKi/ParticleCuts.h> into a small file, test.C, and invoking root test.C: no segfault, but an error from cling complaining about redefinition of symbols.

Tomorrow I'll investigate this new path, as it might be that the segfault is a red herring (hiding the actual problem in my code).

@devajithvs
Contributor

devajithvs commented Jun 11, 2024

The path /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1519 no longer exists (it was removed). I tried to reproduce the error with 1529 instead, loading the stripped-down header file directly (root test.h), and I now get a different error message.

Processing temp.h...                                                                                                                                                                                                                                                                      
In file included from input_line_8:1:                                                                                                                                                                                                                                                     
In file included from /afs/cern.ch/user/d/dvalapar/temp.h:2:                                                                                                                                                                                                                              
In file included from /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:                                                                                                                                       
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:35:21: error: redefinition of 'CLID_ProtoParticle'                                                                                                         
  static const CLID CLID_ProtoParticle = 803;                                                                                                                                                                                                                                             
                    ^                                                                                                                                                                                                                                                                     
input_line_10:1:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here                                                                          
#include "/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h"                                                                                                                                                 
         ^                                                                                                                                                                                                                                                                                
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here
#include "Event/ProtoParticle.h"                                                                                                                                                                                                                                                          
         ^                                                                                                                                                                                                                                                                                
...
...SKIPPED LINES
...
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:55:9: error: redefinition of 'ProtoParticle'                                                                                                               
  class ProtoParticle final : public KeyedObject<int> {                                                                                                                                                                                                                                   
        ^                                                                                                                                                                                                                                                                                 
input_line_10:1:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here                                                                          
#include "/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h"                                                                                                                                                 
         ^                                                                                                                                                                                                                                                                                
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here
#include "Event/ProtoParticle.h"                                                                                                                                                                                                                                                          
         ^                                                                                                                                                                                                                                                                                
In file included from input_line_8:1:                                                                                                                                                                                                                                                     
In file included from /afs/cern.ch/user/d/dvalapar/temp.h:2:                                                                                                                                                                                                                              
In file included from /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:                                                                                                                                       
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:326:24: error: redefinition of 'operator<<'                                                                                                                
  inline std::ostream& operator<<( std::ostream& s, LHCb::ProtoParticle::additionalInfo e ) {                                                                                                                                                                                             
                       ^                                                                                                                                                                                                                                                                  
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:326:24: note: previous definition is here                                                                                                                  
  inline std::ostream& operator<<( std::ostream& s, LHCb::ProtoParticle::additionalInfo e ) {                                                                                                                                                                                             
                       ^                                                                                                                                                                                                                                                                  
root.exe: /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/metacling/src/TCling.cxx:2200: virtual void TCling::RegisterModule(const char*, const char**, const char**, const char*, const char*, void (*)(), const TInterpreter::FwdDeclArgsToKeepCollection_t&, const char**, Bool_t, Bool_t): Assertion `cling::Interpreter::kSuccess == compRes && "The forward declarations could not be compiled"' failed.

The error seems weird because I see #pragma once in ProtoParticle.h

@ruidexu

ruidexu commented Jun 11, 2024

Yes, I noticed that as well. I switched to 1533 and got the same segfault.

The file /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1519 doesn't exist/was removed. I tried to reproduce the error with 1529 by loading the stripped down header file (root test.h) instead, and I now get a different error message.

Processing temp.h...                                                                                                                                                                                                                                                                      
In file included from input_line_8:1:                                                                                                                                                                                                                                                     
In file included from /afs/cern.ch/user/d/dvalapar/temp.h:2:                                                                                                                                                                                                                              
In file included from /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:                                                                                                                                       
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:35:21: error: redefinition of 'CLID_ProtoParticle'                                                                                                         
  static const CLID CLID_ProtoParticle = 803;                                                                                                                                                                                                                                             
                    ^                                                                                                                                                                                                                                                                     
input_line_10:1:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here                                                                          
#include "/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h"                                                                                                                                                 
         ^                                                                                                                                                                                                                                                                                
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here
#include "Event/ProtoParticle.h"                                                                                                                                                                                                                                                          
         ^                                                                                                                                                                                                                                                                                
...
...SKIPPED LINES
...
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:55:9: error: redefinition of 'ProtoParticle'                                                                                                               
  class ProtoParticle final : public KeyedObject<int> {                                                                                                                                                                                                                                   
        ^                                                                                                                                                                                                                                                                                 
input_line_10:1:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here                                                                          
#include "/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h"                                                                                                                                                 
         ^                                                                                                                                                                                                                                                                                
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:10: note: '/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h' included multiple times, additional include site here
#include "Event/ProtoParticle.h"                                                                                                                                                                                                                                                          
         ^                                                                                                                                                                                                                                                                                
In file included from input_line_8:1:                                                                                                                                                                                                                                                     
In file included from /afs/cern.ch/user/d/dvalapar/temp.h:2:                                                                                                                                                                                                                              
In file included from /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Particles.h:20:                                                                                                                                       
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:326:24: error: redefinition of 'operator<<'                                                                                                                
  inline std::ostream& operator<<( std::ostream& s, LHCb::ProtoParticle::additionalInfo e ) {                                                                                                                                                                                             
                       ^                                                                                                                                                                                                                                                                  
/cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/Event/ProtoParticle.h:326:24: note: previous definition is here                                                                                                                  
  inline std::ostream& operator<<( std::ostream& s, LHCb::ProtoParticle::additionalInfo e ) {                                                                                                                                                                                             
                       ^                                                                                                                                                                                                                                                                  
root.exe: /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/metacling/src/TCling.cxx:2200: virtual void TCling::RegisterModule(const char*, const char**, const char**, const char*, const char*, void (*)(), const TInterpreter::FwdDeclArgsToKeepCollection_t&, const char**, Bool_t, Bool_t): Assertion `cling::Interpreter::kSuccess == compRes && "The forward declarations could not be compiled"' failed.

The error seems weird because I see #pragma once in ProtoParticle.h

Thank you for the note. I will try it again later!

There's a problem in the way the application was invoked: there is a stray ls -l on the command line that makes valgrind check ls and not gaudirun.py.

@pikacic
Author

pikacic commented Jun 12, 2024

I have a small update, but no good news.

When trying to reproduce the segfault with a root test.C, I get stuck on problems that seem related to bad handling of #pragma once and include guards. If I solve the include guard problems, I still get the segfault, both via the interactive #include <LoKi/ParticleCuts.h> and via root test.C.

I prepared a small "reproducer" that should work on any RHEL9 equivalent machine with CVMFS and the HEP_OSlibs meta-rpm.
See attached root-15511.tar.gz

@hahnjo
Member

hahnjo commented Jun 13, 2024

Output of valgrind with the original report of just #include <LoKi/ParticleCuts.h>, replacing 1519 in the paths with 1529:

$ VALGRIND_LIB=/cvmfs/lhcb.cern.ch/lib/lcg/releases/valgrind/3.22.0-113bc/x86_64-el9-gcc13-dbg/libexec/valgrind/ valgrind --leak-check=full --suppressions=/cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/etc/valgrind-root.supp root.exe -q -e "#include <LoKi/ParticleCuts.h>"
==652727== Memcheck, a memory error detector
==652727== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==652727== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==652727== Command: root.exe -q -e #include\ \<LoKi/ParticleCuts.h\>
==652727== 
   ------------------------------------------------------------------
  | Welcome to ROOT 6.30/04                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Feb 03 2024, 17:20:15                 |
  | From heads/master@tags/v6-30-04                                  |
  | With g++ (GCC) 13.1.0                                            |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------


==652727== Conditional jump or move depends on uninitialised value(s)
==652727==    at 0xB2BAFE3: llvm::ConstantExpr::getGetElementPtr(llvm::Type*, llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool, llvm::Optional<unsigned int>, llvm::Type*) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0xA9E4679: llvm::Evaluator::EvaluateBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, false>, llvm::BasicBlock*&, bool&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0xA9E5D5C: llvm::Evaluator::EvaluateFunction(llvm::Function*, llvm::Constant*&, llvm::SmallVectorImpl<llvm::Constant*> const&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0xA9E4F46: llvm::Evaluator::EvaluateBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, false>, llvm::BasicBlock*&, bool&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0xA9E5D5C: llvm::Evaluator::EvaluateFunction(llvm::Function*, llvm::Constant*&, llvm::SmallVectorImpl<llvm::Constant*> const&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x94CD322: EvaluateStaticConstructor(llvm::Function*, llvm::DataLayout const&, llvm::TargetLibraryInfo*) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0xA9D2A88: llvm::optimizeGlobalCtorsList(llvm::Module&, llvm::function_ref<bool (llvm::Function*)>) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x94D4887: (anonymous namespace)::GlobalOptLegacyPass::runOnModule(llvm::Module&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0xB392769: llvm::legacy::PassManagerImpl::run(llvm::Module&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x77E6523: cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x7763697: cling::Interpreter::executeTransaction(cling::Transaction&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x77F6B49: cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&, bool) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727== 
==652727== Invalid read of size 8
==652727==    at 0x40703A7E: ???
==652727==    by 0x40704B0C: ???
==652727==    by 0x40700094: ???
==652727==    by 0x406FFF1F: ???
==652727==    by 0x8DEE881: (anonymous namespace)::GenericLLVMIRPlatformSupport::initialize(llvm::orc::JITDylib&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x77E65F2: cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x7763697: cling::Interpreter::executeTransaction(cling::Transaction&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x77F6B49: cling::IncrementalParser::commitTransaction(llvm::PointerIntPair<cling::Transaction*, 2u, cling::IncrementalParser::EParseResult, llvm::PointerLikeTypeTraits<cling::Transaction*>, llvm::PointerIntPairInfo<cling::Transaction*, 2u, llvm::PointerLikeTypeTraits<cling::Transaction*> > >&, bool) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x77F9D97: cling::IncrementalParser::Compile(llvm::StringRef, cling::CompilationOptions const&) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x77643DB: cling::Interpreter::DeclareInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions const&, cling::Transaction**) const (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x7766985: cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==    by 0x78491A6: cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) (in /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so)
==652727==  Address 0xffffffffffffffe8 is not stack'd, malloc'd or (recently) free'd
==652727== 

 *** Break *** segmentation violation

(note that I ran it directly on root.exe, otherwise valgrind will only see the "wrapper" root executable that forks into root.exe)

With CLING_DEBUG=1, we can at least get a proper stack trace of where it's crashing:

$ CLING_DEBUG=1 root.exe 
   ------------------------------------------------------------------
  | Welcome to ROOT 6.30/04                        https://root.cern |
  | (c) 1995-2024, The ROOT Team; conception: R. Brun, F. Rademakers |
  | Built for linuxx8664gcc on Feb 03 2024, 17:20:15                 |
  | From heads/master@tags/v6-30-04                                  |
  | With g++ (GCC) 13.1.0                                            |
  | Try '.help'/'.?', '.demo', '.license', '.credits', '.quit'/'.q'  |
   ------------------------------------------------------------------

root [0] #include <LoKi/ParticleCuts.h>

 *** Break *** segmentation violation



===========================================================
There was a crash (kSigSegmentationViolation).
This is the entire stack trace of all threads:
===========================================================
#0  0x00007ff1c12d89fa in wait4 () from /lib64/libc.so.6
#1  0x00007ff1c124b243 in do_system () from /lib64/libc.so.6
#2  0x00007ff1c1e59eb2 in TUnixSystem::Exec (this=0xf5b500, shellcmd=0x880a600 "/cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/etc/gdb-backtrace.sh 576550 1>&2") at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:2120
#3  0x00007ff1c1e5a753 in TUnixSystem::StackTrace (this=0xf5b500) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:2411
#4  0x00007ff1c1e5e16c in TUnixSystem::DispatchSignals (this=0xf5b500, sig=kSigSegmentationViolation) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:3631
#5  0x00007ff1c1e560e0 in SigHandler (sig=kSigSegmentationViolation) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:402
#6  0x00007ff1c1e5e06f in sighandler (sig=11) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/unix/src/TUnixSystem.cxx:3602
#7  0x00007ff1c1e47a32 in textinput::TerminalConfigUnix::HandleSignal (this=0x7ff1c2175d80 <textinput::TerminalConfigUnix::Get()::s>, signum=11) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/textinput/src/textinput/TerminalConfigUnix.cpp:99
#8  0x00007ff1c1e47736 in (anonymous namespace)::TerminalConfigUnix__handleSignal (signum=11) at /build/jenkins/workspace/lcg_release_pipeline/build/projects/ROOT-6.30.04/src/ROOT/6.30.04/core/textinput/src/textinput/TerminalConfigUnix.cpp:36
#9  <signal handler called>
#10 0x00007ff1b033fa7e in LoKi::FunctorFromFunctor<LHCb::Particle const*, double>::FunctorFromFunctor (this=0x7ffcf3a95340, right=...) at /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/LHCb/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/Functor.h:114
#11 0x00007ff1b0340b0d in __cxx_global_var_initcling_module_10_.161(void) () at /cvmfs/lhcbdev.cern.ch/nightlies/lhcb-run2-patches/1529/Phys/InstallArea/x86_64_v2-el9-gcc13-dbg/include/LoKi/ParticleCuts.h:2880
#12 0x00007ff1b033c095 in __orc_init_func.cling-module-10 ()
#13 0x00007ff1bb91d882 in (anonymous namespace)::GenericLLVMIRPlatformSupport::initialize(llvm::orc::JITDylib&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so
#14 0x00007ff1ba3155f3 in cling::IncrementalExecutor::runStaticInitializersOnce(cling::Transaction&) () from /cvmfs/lhcb.cern.ch/lib/lcg/releases/ROOT/6.30.04-dd2db/x86_64-el9-gcc13-dbg/lib/libCling.so

@pikacic if LoKi::FunctorFromFunctor<LHCb::Particle const*, double>::FunctorFromFunctor rings a bell for you, please shout. That's what I will look into next...

@pikacic
Author

pikacic commented Jun 14, 2024

Let me have a look.

@hahnjo
Member

hahnjo commented Jun 14, 2024

Okay, never mind, this is a Cling issue: If there are more than 16 const variables with non-trivial constructors, their execution order may be scrambled:

extern "C" int printf(const char*, ...);

struct A {
  int val;
  A(int v) : val(v) {
    printf("A(%d), this = %p\n", val, this);
  }
  ~A() {
    printf("~A(%d), this = %p\n", val, this);
  }
};

const A a1(1);
const A a2(2);
const A a3(3);
const A a4(4);
const A a5(5);
const A a6(6);
const A a7(7);
const A a8(8);
const A a9(9);
const A a10(10);
const A a11(11);
const A a12(12);
const A a13(13);
const A a14(14);
const A a15(15);
const A a16(16);
const A a17(17);

This should print from 1 to 17, but for example master gives:

A(9), this = 0x7f9f2174e088
A(17), this = 0x7f9f2174e108
A(16), this = 0x7f9f2174e0f8
A(15), this = 0x7f9f2174e0e8
A(14), this = 0x7f9f2174e0d8
A(13), this = 0x7f9f2174e0c8
A(12), this = 0x7f9f2174e0b8
A(11), this = 0x7f9f2174e0a8
A(10), this = 0x7f9f2174e098
A(1), this = 0x7f9f2174e008
A(8), this = 0x7f9f2174e078
A(7), this = 0x7f9f2174e068
A(6), this = 0x7f9f2174e058
A(5), this = 0x7f9f2174e048
A(4), this = 0x7f9f2174e038
A(3), this = 0x7f9f2174e028
A(2), this = 0x7f9f2174e018
~A(2), this = 0x7f9f2174e018
~A(3), this = 0x7f9f2174e028
~A(4), this = 0x7f9f2174e038
~A(5), this = 0x7f9f2174e048
~A(6), this = 0x7f9f2174e058
~A(7), this = 0x7f9f2174e068
~A(8), this = 0x7f9f2174e078
~A(1), this = 0x7f9f2174e008
~A(10), this = 0x7f9f2174e098
~A(11), this = 0x7f9f2174e0a8
~A(12), this = 0x7f9f2174e0b8
~A(13), this = 0x7f9f2174e0c8
~A(14), this = 0x7f9f2174e0d8
~A(15), this = 0x7f9f2174e0e8
~A(16), this = 0x7f9f2174e0f8
~A(17), this = 0x7f9f2174e108
~A(9), this = 0x7f9f2174e088

(at least destruction order is consistent)

For the LHCb headers, this causes problems because some constructor calls reference other global const objects and the scrambled order means they are not constructed yet.
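
To illustrate the dependency described above, here is a minimal sketch with a hypothetical `Functor` type (a stand-in, not the real LoKi class). The names `payload`, `first`, `second` and `second_value` are made up for this example. In a normally compiled translation unit the objects are initialized in declaration order, so `second` can safely copy from `first`. If `second` were constructed first, `first.impl` would still be a zero-initialized null pointer and the copy constructor would read through it.

```cpp
// Hypothetical stand-in for a LoKi functor; not the real LHCb class.
struct Functor {
  const int* impl;  // for a global, zero-initialized (nullptr) until a constructor runs
  int cached;

  explicit Functor(const int* p) : impl(p), cached(*p) {}
  // Copying reads through the source object's pointer.
  Functor(const Functor& right) : impl(right.impl), cached(*right.impl) {}
};

const int payload = 42;

// Declaration order matters: `second` is built from `first`.
const Functor first(&payload);
const Functor second(first);

// Returns the value that `second` copied from `first` during initialization.
int second_value() { return second.cached; }
```

With in-order initialization, `second_value()` returns the payload; with the scrambled order, the dereference in the copy constructor would be an invalid read.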

This seems to be caused by #13614, which was meant to fix #13429, and therefore affects v6.28/08, where it was backported, and later versions (all v6.30, v6.32, master). I'll try to understand why the order starts changing with more than 16 const variables and work on a fix next.

@hahnjo
Copy link
Member

hahnjo commented Jun 14, 2024

This seems to be caused by #13614, which was meant to fix #13429, and therefore affects v6.28/08, where it was backported, and later versions (all v6.30, v6.32, master). I'll try to understand why the order starts changing with more than 16 const variables and work on a fix next.

It turns out there should be a llvm::stable_sort instead of llvm::sort to preserve order between constructors with the same priority. With 16 const variables, we are lucky - maybe because it switches to a different sorting algorithm below a threshold? I submitted an upstream LLVM fix: llvm/llvm-project#95532 and will work on applying it to all affected ROOT versions.
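
As a rough sketch of why the stable variant matters (using the standard library rather than LLVM's wrappers, and a hypothetical `CtorEntry` record that is not LLVM's actual data structure): `std::sort` gives no guarantee about the relative order of elements that compare equal, while `std::stable_sort` keeps them in their original order. The priority value and the helper names below are assumptions made for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record for one global constructor: its priority and the
// position at which it was declared. Not LLVM's actual data structure.
struct CtorEntry {
  int priority;
  int declared_at;
};

// Build n entries that all share one priority, like the 17 `const A` objects.
std::vector<CtorEntry> same_priority(int n) {
  std::vector<CtorEntry> entries;
  entries.reserve(n);
  for (int i = 1; i <= n; ++i) entries.push_back({65535, i});
  return entries;
}

// Sort by priority only. std::stable_sort guarantees that entries with equal
// priority keep their relative (declaration) order.
std::vector<int> run_order(std::vector<CtorEntry> entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const CtorEntry& a, const CtorEntry& b) {
                     return a.priority < b.priority;
                   });
  std::vector<int> order;
  order.reserve(entries.size());
  for (const CtorEntry& e : entries) order.push_back(e.declared_at);
  return order;
}
```

Calling `run_order(same_priority(17))` yields the declaration positions in their original order, because the comparator only looks at `priority` and the sort is stable.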

@vgvassilev
Member

Awesome! Thanks for the detailed analysis and a fix!

hahnjo added a commit to hahnjo/root that referenced this issue Jun 14, 2024
Constructors with the same priority should keep their relative order
that was specified. This is important for clang-repl with many const
variables after commit 05137ecfca ("[clang-repl] Emit const variables
only once").

---

Fixes root-project#15511
@pikacic
Author

pikacic commented Jun 14, 2024

Thanks a lot!

hahnjo added a commit that referenced this issue Jun 14, 2024
Fixes #15511
hahnjo added a commit to hahnjo/root that referenced this issue Jun 14, 2024
Fixes root-project#15511

(cherry picked from commit a60f353)
hahnjo added a commit to hahnjo/root that referenced this issue Jun 14, 2024
Fixes root-project#15511

(version of commit a60f353 for
LLVM 13 in v6-30-00-patches)
hahnjo added a commit to hahnjo/root that referenced this issue Jun 14, 2024
Fixes root-project#15511

(version of commit a60f353 for
LLVM 13 in v6-28-00-patches)
dpiparo pushed a commit that referenced this issue Jun 15, 2024
Fixes #15511

(cherry picked from commit a60f353)
hahnjo added a commit that referenced this issue Jun 17, 2024
Fixes #15511

(version of commit a60f353 for
LLVM 13 in v6-30-00-patches)
hahnjo added a commit that referenced this issue Jun 17, 2024
Fixes #15511

(version of commit a60f353 for
LLVM 13 in v6-28-00-patches)
@dpiparo
Member

dpiparo commented Jun 19, 2024

As requested by LHCb, a release for the 6.30 branch including the fix was provided today: https://root-forum.cern.ch/t/root-6-30-08-is-out

silverweed pushed a commit to silverweed/root that referenced this issue Aug 19, 2024
Fixes root-project#15511
hahnjo added a commit to hahnjo/root that referenced this issue Sep 13, 2024
hahnjo added a commit to hahnjo/root that referenced this issue Sep 13, 2024
hahnjo added a commit that referenced this issue Sep 18, 2024
hahnjo added a commit to hahnjo/root that referenced this issue Sep 18, 2024
(cherry picked from commit 67ff470)
hahnjo added a commit that referenced this issue Sep 18, 2024
(cherry picked from commit 67ff470)