@pcanal
Member

@pcanal pcanal commented Apr 14, 2025

This PR helped debug the issue related to #18373

This is an improvement in the detection of unwanted auto-parsing.

The roottest companion PR is root-project/roottest#1305

To ease debugging of unwanted auto-parsing triggered by TClass::GetClass, two new features are introduced.

  1. Give access to the list of classes that triggered auto-parsing:
     // Print the list
     gInterpreter->Print("autoparsed");
     // Get the list/set:
     ((TCling*)gInterpreter)->GetAutoParseClasses();
  2. Add an interface (to be further refined) to completely disallow auto-parsing during TClass::GetClass:
     Build ROOT (actually just TClass.cxx) with -DROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING

We "could" allow further ways to customize it:

   //   - environment variable ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING
   //   - rootrc key Root.TClass.GetClass.AutoParsing
   //   - TClass::SetGetClassAutoParsing

In addition, gDebug >= 1 now prints an additional line:

TCling::AutoParse: parsed 1 headers for reco::PFRecHitSoALayout<128,false>

@pcanal
Member Author

pcanal commented Apr 14, 2025

@makortel @Dr15Jones Any opinions on the interface to see the auto-parsed classes and to disable auto-parsing in TClass::GetClass?

Note: this was extracted/separated from #18373

@Dr15Jones
Collaborator

I'm concerned that keeping the full list of what was auto-parsed could lead to a noticeable memory increase. In my opinion, the additional gDebug-enabled printout is probably sufficient to allow CMS to do what we want.
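
If only the gDebug printout is used, the triggering class names can still be collected by scanning the job log. The following is a minimal sketch, assuming the line appears in the log exactly as shown in the PR description ("TCling::AutoParse: parsed N headers for <class>"); AutoParseRecord and parseAutoParseLine are hypothetical helper names and not part of ROOT.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record for one gDebug line; not a ROOT type.
struct AutoParseRecord {
   int fNumHeaders = 0;    // number of headers reported as parsed
   std::string fClassName; // class name that triggered the auto-parsing
};

// Parse a line of the assumed form
//   "TCling::AutoParse: parsed <N> headers for <class name>"
// Returns true and fills `record` on success; returns false for any other line.
bool parseAutoParseLine(const std::string &line, AutoParseRecord &record)
{
   const std::string prefix = "TCling::AutoParse: parsed ";
   const std::string middle = " headers for ";

   // The line must start with the prefix.
   if (line.compare(0, prefix.size(), prefix) != 0)
      return false;

   const std::size_t midPos = line.find(middle, prefix.size());
   if (midPos == std::string::npos)
      return false;

   // The text between prefix and middle must be a short run of digits.
   const std::string count = line.substr(prefix.size(), midPos - prefix.size());
   if (count.empty() || count.size() > 9 ||
       count.find_first_not_of("0123456789") != std::string::npos)
      return false;

   // Everything after the middle part is taken as the class name.
   const std::string className = line.substr(midPos + middle.size());
   if (className.empty())
      return false;

   record.fNumHeaders = std::stoi(count);
   record.fClassName = className;
   return true;
}
```

Feeding every log line through such a helper and storing the class names in a set on the consumer side would give the same list as the in-memory set, without keeping it inside the ROOT process.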

@github-actions

github-actions bot commented Apr 14, 2025

Test Results

    18 files      18 suites   4d 4h 40m 50s ⏱️
 2 742 tests  2 742 ✅ 0 💤 0 ❌
47 673 runs  47 673 ✅ 0 💤 0 ❌

Results for commit 214f513.


@pcanal pcanal force-pushed the master-18363-TClass branch 2 times, most recently from a3a40ee to 143831c Compare April 14, 2025 22:37
@makortel

I'm also concerned about the memory use of the added fAutoParseClasses and fAutoLoadedLibraries members, which seem to be filled unconditionally (i.e. even if we don't want a printout). From a quick look at a random recent CMSSW release, we have O(11k) classes declared in the DataFormats .rootmap files. I'd expect not all of those classes to be auto-loaded (or auto-parsed), but nevertheless the potential cost of these two sets could be in the MB range.

I'm wondering about the difference between the TInterpreter::SuspendAutoParsing guard (which seems to call TInterpreter::SetSuspendAutoParsing()) and TInterpreter::SetClassAutoparsing(). From a quick look, those two functions set different flags in TCling, and figuring out their behavior from the code alone seemed complicated.

From a usability standpoint, I suspect a global setting would not necessarily help CMS. We'd need header parsing enabled for the following cases:

  • dictionary is not necessary for the type
    • types like std::pair (although we still define std::pair dictionaries); or at least we assume header parsing would be needed for types that ROOT recommends us to not declare a dictionary for
  • cut parser (or other similar things that need the member functions)

To be useful within cmsRun, we'd like a system where the auto-parsing could be disabled for specific ROOT calls (but I understand that approach would not be easy to implement).
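
For illustration, per-call disabling could take the shape of a scoped guard, in the spirit of the TInterpreter::SuspendAutoParsing guard mentioned above. This is only a sketch under stated assumptions: MockInterpreter is a stand-in and not ROOT's TInterpreter, ScopedSuspendAutoParsing and suspendedDuringCall are hypothetical names, and the convention that the setter returns the previous value is assumed here rather than taken from ROOT.

```cpp
#include <cassert>

// Stand-in for the interpreter state; NOT ROOT's TInterpreter.
class MockInterpreter {
public:
   // Assumed convention: return the previous value so callers can restore it.
   bool SetSuspendAutoParsing(bool value)
   {
      const bool old = fSuspended;
      fSuspended = value;
      return old;
   }
   bool IsAutoParsingSuspended() const { return fSuspended; }

private:
   bool fSuspended = false;
};

// Hypothetical scoped guard: suspends auto-parsing for the lifetime of the
// object and restores the previous setting on destruction.
class ScopedSuspendAutoParsing {
public:
   explicit ScopedSuspendAutoParsing(MockInterpreter &interp, bool value = true)
      : fInterp(interp), fSaved(interp.SetSuspendAutoParsing(value))
   {
   }
   ~ScopedSuspendAutoParsing() { fInterp.SetSuspendAutoParsing(fSaved); }

   ScopedSuspendAutoParsing(const ScopedSuspendAutoParsing &) = delete;
   ScopedSuspendAutoParsing &operator=(const ScopedSuspendAutoParsing &) = delete;

private:
   MockInterpreter &fInterp;
   bool fSaved; // setting in effect before the guard was created
};

// Example call site: anything executed inside this function sees
// auto-parsing as suspended; the caller's setting is restored on return.
bool suspendedDuringCall(MockInterpreter &interp)
{
   ScopedSuspendAutoParsing guard(interp);
   return interp.IsAutoParsingSuspended();
}
```

The guard keeps the suspension local to the calls that should not parse headers, so code such as the cut parser that runs outside the scope is unaffected.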

@pcanal
Member Author

pcanal commented Apr 16, 2025

The memory cost is:

  • one library name per library that is actually autoloaded
  • one normalized class name for each class (and only those) that actually triggers auto-parsing. A priori this cost is negligible compared to the cost of the header being parsed (i.e. at the very least the header contains that same name somewhere, where the class is declared :) ). Indeed, the worst-case scenario is one class name per header file for all the classes, but that is extremely unlikely since it would require all the classes to be used and to need auto-parsing and, more importantly, the order to be such that the later classes' headers are not loaded indirectly by the earlier classes. In addition, if I remember correctly, the headers are actually bundled by dictionary, so in reality the worst-case scenario would be one class name per dictionary. (And in the case of a well-behaved I/O job, of course, no overhead :) ).
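
A back-of-the-envelope upper bound for such a set of names can be sketched as follows. All numbers here are assumptions and not measurements: the per-node overhead of 48 bytes is a rough guess for a tree node plus allocator bookkeeping on a 64-bit platform, every name is assumed to live in its own heap buffer, and estimateNameSetBytes is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Rough upper-bound estimate of the memory used by a std::set<std::string>
// holding `nNames` names of average length `avgNameLength`.
// `perNodeOverhead` is an ASSUMED value (tree node plus allocator overhead).
std::size_t estimateNameSetBytes(std::size_t nNames, std::size_t avgNameLength,
                                 std::size_t perNodeOverhead = 48)
{
   // Per entry: node bookkeeping + the string object itself
   // + a heap buffer for the characters and the terminating null.
   const std::size_t perEntry = perNodeOverhead + sizeof(std::string) + avgNameLength + 1;
   return nNames * perEntry;
}
```

With these assumptions, the O(11k) classes quoted above at an assumed average of 100 characters per name come out at about 2 MB if every single class triggered auto-parsing, which is the worst case described as extremely unlikely.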

@pcanal
Member Author

pcanal commented Apr 16, 2025

From a usability standpoint, I suspect a global setting would not necessarily help CMS.

The idea of the setting was solely for debugging purposes .... Although ....

Reading your list, it actually sounds like disabling the auto-parsing solely during TClass::GetClass may still do the trick. For the cut parser, it might/should do the auto-parsing later, when information about functions is needed.

@pcanal
Member Author

pcanal commented Apr 16, 2025

difference between TInterpreter::SetSuspendAutoParsing() and TInterpreter::SetClassAutoparsing()

It is not clear, as SetClassAutoparsing started its life with a purpose similar to the one served by SetSuspendAutoParsing today, but nowadays seems to serve a purpose related to the support for C++ modules.

@pcanal
Member Author

pcanal commented Apr 16, 2025

@makortel Should we test CMSSW with ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING turned on before deciding which version of this to merge?

@makortel

difference between TInterpreter::SetSuspendAutoParsing() and TInterpreter::SetClassAutoparsing()

It is not clear, as SetClassAutoparsing started its life with a purpose similar to the one served by SetSuspendAutoParsing today, but nowadays seems to serve a purpose related to the support for C++ modules.

Just to clarify, do you mean the SetClassAutoparsing is nowadays related to the support for C++ modules, and SetSuspendAutoParsing is for preventing header parsing in, umm, dictionary-using code?

@makortel

@makortel Should we test CMSSW with ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING turned on before deciding which version of this to merge?

With "which version" do you mean whether ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING is disabled or enabled by default at build time?

In any case I'd be fine with testing ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING turned on. @smuzaffar What do you think?

@pcanal
Member Author

pcanal commented Apr 16, 2025

With "which version"

Choices that we have:

  1. Drop the code related to suspending auto-parsing
  2. Keep the code related to suspending auto-parsing only with a #define
  3. Keep the code related to suspending auto-parsing only with rootrc flag
  4. Keep both

In addition we have:

  a. Always record the auto-loaded libraries and the names of the classes that induce auto-parsing
  b. Record one or both only on demand.

The point of a build with the suspension of auto-parsing forced on was to find out whether it leads to any of:

  1. No failures at all in CMSSW
  2. Failures that reveal an actually missing dictionary
  3. Failures due to classes that are not meant to have a dictionary
  4. Failures in the cut parser

SetSuspendAutoParsing is for preventing header parsing

It suspends/prevents any auto-parsing.

SetClassAutoparsing is nowadays related to the support for C++ modules

It is used in connection to code related to C++ modules ... I did not yet dig deeper than that :)

@makortel

Thanks @pcanal for the clarifications. I think a test with CMSSW with this PR and ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING enabled at build time could be useful, but I'm also a bit concerned that the first test would reveal many failures (any mixture of 2-4 in your list), with some of the failures hiding other failures, so that we'd end up iterating with the test after fixing things (although getting things fixed should be a good thing).

@smuzaffar What do you think?

@smuzaffar
Contributor

@pcanal , @makortel we can run cmssw PR tests with ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING .

pcanal added 4 commits April 17, 2025 13:52
If TClass.cxx is built with the cpp macro:

   ROOT_DISABLE_TCLASS_GET_CLASS_AUTOPARSING

defined, it will no longer do any auto-parsing during the
execution of `TClass::GetClass`.  This will result in not
being able to find TClass-es when the name requires not-already
loaded interpreted information (eg. a typedef to be resolved).

Comments include additional possible interfaces to turn on this
feature.
Use `gInterpreter->Print("autoparsed");` to print a list
of the class names that directly lead to auto-parsing.

Use `gCling->GetAutoParseClasses()` to programmatically get a set
of the class names that directly lead to auto-parsing.
This allows disabling auto-parsing during `TClass::GetClass` for debugging purposes.
@pcanal
Member Author

pcanal commented Jul 30, 2025

The TClass for pair<edm::Ref, bitset<64> >

The pair should not need a dictionary (but having it doesn't hurt either) unless one of its components does not have a dictionary. I.e., do edm::Ref<X> and bitset<64> have a dictionary?

@makortel

The TClass for pair<edm::Ref, bitset<64> >

The pair should not need a dictionary (but having it doesn't hurt either) unless one of its components does not have a dictionary. I.e., do edm::Ref<X> and bitset<64> have a dictionary?

Yes, the edm::Ref<X> is defined in https://github.com/cms-sw/cmssw/blob/0aedd569db59a96914ed14b2cb36ed054c717aa3/DataFormats/L1TrackTrigger/src/classes_def.xml#L6 and bitset<64> in https://github.com/cms-sw/cmssw/blob/0aedd569db59a96914ed14b2cb36ed054c717aa3/DataFormats/StdDictionaries/src/classes_def_others.xml#L13

@pcanal
Member Author

pcanal commented Jul 30, 2025

Alright, then the failing step must be the step removing/following typedefs. Without the header file they are no longer available. You may need to also request dictionaries for some/all of the related typedefs.

@makortel

##Failure Location unknown## : Error
Test name: testProductRegistry::testAddAlias
uncaught exception of type std::exception (or derived).
- An exception of category 'LogicError' occurred.
Exception Message:
ProductResolverIndexHelper::insert - Attempt to insert duplicate entry.

This failure seems to be caused by TClass::GetClass(char const*) returning a nullptr, that then within CMS' edm::TypeWithDict leads to the use of edm::TypeWithDict::dummyType** (as a marker for invalid type), and that type ends up being used for all the entries in that test, leading to the "duplicate entry" exception.

I checked "manually" that TClass::GetClass("edm::OwnVector<edmtest::SimpleDerived>") returns nullptr, whereas TClass::GetClass(typeid(edm::OwnVector<edmtest::SimpleDerived>)) returns a non-null pointer. The dictionary for that class is defined in https://github.com/cms-sw/cmssw/blob/0aedd569db59a96914ed14b2cb36ed054c717aa3/DataFormats/TestObjects/src/classes_def.xml#L63.

@makortel

Alright, then the failing step must be the step removing/following typedefs. Without the header file they are no longer available. You may need to also request dictionaries for some/all of the related typedefs.

Thanks. I defined the relevant dictionaries (*) with the expanded types, and the job went a tiny bit forward, i.e. it now makes the same complaint but for std::pair<edm::Ref<Y>, std::bitset<64> >. Just to confirm I understood correctly, if we want this we should always use the fully expanded types in the classes_def.xml?

(*)

std::pair<edm::Ref<X>, std::bitset<64> >
std::vector<std::pair<edm::Ref<X>, std::bitset<64> > >
std::vector<std::vector<std::pair<edm::Ref<X>, std::bitset<64> > > >
edm::Wrapper<std::vector<std::vector<std::pair<edm::Ref<X>, std::bitset<64> > > > >

@pcanal
Member Author

pcanal commented Jul 30, 2025

Just to confirm I understood correctly, if we want this we should always use the fully expanded types in the classes_def.xml?

When you pass the expanded type, ROOT will record the long name and the normalized name (where typedefs are fully resolved). If any in-between name is used (with some typedefs resolved but not others), we would need to register those spellings too if we want to avoid the need for auto-parsing.

We also support the following syntax:

<typedef name="typedefB" />
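
The consequence can be illustrated with a toy lookup table: without a parser available to resolve typedefs, only spellings that were explicitly recorded can be found. This is a sketch and not ROOT's TClassTable; ToyClassTable, makeExampleTable and the myns:: type names are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-in for a class-name table; NOT ROOT's TClassTable.
class ToyClassTable {
public:
   // Record the normalized name itself.
   void Add(const std::string &normalized) { fByName[normalized] = normalized; }

   // Record an additional spelling that should resolve to `normalized`.
   void AddAlternate(const std::string &normalized, const std::string &spelling)
   {
      fByName[spelling] = normalized;
   }

   // Return the normalized name, or an empty string if the spelling was never
   // recorded. There is deliberately no typedef resolution here: that is the
   // step that would need header parsing.
   const std::string &Find(const std::string &spelling) const
   {
      static const std::string empty;
      const auto it = fByName.find(spelling);
      return it == fByName.end() ? empty : it->second;
   }

private:
   std::unordered_map<std::string, std::string> fByName;
};

// Hypothetical setup: myns::Hits is imagined to be a typedef for
// std::vector<myns::Hit>, and myns::HitAlias a typedef for myns::Hit.
ToyClassTable makeExampleTable()
{
   ToyClassTable table;
   table.Add("vector<myns::Hit>");                        // normalized name
   table.AddAlternate("vector<myns::Hit>", "myns::Hits"); // requested typedef
   return table;
}
```

In this toy setup both "vector<myns::Hit>" and "myns::Hits" resolve, while the in-between spelling "vector<myns::HitAlias>" does not, because nothing recorded it and nothing is available to resolve the inner typedef.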

@makortel

makortel commented Jul 31, 2025

The TClass for pair<edm::Ref<X>, bitset<64> > used as the value type of the compiled collection proxy vector<pair<edm::Ref<X>, bitset<64> > > is not loaded

Adding entries for the pair<...> and vector<pair<...>> (either via type aliases or the fully expanded types) did not change the behavior.

Never mind, testing again I see that adding the pair<...> and vector<pair<...>> (either with "full expansion" or via type aliases) works (i.e. the job moves to the next failure). Sorry for the false alarm.

@makortel

##Failure Location unknown## : Error
Test name: testProductRegistry::testAddAlias
uncaught exception of type std::exception (or derived).
- An exception of category 'LogicError' occurred.
Exception Message:
ProductResolverIndexHelper::insert - Attempt to insert duplicate entry.

This failure seems to be caused by TClass::GetClass(char const*) returning a nullptr, that then within CMS' edm::TypeWithDict leads to the use of edm::TypeWithDict::dummyType** (as a marker for invalid type), and that type ends up being used for all the entries in that test, leading to the "duplicate entry" exception.

I checked "manually" that TClass::GetClass("edm::OwnVector<edmtest::SimpleDerived>") returns nullptr, whereas TClass::GetClass(typeid(edm::OwnVector<edmtest::SimpleDerived>)) returns a non-null pointer. The dictionary for that class is defined in https://github.com/cms-sw/cmssw/blob/0aedd569db59a96914ed14b2cb36ed054c717aa3/DataFormats/TestObjects/src/classes_def.xml#L63.

This failure turned out to be a mismatch between the type name being requested and the type that was registered in the classes_def.xml (second, defaulted template argument being omitted in the TClass::GetClass() call). Using the same type names in the test as in the classes_def.xml makes the test pass.

@pcanal
Member Author

pcanal commented Aug 1, 2025

This failure turned out to be a mismatch between the type name being requested and the type that was registered in the classes_def.xml (second, defaulted template argument being omitted in the TClass::GetClass() call). Using the same type names in the test as in the classes_def.xml makes the test pass.

A priori, using the short name in the classes_def.xml should also work, since it allows both the short and the long (with typedefs resolved) names to be used with TClass::GetClass. (I.e. I understood the fix done to be 'use the long name in TClass::GetClass', while I propose 'use the short name in the selection.xml'.)

@makortel

makortel commented Aug 1, 2025

This failure turned out to be a mismatch between the type name being requested and the type that was registered in the classes_def.xml (second, defaulted template argument being omitted in the TClass::GetClass() call). Using the same type names in the test as in the classes_def.xml makes the test pass.

A priori, using the short name in the classes_def.xml should also work, since it allows both the short and the long (with typedefs resolved) names to be used with TClass::GetClass. (I.e. I understood the fix done to be 'use the long name in TClass::GetClass', while I propose 'use the short name in the selection.xml'.)

The "short name" could be useful if the string would be passed by a human, but in the CMSSW framework case the string name originates from std::type_info, so (with present code) it's easier to get to the "long name". (caveat: I'm still in a process of finding out if we create these strings from anything else than starting from std::type_info).

@makortel

makortel commented Aug 1, 2025

I took a look at another unit test failure reported in cms-sw#222 (comment)

---> test TestDQMOfflineConfiguration_80 had ERRORS

The test fails because cmsRun segfaults in

#2  0x00007f29e4ea4164 in sig_dostack_then_abort () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f29eac3950d in TClass::GetMissingDictionariesForPairElements(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so
#5  0x00007f29eac38d74 in TClass::GetMissingDictionariesWithRecursionCheck(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so
#6  0x00007f29eac393b8 in TClass::GetMissingDictionariesForMembers(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so
#7  0x00007f29eac38fc8 in TClass::GetMissingDictionariesWithRecursionCheck(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so
#8  0x00007f29eac393b8 in TClass::GetMissingDictionariesForMembers(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so
#9  0x00007f29eac39e42 in TClass::GetMissingDictionaries(THashTable&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so
#10 0x00007f29e9ee8361 in edm::checkClassDictionaries(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, edm::TypeWithDict const&) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreReflection.so
#11 0x00007f29eb01ec05 in edm::ProductRegistryHelper::addToRegistry(__gnu_cxx::__normal_iterator<edm::ProductRegistryHelper::TypeLabelItem const*, std::vector<edm::ProductRegistryHelper::TypeLabelItem, std::allocator<edm::ProductRegistryHelper::TypeLabelItem> > > const&, __gnu_cxx::__normal_iterator<edm::ProductRegistryHelper::TypeLabelItem const*, std::vector<edm::ProductRegistryHelper::TypeLabelItem, std::allocator<edm::ProductRegistryHelper::TypeLabelItem> > > const&, edm::ModuleDescription const&, edm::SignallingProductRegistryFiller&, edm::ProductRegistryHelper*, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#12 0x00007f29eb01f1ef in edm::ProducerBase::registerProducts(edm::ProducerBase*, edm::SignallingProductRegistryFiller*, edm::ModuleDescription const&) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#13 0x00007f29eaff4c6d in edm::ModuleMakerBase::makeModule(edm::MakeModuleParams const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#14 0x00007f29eaff4dc8 in edm::ModuleHolderFactory::makeModule(edm::MakeModuleParams const&, edm::ModuleTypeResolverMaker const*, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) const () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#15 0x00007f29eaff8337 in edm::ModuleRegistry::getModule(edm::MakeModuleParams const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&, edm::signalslot::Signal<void (edm::ModuleDescription const&)>&) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#16 0x00007f29eb050e7f in edm::(anonymous namespace)::getModule(edm::ParameterSet&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, edm::ModuleRegistry&, edm::SignallingProductRegistryFiller&, edm::ActivityRegistry&, edm::PreallocationConfiguration const*, std::shared_ptr<edm::ProcessConfiguration const>) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#17 0x00007f29eb058ba4 in edm::ScheduleBuilder::ScheduleBuilder(edm::ModuleRegistry&, edm::ParameterSet&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, edm::PreallocationConfiguration const&, edm::SignallingProductRegistryFiller&, edm::ActivityRegistry&, std::shared_ptr<edm::ProcessConfiguration const>) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#18 0x00007f29eb042ca0 in edm::Schedule::Schedule(edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::SignallingProductRegistryFiller&, edm::ExceptionToActionTable const&, std::shared_ptr<edm::ActivityRegistry>, std::shared_ptr<edm::ProcessConfiguration const>, edm::PreallocationConfiguration const&, edm::ProcessContext const*, edm::ModuleTypeResolverMaker const*) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#19 0x00007f29eb05e781 in edm::ScheduleItems::initModules(edm::ParameterSet&, edm::service::TriggerNamesService const&, edm::PreallocationConfiguration const&, edm::ProcessContext const*, edm::ModuleTypeResolverMaker const*) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#20 0x00007f29eafbfb37 in tbb::detail::d2::function_task<edm::EventProcessor::init(std::shared_ptr<edm::ProcessDesc>&, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy)::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) [clone .lto_priv.0] () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#21 0x00007f29eb1cf87b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::external_waiter> (waiter=..., t=0x7f29e81dae00, this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/task_dispatcher.h:334
#22 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::external_waiter> (waiter=..., t=<optimized out>, this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/task_dispatcher.h:470
#23 tbb::detail::r1::task_dispatcher::execute_and_wait (t=<optimized out>, wait_ctx=..., w_ctx=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/task_dispatcher.cpp:168
#24 0x00007f29eaf8ec90 in edm::EventProcessor::init(std::shared_ptr<edm::ProcessDesc>&, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#25 0x00007f29eaf923f1 in edm::EventProcessor::EventProcessor(std::shared_ptr<edm::ProcessDesc>, edm::ServiceToken const&, edm::serviceregistry::ServiceLegacy) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/lib/el8_amd64_gcc12/libFWCoreFramework.so
#26 0x0000000000408369 in tbb::detail::d1::task_arena_function<main::{lambda()#1}::operator()() const::{lambda()#1}, void>::operator()() const ()
#27 0x00007f29eb1bdf71 in tbb::detail::r1::task_arena_impl::execute (ta=..., d=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2022.0.0-79b5a917b0c13f831cd534a5b9f53a95/tbb-v2022.0.0/src/tbb/arena.cpp:821
#28 0x000000000040a283 in main::{lambda()#1}::operator()() const ()
#29 0x00000000004051b8 in main ()

@pcanal
Member Author

pcanal commented Aug 1, 2025

The "short name" could be useful if ... in the CMSSW framework case the string name originates from std::type_info

The point is by using the short name in the exception you get both at the same time and the selection.xml is more readable :)

@pcanal
Member Author

pcanal commented Aug 1, 2025

The test fails because cmsRun segfaults in
#4 0x00007f29eac3950d in TClass::GetMissingDictionariesForPairElements(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so

That routine should not crash :( ... even when they are missing dictionary!

@makortel

makortel commented Aug 1, 2025

The "short name" could be useful if ... in the CMSSW framework case the string name originates from std::type_info

The point is by using the short name in the exception you get both at the same time and the selection.xml is more readable :)

Is there a way to get the short name from the long name (or from TClass object)?

@pcanal
Member Author

pcanal commented Aug 1, 2025

Is there a way to get the short name from the long name (or from TClass object)?

You can get from TClassTable the list of registered aliases for the name ... but of course only the ones that have been registered. If you don't have a user-readable version, then so be it (but as a first approximation you could drop the template arguments that have their default value).

@makortel

makortel commented Aug 4, 2025

Is there a way to get the short name from the long name (or from TClass object)?

You can get from TClassTable the list of registered aliases for the name ... but of course only the ones that have been registered. If you don't have a user-readable version, then so be it (but as a first approximation you could drop the template arguments that have their default value).

Ok. Although in this case the error message originates from ROOT.

@makortel

makortel commented Aug 4, 2025

The test fails because cmsRun segfaults in
#4 0x00007f29eac3950d in TClass::GetMissingDictionariesForPairElements(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so

That routine should not crash :( ... even when they are missing dictionary!

In this case the TClass in question is for pair<edm::RefProd<vector<reco::CaloCluster> >,edm::RefProd<vector<ticl::Trackster> > >. I believe we do not have a dictionary defined for that type. The crash occurs in

root/core/meta/src/TClass.cxx

Lines 4001 to 4013 in 3cf3b64

void TClass::GetMissingDictionariesForPairElements(TCollection& result, TCollection& visited, bool recurse)
{
   // Pair is a special case and we have to check its elements for missing dictionaries
   // Pair is a transparent container so we should always look at its.
   TVirtualStreamerInfo *SI = (TVirtualStreamerInfo*)this->GetStreamerInfo();
   for (int i = 0; i < 2; i++) {
      TClass* pairElement = ((TStreamerElement*)SI->GetElements()->At(i))->GetClass();
      if (pairElement) {
         pairElement->GetMissingDictionariesWithRecursionCheck(result, visited, recurse);
      }
   }
}

on line 4008. From gdb at the point of crash

(gdb) p i
$1 = 0
(gdb) p SI
$9 = (TVirtualStreamerInfo *) 0x7fffb1de6700
(gdb) p SI->GetElements()
$11 = (TObjArray *) 0x7fffb1e52580
(gdb) p SI->GetElements()->GetEntries()
$12 = 0
(gdb) p ((TStreamerElement*)SI->GetElements()->At(0))
$13 = (TStreamerElement *) 0x0
(gdb) p ((TStreamerElement*)SI->GetElements()->At(1))
$14 = (TStreamerElement *) 0x0

So for some reason the SI->GetElements() has no elements, but the loop assumes there to be (at least) 2 elements, and hence the crash (lucky to find 0x0 there).
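
A defensive version of such a loop would check each element pointer before dereferencing it. The sketch below uses mock types instead of ROOT's TVirtualStreamerInfo and TStreamerElement, and it is not presented as the fix that was or will be applied in ROOT; MockElement, MockStreamerInfo and pairElementClassNames are hypothetical names that only illustrate the guard.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Mock of a streamer element; NOT ROOT's TStreamerElement.
struct MockElement {
   std::string fClassName;
};

// Mock of a streamer info; NOT ROOT's TVirtualStreamerInfo.
struct MockStreamerInfo {
   std::vector<MockElement *> fElements;

   // Simplification of this mock: an out-of-range index yields nullptr.
   MockElement *At(std::size_t i) const
   {
      return i < fElements.size() ? fElements[i] : nullptr;
   }
};

// Collect the class names of the (up to two) pair elements, skipping any
// element that is missing instead of dereferencing a null pointer.
std::vector<std::string> pairElementClassNames(const MockStreamerInfo *info)
{
   std::vector<std::string> names;
   if (!info)
      return names; // no streamer info at all

   for (std::size_t i = 0; i < 2; ++i) {
      const MockElement *element = info->At(i);
      if (!element)
         continue; // the unguarded loop would dereference this pointer
      names.push_back(element->fClassName);
   }
   return names;
}
```

With an empty element list, as in the gdb session above, this version returns an empty result instead of crashing; whether an empty list for a pair should itself be reported as a missing dictionary is a separate question.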

@makortel

makortel commented Aug 7, 2025

The test fails because cmsRun segfaults in
#4 0x00007f29eac3950d in TClass::GetMissingDictionariesForPairElements(TCollection&, TCollection&, bool) () from /build/mkortela/debug/rootpr18402/CMSSW_15_1_ROOT6_X_2025-07-28-2300/external/el8_amd64_gcc12/lib/libCore.so

That routine should not crash :( ... even when they are missing dictionary!

In this case the TClass in question is for pair<edm::RefProd<vector<reco::CaloCluster> >,edm::RefProd<vector<ticl::Trackster> > >. I believe we do not have a dictionary defined for that type.

Adding the dictionary for std::pair<edm::RefProd<std::vector<reco::CaloCluster> >,edm::RefProd<std::vector<ticl::Trackster> > > makes the crash go away.

@makortel

With @pcanal we traced another failure from cms-sw#222 (comment)

---> test testHeterogeneousCoreAlpakaTestWriteReadSerialSync had ERRORS

to be caused by #19705. As a workaround we could modify the classes_def.xml to include (also) the type alias request before the "normalized name" request. I'm on the fence about whether to do that, though (or to wait for #19705 to be resolved first).
