Skip to content

Use std::vector instead of std::map for reading or writing an arbitrary number of collections. #201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 15, 2024

Conversation

jmcarcell
Copy link
Member

@jmcarcell jmcarcell commented Jun 18, 2024

The current implementation with std::map is not easy to use. The collections get ordered alphabetically in std::map so the order set in the steering file is lost, which means if there is some kind of mapping (like one output collection for each input collection) then one has to reproduce this map in the algorithm and the way I found to do it is to have another property with the input and output names again. For output one creates the collections and then puts them in a vector, I think it can't get simpler than that. For input, since there can't be vectors to references one gets a pointer to the collection, which is not ideal but not so bad. With a custom type it would be possible to have a vector that when using [] can get a reference but then that introduces a custom type which I don't like.

BEGINRELEASENOTES

  • FunctionalAlgorithms: Use std::vector instead of std::map for input or output an arbitrary number of collections.
  • Add an inputLocations and outputLocations methods to be able to retrieve the locations set in the steering file.

ENDRELEASENOTES

@jmcarcell jmcarcell force-pushed the vector branch 2 times, most recently from 31b82ff to fec79be Compare June 18, 2024 14:05
Copy link
Member

@m-fila m-fila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a quick look. There are still a few occurrences of 'map' where 'vector' should be used

Functions detail::readMapInputs and detail::putMapOutputs could also be renamed since they operate on vectors

I think const * are acceptable

In case the information about collection keys will be needed:
In Gaudi there isinputLocation() and outputLocation() (defined in DataHandleMixing) that can be used to get keys of inputs and outputs (like tuple either by index or type if unique). From what I seen they are not accessible from k4FWCore functional algorithms. It might be good idea to add something similar here.
Alternatively something like std::get<n>(m_outputs).at(m).fullKey().key() could be used to get key of n-th output and m-th collection (given n-th output is a vector of collections)

@jmcarcell jmcarcell force-pushed the vector branch 2 times, most recently from 0b75488 to 2d17f31 Compare June 24, 2024 09:39
@jmcarcell
Copy link
Member Author

I had a quick look. There are still a few occurrences of 'map' where 'vector' should be used

Functions detail::readMapInputs and detail::putMapOutputs could also be renamed since they operate on vectors

I think const * are acceptable

In case the information about collection keys will be needed: In Gaudi there isinputLocation() and outputLocation() (defined in DataHandleMixing) that can be used to get keys of inputs and outputs (like tuple either by index or type if unique). From what I seen they are not accessible from k4FWCore functional algorithms. It might be good idea to add something similar here. Alternatively something like std::get<n>(m_outputs).at(m).fullKey().key() could be used to get key of n-th output and m-th collection (given n-th output is a vector of collections)

There is also m_inputLocations and m_outputLocations that can be used like here: https://github.com/key4hep/k4Reco/pull/2/files#diff-fd32a841abe95600867a31e1e8a8f06232c5fe7e654319fdfad0d14b55760bc0R256. I don't have a strong opinion on implementing inputLocation() and outputLocation() or not.

@jmcarcell
Copy link
Member Author

I added a inputLocations and outputLocations to retrieve those and they are used in a couple of tests. Now I notice the Gaudi version is in singular but here it can be more than one. I think them being views is fine, but they can easily be changed to a vector.

@jmcarcell
Copy link
Member Author

Any more comments or comments about the functions to get the locations? @m-fila

@m-fila
Copy link
Member

m-fila commented Jul 5, 2024

Any more comments or comments about the functions to get the locations? @m-fila

Sorry I was busy with other things, I'll take another look

}
for (auto& [key, val] : input) {
for (int i = 0; i < 3; i++) {
if (inputLocations(0)[i] != "MCParticles" + std::to_string(i)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no easy way to have inputLocations take the key from the KeyValues above, right? I.e. such that something like this would becom possible

if (inputLocations("InputCollection")[i] == "MCParticles" + std::to_string(i)) { /**/ }

It doesn't really matter here, but if there are multiple input variable length inputs, this would make it much harder to break things, I think. However, I am not sure how often that actually is the case in real world uses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is:

  • either by property (returning std::vector<std::string>>)
    auto inputLocations(const std::string& name) const {
      Gaudi::Property<std::vector<std::string>> tmp;
      tmp.assign(this->getProperty(name));
      return tmp.value();
    }
  • or from m_inputlocations (returing view)
auto intpuLocations(const std::string& name) const {
  auto it = std::ranges::find_if(m_inputLocations, [&name](const auto& prop) { return prop.name() == name; });
  if (it == m_inputLocations.end()) {
    throw GaudiException("Input named " + name + " not found", "Consumer", StatusCode::FAILURE);
  }
  return *it | std::views::transform([](const DataObjID& id) -> const auto& { return id.key(); });
}

It could be also nice to get by type (if it's unique, there is a tuple underneath) but that seems to be more complicated as detail::transformType transforms all the collection types to podio::CollectionBase

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added functions to get by name. Getting by type is not trivial since one still has to get the index to find where the vector with the names is... so maybe if someone asks for it? By index and name should be plenty of options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the use case for getting this by type is limited(?). i would assume that in that case you don't really care about the collection name since it's only one in any case. The only reason I could think of would be for log messages, but in that case, one of the overloads here with index or name, should also work, right?

Copy link
Member

@m-fila m-fila Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, it's complicated to implement and there are already ways to achieve the same result so I wouldn't bother with it now. I mentioned it mostly for completeness since they have it in Gaudi.

The only reason I could think of would be for log messages, but in that case, one of the overloads here with index or name, should also work, right?

I could think of something like this: when there multiple arguments like here or more

using FloatColl         = std::vector<const podio::UserDataCollection<float>*>;
using ParticleColl      = std::vector<const edm4hep::MCParticleCollection*>;
using SimTrackerHitColl = std::vector<const edm4hep::SimTrackerHitCollection*>;
using TrackerHitColl    = std::vector<const edm4hep::TrackerHit3DCollection*>;
using TrackColl         = std::vector<const edm4hep::TrackCollection*>;

using retType = std::tuple<std::vector<podio::UserDataCollection<float>>, std::vector<edm4hep::MCParticleCollection>,
                           std::vector<edm4hep::MCParticleCollection>, std::vector<edm4hep::SimTrackerHitCollection>,
                           std::vector<edm4hep::TrackerHit3DCollection>, std::vector<edm4hep::TrackCollection>>;

retType operator()(const FloatColl& floatVec, const ParticleColl& particlesVec,
                     const SimTrackerHitColl& simTrackerHitVec, const TrackerHitColl& trackerHitVec,
                     const TrackColl& trackVec) const

I think it'd be nice to avoid counting the arguments to write outputLocations(4) outputLocations(5) or scrolling to the constructor to find the name in KeyValues, and instead just write outputLocations<std::vector<edm4hep::TrackCollection>>() when I want to get the name for the locations of that "vector of track-collection" output

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion about this, it can be useful. But so far I think we'll have few algorithms with any number of inputs or outputs so the reach will be small.

jmcarcell added 10 commits July 15, 2024 10:34
arbitrary number of collections. The current implementation with std::map is not
easy to use. The collections get ordered alphabetically in std::map so the order
set in the steering file is lost, which means if there is some kind of
mapping (like one output collection for each input collection) then one has to
reproduce this map in the algorithm to make sure it's correct.
@jmcarcell
Copy link
Member Author

I'll merge this later today, #209 has to be rebased on top of this. Note to myself: fix formatting, clang-format with Clang 18 gives different results from what you get from the nightlies...

@jmcarcell jmcarcell merged commit a3380b2 into main Jul 15, 2024
4 of 9 checks passed
@jmcarcell jmcarcell deleted the vector branch July 15, 2024 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants