Use detray geometry on GPUs #23

asalzburger · 2021-01-28T07:57:47Z

asalzburger
Jan 28, 2021
Maintainer

The detector class implements a geometry without any polymorphism: the different objects are grouped into tuples and accessed via a typed index, i.e. one index for the type of object and one index for the position within the tuple column.
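The tuple-of-columns layout with a typed index can be sketched in plain C++17 as follows. This is a minimal illustration, not detray's actual code: `mask_store`, `typed_index`, `visit_mask`, `extent` and the two shape types are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical object types; detray's real masks/surfaces differ.
struct rectangle { double half_x, half_y; };
struct disc { double r_min, r_max; };

// One "column" (a vector) per object type, grouped into a tuple.
using mask_store = std::tuple<std::vector<rectangle>, std::vector<disc>>;

// Typed index: which column (the type) and which entry in that column.
struct typed_index {
  std::size_t type;
  std::size_t position;
};

// Map the runtime type id onto the matching tuple column by compile-time
// recursion, then call the functor on the selected object.
// No virtual functions are involved.
template <std::size_t I = 0, typename functor_t>
auto visit_mask(const mask_store& store, typed_index idx, functor_t&& f) {
  if constexpr (I + 1 < std::tuple_size_v<mask_store>) {
    if (idx.type != I) {
      return visit_mask<I + 1>(store, idx, std::forward<functor_t>(f));
    }
  } else {
    assert(idx.type == I && "type id out of range");
  }
  const auto& column = std::get<I>(store);
  assert(idx.position < column.size());
  return f(column[idx.position]);
}

// Example functor: one overload per object type.
struct extent {
  double operator()(const rectangle& r) const { return 4. * r.half_x * r.half_y; }
  double operator()(const disc& d) const { return d.r_max - d.r_min; }
};
```

The only runtime cost of the dispatch is a chain of integer comparisons over the (small, fixed) number of types, which is also something device code can do.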

However, std:: containers are still used for structuring the geometry; all of them are defined in the containers.hpp file.

What would we need to change to be able to use this geometry on the GPU?

After construction of the geometry, the lengths of the containers are indeed known.
Do we then need (after closing the geometry) to transform it into a cuda_geometry with different container classes?

There is a test that creates a simple pixel geometry, and we will make reading in the ITk geometry and/or the TrackML/OpenDataDetector geometry available.

@krasznaa @niermann999 @XiaocongAi - what are your thoughts on that?

krasznaa · 2021-01-28T08:13:00Z

krasznaa
Jan 28, 2021
Collaborator

Thanks for the message Andi!

Let me have a look at the code today, and give you feedback once I have a "fuller" picture. 😄

At the same time let me include @stephenswat in this. Could you give him access to the code as well? Since I very much want to involve him in these developments. 😉


asalzburger Jan 28, 2021
Maintainer Author

Done, I gave him access. I haven't written any documentation for the code yet, and we are still missing the grid sorting of the surfaces; @niermann999 will work on that, as summarized in issue #21.

krasznaa · 2021-01-28T15:48:00Z

krasznaa
Jan 28, 2021
Collaborator

So... I thought the fastest way to describe how I believe the code will need to be structured would be to throw an example together. You can find it here:

https://github.com/krasznaa/detray_data_model

The main features are the following:

We should be able to use std::vector in the "host code" for setting up the contiguous memory arrays that describe the geometry. A custom allocator can be used for this. (https://github.com/krasznaa/detray_data_model/blob/main/allocators/managed_allocator.hpp)
In device code we unfortunately cannot use std::vector, for many different reasons (for one, none of its member functions are marked __device__). So if we use templated code that assumes a "vector type interface" on top of an array, we need to provide a class mimicking that behaviour ourselves. (https://github.com/krasznaa/detray_data_model/blob/main/vector/vector.hpp)
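One way to provide such a "vector type interface" is a non-owning view over contiguous memory that a host-side container owns. The sketch below is an assumption about how that could look, not the contents of vector.hpp; `vector_view`, `make_view` and the `DETRAY_HOST_DEVICE` macro are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands to __host__ __device__ under nvcc, to nothing for a host compiler.
#if defined(__CUDACC__)
#define DETRAY_HOST_DEVICE __host__ __device__
#else
#define DETRAY_HOST_DEVICE
#endif

namespace sketch {

// Non-owning, fixed-size "vector" over memory owned elsewhere (e.g. by a
// std::vector that uses a managed-memory allocator on the host).
template <typename T>
class vector_view {
 public:
  using value_type = T;
  using size_type = std::size_t;

  DETRAY_HOST_DEVICE vector_view(T* data, size_type size)
      : m_data(data), m_size(size) {}

  DETRAY_HOST_DEVICE size_type size() const { return m_size; }
  DETRAY_HOST_DEVICE bool empty() const { return m_size == 0; }

  DETRAY_HOST_DEVICE T& operator[](size_type i) {
    assert(i < m_size);
    return m_data[i];
  }
  DETRAY_HOST_DEVICE const T& operator[](size_type i) const {
    assert(i < m_size);
    return m_data[i];
  }

  // Plain pointers serve as iterators, so range-based for loops work.
  DETRAY_HOST_DEVICE T* begin() { return m_data; }
  DETRAY_HOST_DEVICE T* end() { return m_data + m_size; }
  DETRAY_HOST_DEVICE const T* begin() const { return m_data; }
  DETRAY_HOST_DEVICE const T* end() const { return m_data + m_size; }

 private:
  T* m_data;
  size_type m_size;
};

// Host-only helper: view the storage of an existing std::vector.
template <typename T, typename A>
vector_view<T> make_view(std::vector<T, A>& v) {
  return vector_view<T>(v.data(), v.size());
}

}  // namespace sketch
```

The view cannot grow, which matches the situation described above: once the geometry is closed, all container lengths are known.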

Note that the code in that repository is not representative of what the code providing us with the "ultimate performance" should look like. I took some shortcuts with the "managed memory" feature of CUDA, so as not to have to write too much code at first. But based on previous experience, this design should also work well for a setup with manual memory management, that is, when we control exactly when memory gets copied where, which on the whole is more performant than letting CUDA figure this out through page faults at runtime.

Cheers,
Attila

P.S. Of course the allocator should also not call CUDA memory allocation functions directly, but should go through an intermediate layer, like the one we're looking at with @stephenswat. (https://nvlabs.github.io/cub/)
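Standard C++17 already offers one such intermediate layer: `std::pmr::memory_resource`, which containers reach through `std::pmr::polymorphic_allocator`. The sketch below is only an assumption about how that could be used here. The hypothetical `counting_resource` forwards to `::operator new`, where a GPU-oriented resource would presumably call CUDA, or a caching allocator such as the one in CUB, instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Stand-in for a CUDA-backed resource: allocates from the global heap and
// keeps track of how many bytes are currently handed out.
class counting_resource : public std::pmr::memory_resource {
 public:
  std::size_t bytes_in_use() const { return m_in_use; }

 private:
  void* do_allocate(std::size_t bytes, std::size_t alignment) override {
    void* p = ::operator new(bytes, std::align_val_t(alignment));
    m_in_use += bytes;
    return p;
  }
  void do_deallocate(void* p, std::size_t bytes,
                     std::size_t alignment) override {
    ::operator delete(p, bytes, std::align_val_t(alignment));
    m_in_use -= bytes;
  }
  bool do_is_equal(const std::pmr::memory_resource& other) const
      noexcept override {
    return this == &other;
  }

  std::size_t m_in_use = 0;
};
```

Containers only see the resource through a pointer, so the same `std::pmr::vector` code can be pointed at a different memory resource without recompiling the code that uses it.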


paulgessinger Jan 30, 2021
Maintainer

I'm pretty sure sizeof( std::array< int, 20 > ) == sizeof(int)*20 (only for trivial alignment, of course) in all implementations I'm aware of. It's an aggregate type, and the standard guarantees that it does not do any dynamic allocation. I think the absence of padding is up to the implementation, but all implementations I'm aware of have no padding. I've played around with std::array in various ways in some unrelated projects a fair bit.
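These properties can be checked on a given toolchain with a few compile-time assertions. This is a sketch: the first `static_assert` encodes implementation behaviour rather than a standard guarantee, so it simply fails the build on an implementation that pads `std::array`. `is_contiguous` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Implementation behaviour (holds for common standard libraries such as
// libstdc++), not something the standard promises.
static_assert(sizeof(std::array<int, 20>) == sizeof(int) * 20,
              "std::array<int, 20> is padded on this implementation");

// Guaranteed by the standard: std::array is an aggregate.
static_assert(std::is_aggregate_v<std::array<int, 20>>);

// Element i lives at data() + i, i.e. the storage is contiguous.
inline bool is_contiguous(const std::array<int, 20>& a) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a.data() + i != &a[i]) return false;
  }
  return true;
}
```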

That being said, I suspected that it would not be directly usable with CUDA. What are your thoughts on NVIDIA's thrust library in this regard?

asalzburger Jan 30, 2021
Maintainer Author

I have only glanced at the thrust library and can't say much about it, but if it happens to provide STL-like behaviour, that would keep the code quite readable across CPU and GPU...

krasznaa Jan 30, 2021
Collaborator

🤔 I looked at the thrust code while writing the "seedfinder2" CUDA code. Unfortunately I wasn't terribly impressed by it back then.

I had a look again now, just to see if it's any smarter in the latest CUDA version than what I remembered. But it doesn't seem so. 😦 Note that something like thrust::device_vector is not a vector type that you can use in device code. For a moment, misremembering what I saw before, I thought it may indeed be a good candidate for the "outer vector type" in the detector description. (When we instantiate the object in device code.) But it's not. If you look at the Doxygen documentation, you'll see that precious few of its functions are marked with __device__. No, thrust::device_vector is a type that you use in your host code to interact with data in the GPU's memory.

The main purpose of thrust is to provide you with GPU aided algorithms. Along the lines of Parallel STL. It's not a set of classes designed for convenient device code writing. 😦

As for the sizeof( std::array< int, 20 > ) thing... You're right, it does seem to function like that in the current libstdc++ at least. However the standard doesn't guarantee any of this. From what I understood, it is allowed for the std::array implementation to manage its underlying memory however it wishes. So we should not rely on that behaviour in our code.

paulgessinger Jan 31, 2021
Maintainer

Since it is an aggregate type, the standard dictates that it won't allocate. But you're right, beyond that it's up to the implementation. It shouldn't be terribly difficult to write our own std::array implementation that sticks to this, if we so desire.
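Such a replacement could be as small as the sketch below. The names are hypothetical, and `DETRAY_HOST_DEVICE` is an assumed macro that expands to `__host__ __device__` only under nvcc. Like `std::array`, it wraps a plain C array and never allocates. Because even this layout is not formally guaranteed to be free of padding, the final `static_assert` turns any padding into a compile error instead of relying on it.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__CUDACC__)
#define DETRAY_HOST_DEVICE __host__ __device__
#else
#define DETRAY_HOST_DEVICE
#endif

namespace sketch {

// Aggregate wrapper around T[N]; every member works in host and device code.
template <typename T, std::size_t N>
struct array {
  static_assert(N > 0, "zero-sized arrays are not supported by this sketch");

  T m_data[N];

  DETRAY_HOST_DEVICE constexpr std::size_t size() const { return N; }

  DETRAY_HOST_DEVICE constexpr T& operator[](std::size_t i) { return m_data[i]; }
  DETRAY_HOST_DEVICE constexpr const T& operator[](std::size_t i) const {
    return m_data[i];
  }

  DETRAY_HOST_DEVICE constexpr T* begin() { return m_data; }
  DETRAY_HOST_DEVICE constexpr T* end() { return m_data + N; }
  DETRAY_HOST_DEVICE constexpr const T* begin() const { return m_data; }
  DETRAY_HOST_DEVICE constexpr const T* end() const { return m_data + N; }
};

}  // namespace sketch

// Check the layout once at compile time instead of assuming it.
static_assert(sizeof(sketch::array<int, 20>) == sizeof(int) * 20,
              "unexpected padding in sketch::array");
```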

krasznaa Feb 1, 2021
Collaborator

Just before the repository would be moved...

I now have a dummy example for using a "detector object" inside of device code.

https://github.com/krasznaa/detray_data_model/blob/main/tests/test_detector_on_device.cu

There are many things in the code that should be done better, but one thing I only realised now comes from these warnings that the code currently produces:

/data/software/cuda/11.2.0/x86_64-ubuntu1804/bin/../targets/x86_64-linux/include/vector_types.h(421): warning: calling a __host__ function("std::vector< ::detray::dummy_surface, ::std::allocator< ::detray::dummy_surface> > ::vector") from a __host__ __device__ function("std::vector< ::detray::dummy_surface, ::std::allocator< ::detray::dummy_surface> > ::vector [subobject]") is not allowed

/data/software/cuda/11.2.0/x86_64-ubuntu1804/bin/../targets/x86_64-linux/include/vector_types.h(421): warning: calling a __host__ function("std::vector< ::detray::dummy_volume<    ::device_volume_vector> , ::std::allocator< ::detray::dummy_volume<    ::device_volume_vector> > > ::vector") from a __host__ __device__ function("std::vector< ::detray::dummy_volume<    ::device_volume_vector> , ::std::allocator< ::detray::dummy_volume<    ::device_volume_vector> > > ::vector [subobject]") is not allowed

The issue is that I made the default constructor of the detector type usable in both host and device code, as would seem logical.

https://github.com/krasznaa/detray_data_model/blob/main/detector/dummy_detector.hpp#L90-L92

However, nvcc sees me instantiating the detector object with std::vector as its vector type in host code, and it complains, since that instantiation could in principle be used in device code as well. Even though I don't do that, my code declares that the default constructor works in host and device code with any vector type, so nvcc rightfully warns about it.

So long story short, the design may need to be adjusted a bit to not run into this...
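One possible adjustment, sketched below under assumptions, is to leave the default constructor explicitly defaulted and without any `__host__`/`__device__` specifier. The CUDA C++ Programming Guide states that nvcc infers the execution space of implicitly-declared and explicitly-defaulted functions from their callers, so a `detector<std::vector>` built only in host code should not trigger the warnings above. `detector` and `dummy_surface` here are simplified stand-ins, not copies of the types in dummy_detector.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the surface type.
struct dummy_surface {
  unsigned int volume;
};

// The vector type is a template parameter. The constructor carries no
// execution space specifier: being explicitly defaulted, nvcc is expected
// to derive __host__ and/or __device__ from where it is actually called.
template <template <typename...> class vector_t>
struct detector {
  detector() = default;
  vector_t<dummy_surface> surfaces;
};
```

This only covers the constructor; hand-written `__host__ __device__` member functions that touch the vector members would still need a different solution.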

krasznaa · 2021-02-01T15:32:02Z

krasznaa
Feb 1, 2021
Collaborator

@asalzburger, I have really come to like my setup for specifying the vector type that a template should use, and I think we should abandon the current setup where the vector types are specified through a global typedef.

I don't think anybody would disagree with this; I was just wondering who should take it on. If you guys want to give this a go, I'm very happy to let you. Otherwise I'll take a crack at it myself.

I think by now I have a good enough idea of how I'd want to update the detray::detector type to work in GPU code. So instead of doing more work in krasznaa/detray_data_model, I'm ready to start adding code here. 😉 I just really dislike how the code currently relies on global typedefs in many places. I would much prefer if we could make those into function/class-level template types in all places.
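The difference between the two approaches can be sketched as follows. The commented "before" lines only approximate a global-typedef setup and are not copied from containers.hpp; `volume` and `detector` are hypothetical simplifications. `std::deque` merely shows that the container can be chosen per instantiation; on the device side one would plug in a hand-written vector type instead.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Before (one global choice for the whole library), roughly:
//   template <typename T> using dvector = std::vector<T>;
//   struct volume { dvector<unsigned int> surface_indices; };

// After: every class template receives the container as a parameter,
// with std::vector as the host-side default.
template <template <typename...> class vector_t = std::vector>
struct volume {
  vector_t<unsigned int> surface_indices;
};

template <template <typename...> class vector_t = std::vector>
struct detector {
  vector_t<volume<vector_t>> volumes;

  // Works for any container offering size() and iteration.
  std::size_t n_surfaces() const {
    std::size_t n = 0;
    for (const auto& v : volumes) n += v.surface_indices.size();
    return n;
  }
};
```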


asalzburger Feb 1, 2021
Maintainer Author

I agree; the reason why I typedef'd them is that I didn't know better at the time.

We want to shape this library to be usable on the GPU, but also as a fast navigation backend on the CPU (I would like to try to write a NavigatorWrapper using detray for the Acts core).

So I think we should try to integrate your data model once you think it's ready, and I will then try to update the detector.

krasznaa Feb 1, 2021
Collaborator

😕 The detray::detector class would need some larger updates to make it compatible with what I sketched out in https://github.com/krasznaa/detray_data_model/blob/main/detector/dummy_detector.hpp. Unfortunately the code from my repository, by itself, will not do much.

Let me try to see what I can do... Unfortunately, separate parts of this repository are already a bit more interconnected than I'd like, but maybe I can make changes in detray::detector without having to change too many other things as well.

asalzburger Feb 1, 2021
Maintainer Author

One more thing: we will also need some collection/data structures for the second R&D project, the chain demonstrator (I have started a repository and will make it public shortly). How about you keep this data structure repository separate, so that we can use it as an external dependency (or similar) in both repositories?

krasznaa Feb 1, 2021
Collaborator

🤔 Okay, let me try to see how to organise the code for that. It could indeed help things if we can share the custom allocators, hand-written containers and such between multiple repositories.

I just didn't design the layout of that repo to be compatible with the general Acts layout. But that can absolutely be changed...

krasznaa · 2021-02-01T15:55:30Z

krasznaa
Feb 1, 2021
Collaborator

As a separate question: Other than for debugging, do we need the std::string-s for anything in detray::detector?

As you may guess, std::string is not something that we could use in device code. In general we should stay away from any forms of character strings in device code...

If those are particularly useful in the host code, I'll propose a way for only using/instantiating them on the host. But if they are not really needed, we may as well just remove them.
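If the names are only needed for debugging on the host, one option is to keep them out of the geometry types entirely, in a host-only side table indexed by volume number. A minimal sketch with hypothetical names (`volume`, `volume_names`) and made-up example values:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Device-friendly volume description: only numbers and indices.
struct volume {
  unsigned int index;
  double r_min;
  double r_max;
};

// Host-only debugging aid: names live in a side table indexed by
// volume::index, so no std::string ever reaches device code.
struct volume_names {
  std::vector<std::string> names;

  const std::string& operator()(const volume& v) const {
    assert(v.index < names.size());
    return names[v.index];
  }
};
```

The geometry types stay trivially copyable this way, and the name table can simply be left out of any build that targets the GPU.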


asalzburger Feb 1, 2021
Maintainer Author

There is no use case whatsoever for std::string in detray other than debugging, and this will not change.
