LUT based implicit triangulation #963
base: dev
Question: in the new implementation, all of the code is in the header, so it can be inlined. In ModifiedOldImplicitTriangulation/OldImplicitTriangulation, by contrast, most of the implementation seems to be in the .cpp file, which means that unless you compile with link-time optimization it can't be inlined. Hence the comparison could be quite misleading. Could you make all the functions inline in the headers and then redo the tests?
Hi Peter, the question is what we do from here. Modifying the original code is cumbersome. I really suggest building that separate inheritance structure. Your thoughts on this, @julien-tierny?
Thanks a lot to both of you. @pierre-guillou
Hello everyone, I saw maybe a 5% performance improvement with the

I played a bit with Jonas' new implementation of

Note that inlining all these methods may increase compilation times (and the required memory). We might want to ensure that TTK is still buildable on a small laptop without a lot of RAM. Great work, Jonas!
Alright, thanks a lot for your input, @pierre-guillou.
Also, @petersteneteg, you mentioned compiler options for "link-time optimization".
Link-time optimization (LTO) can look at both .cpp files and headers and select what to inline. It can also potentially be quite effective at devirtualizing, i.e., getting rid of the extra indirection that virtual calls require. So in principle it is great. In practice it is quite hard to do, since the optimizer has to handle the whole program at once, not just one .cpp file. The complexity becomes huge, and with it compilation time and memory usage. There are approaches that make it faster and consume less memory, but they also lose some of the potential optimizations. Anyway, I would not want to depend on LTO for performance. But one could investigate using it for release builds, where one can use a big server to build and maybe wait a while.
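To illustrate the devirtualization point without relying on LTO, here is a minimal C++ sketch. The class names (AbstractTriangulation, Grid2D) and the neighbour rule are hypothetical stand-ins, not TTK's actual classes. When a call goes through a base-class reference, the compiler generally cannot know which override will run; when the concrete type is known and declared `final`, it typically can resolve the call statically and inline it, even within a single translation unit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an abstract triangulation interface.
struct AbstractTriangulation {
  virtual ~AbstractTriangulation() = default;
  virtual std::int64_t getNumberOfVertices() const = 0;
  virtual int getVertexNeighborNumber(std::int64_t vertexId) const = 0;
};

// `final` guarantees that no class derives from Grid2D, so a call made
// through a Grid2D reference can only reach these overrides.
struct Grid2D final : AbstractTriangulation {
  std::int64_t dimX, dimY;
  Grid2D(std::int64_t x, std::int64_t y) : dimX(x), dimY(y) {
    assert(x > 0 && y > 0);
  }
  std::int64_t getNumberOfVertices() const override { return dimX * dimY; }
  // Toy rule: every quad is split along the (i, j)-(i+1, j+1) diagonal.
  int getVertexNeighborNumber(std::int64_t v) const override {
    const std::int64_t i = v % dimX, j = v / dimX;
    int n = 0;
    n += (i > 0) + (i < dimX - 1);       // left, right
    n += (j > 0) + (j < dimY - 1);       // below, above
    n += (i > 0 && j > 0);               // lower-left diagonal
    n += (i < dimX - 1 && j < dimY - 1); // upper-right diagonal
    return n;
  }
};

// Through the base class the callee is unknown, so the compiler generally
// has to emit an indirect (virtual) call in every iteration.
std::int64_t sumViaBase(const AbstractTriangulation &t) {
  std::int64_t sum = 0;
  for (std::int64_t v = 0; v < t.getNumberOfVertices(); ++v)
    sum += t.getVertexNeighborNumber(v);
  return sum;
}

// Through the final class the callee is known, so the compiler may
// devirtualize and inline the call without link-time optimization.
std::int64_t sumViaFinal(const Grid2D &g) {
  std::int64_t sum = 0;
  for (std::int64_t v = 0; v < g.getNumberOfVertices(); ++v)
    sum += g.getVertexNeighborNumber(v);
  return sum;
}
```

Both functions compute the same value (twice the number of edges); whether the call in sumViaFinal is actually inlined depends on the compiler and optimization level, so inspecting the generated assembly, for example on Compiler Explorer, is the reliable way to check.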
@JonasLukasczyk, not really related to any performance, but here are some suggestions for potential refactorings: https://godbolt.org/z/81cYbKW5M

This is quite similar to the version I talked to you about last week, except that I also calculate all those offsets at compile time. It handles alternating diagonals and periodicity in two directions. It also employs strong_types for all the indices, to make it very hard to use an index in the wrong place, i.e., a point index for an edge or some such.
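The linked Compiler Explorer code is not reproduced here. As a rough, hypothetical sketch of the two ideas mentioned (offsets computed at compile time and strong index types), the C++17 block below defines a StrongId wrapper so that VertexId and EdgeId are distinct types, plus a constexpr neighbour stencil for a 2D grid whose quads are all split along one diagonal. It does not handle alternating diagonals or periodicity, and all names are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical strong index type: the Tag parameter makes each alias a
// distinct type, so a VertexId cannot be passed where an EdgeId is expected.
template <typename Tag>
struct StrongId {
  std::int64_t value;
  constexpr explicit StrongId(std::int64_t v) : value(v) {}
  friend constexpr bool operator==(StrongId a, StrongId b) {
    return a.value == b.value;
  }
  friend constexpr bool operator!=(StrongId a, StrongId b) {
    return !(a == b);
  }
};
using VertexId = StrongId<struct VertexTag>;
using EdgeId = StrongId<struct EdgeTag>;

// (dx, dy) offset of one neighbour relative to a vertex.
struct Offset2 {
  int dx;
  int dy;
};

// Neighbour stencil of a 2D grid whose quads are all split along the
// (i, j)-(i+1, j+1) diagonal; the table is a compile-time constant.
constexpr std::array<Offset2, 6> kVertexNeighborStencil{{
    {-1, 0}, {1, 0}, {0, -1}, {0, 1}, {-1, -1}, {1, 1}}};

// Returns the k-th stencil neighbour of v in a dimX x dimY grid, or
// VertexId{-1} if that neighbour falls outside the grid.
constexpr VertexId getVertexNeighbor(VertexId v, std::size_t k,
                                     std::int64_t dimX, std::int64_t dimY) {
  assert(k < kVertexNeighborStencil.size());
  const std::int64_t i = v.value % dimX + kVertexNeighborStencil[k].dx;
  const std::int64_t j = v.value / dimX + kVertexNeighborStencil[k].dy;
  if (i < 0 || i >= dimX || j < 0 || j >= dimY)
    return VertexId{-1};
  return VertexId{j * dimX + i};
}

// Counts the stencil neighbours of v that lie inside the grid.
constexpr int countVertexNeighbors(VertexId v, std::int64_t dimX,
                                   std::int64_t dimY) {
  int count = 0;
  for (std::size_t k = 0; k < kVertexNeighborStencil.size(); ++k)
    if (getVertexNeighbor(v, k, dimX, dimY) != VertexId{-1})
      ++count;
  return count;
}

// With constant arguments, lookups are evaluated at compile time.
// Vertex 4 of a 3x3 grid is (1, 1); its last stencil neighbour is (2, 2),
// i.e. vertex 8.
static_assert(getVertexNeighbor(VertexId{4}, 5, 3, 3) == VertexId{8}, "");
static_assert(countVertexNeighbors(VertexId{4}, 3, 3) == 6, "");

// getVertexNeighbor(EdgeId{4}, 5, 3, 3) would not compile: wrong index type.
```

The wrapper adds no runtime cost in optimized builds (it holds a single integer), while mixing up index kinds becomes a compile error rather than a silent bug.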
Thanks Peter, that looks very interesting! Here are some comments:
What are your thoughts, everyone?
Hi guys, thanks a lot for your input to the discussion! My thoughts are that there are two distinct topics here.
While inlining or renaming functions is easy, the other tasks (tagged with "significant human resources") could easily take an engineer somewhere between a few months and a year to complete (in my view). Best,
I agree with what you are saying, Julien. The good news here is that, due to the templatization of the base code, the cell complex can be worked on separately from the existing triangulation. But as a first result, let's integrate the LUT-based approach in the header of the ImplicitTriangulation. I just tried to do that, but then I ran into the following compile error:
@pierre-guillou, I added you as a maintainer to my fork. Could you please push the inlined version here?
@JonasLukasczyk your error is linked to the
Thank you! I will run some tests.
CellComplex: Some refactoring
Hi Julien,
this PR is NOT supposed to be merged! Here we can discuss the performance gains of a new lookup-table-based implicit triangulation (called NewImplicitTriangulation in the source code). Note that this PR comes with the test.pvsm state file in the root directory, which just creates a 1024^3 wavelet and then runs the ttkHelloWorld filter. Here are some timings on my machine:

The timings already tell the story. The key point here is that replacing the functions in the old implicit triangulation with the LUT-based functions is not enough to reach the performance of the "pure" NewImplicitTriangulation. I did some experiments by making the NewImplicitTriangulation also inherit from a base class, but that had no impact on the performance (as long as overridden functions are declared final). I guess the CRTP architecture used in the OldImplicitTriangulation is to blame, but I cannot tell for sure.

If I could dream, we would move to a completely separate inheritance structure, let's call it CellComplex, and incrementally move triangulations over to it until we can do a complete replacement. Since the base functions are already templatized and therefore only check whether the object that was passed has the necessary functions, this effort can be a completely separate affair.

I also added @petersteneteg to the discussion; he might have some insight on this.