The good news: we now have a working class that gives O(N_nonzero) instead of O(N) performance. This could be a big deal for https://github.com/libantioch/antioch sensitivity applications where our eventual N/N_nonzero will be ~1000.
The bad news: @friedmud reports that the current implementation carries a constant-factor performance overhead of roughly 1000x (based on profiling idaholab/moose#5661; the profiler attributes most of the cost to new/delete). He fixed the obvious flaws (pass-by-value to the sparsity operations) to little effect.
So let's brainstorm ideas. I'll sort what I've got so far by increasing level of difficulty IMHO:
PBR: Pass-by-reference in the sparsity operations (even if it's not a big win, it's still better than nothing)
RI: Use reverse iteration to do in-place sparsity operations rather than creating temporary merged_foo vectors.
RV: Add C++11 rvalue operations so we can steal doomed input arguments' allocations where available.
CA: Give a custom allocator to the underlying std::vector.
CC: Replace std::vector with a custom container that keeps the first O(N_nonzero) elements on the stack and only hits the heap for larger cases.
ET: Use expression templates to postpone evaluations until we can do more of them at once without creating so many intermediate temporaries.