The good news: we now have a working class that gives O(N_nonzero) instead of O(N) performance. This could be a big deal for https://github.com/libantioch/antioch sensitivity applications where our eventual N/N_nonzero will be ~1000.
The bad news: @friedmud reports that the current implementation carries a constant-factor performance overhead of roughly 1000x (based on profiling idaholab/moose#5661; the profiler attributes most of the cost to new/delete). He fixed the obvious flaws (pass-by-value to the sparsity operations) to little effect.
So let's brainstorm ideas. I'll sort what I've got so far by increasing level of difficulty IMHO:
PBR: Pass-by-reference in the sparsity operations (even if it's not a big win, it's still better than nothing)
RI: Use reverse iteration to do in-place sparsity operations rather than creating temporary merged_foo vectors.
RV: Add C++11 rvalue operations so we can steal doomed input arguments' allocations where available.
CA: Give a custom allocator to the underlying std::vector.
CC: Replace std::vector with a custom container that keeps the first O(N_nonzero) elements on the stack and only hits the heap for larger cases.
ET: Use expression templates to postpone evaluations until we can do more of them at once without creating so many intermediate temporaries.