Improve DynamicSparseNumberFoo performance #1

@roystgnr

The good news: we now have a working class that gives O(N_nonzero) instead of O(N) performance. This could be a big deal for https://github.com/libantioch/antioch sensitivity applications where our eventual N/N_nonzero will be ~1000.

The bad news: @friedmud reports that the current implementation carries a constant-factor performance overhead of roughly 1000x (based on profiling idaholab/moose#5661; according to the profiler the cost is mostly new/delete). He fixed the obvious flaws (pass-by-value to the sparsity operations) to little effect.

So let's brainstorm ideas. Here's what I've got so far, sorted by increasing level of difficulty, IMHO:

PBR: Pass-by-reference in the sparsity operations (even if it's not a big win, it's still better than nothing)
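
For concreteness, roughly the kind of signature change PBR means, written against a stand-in struct rather than the real DynamicSparseNumberFoo interface (the names `SparseStandIn` and `sparsity_union` are purely illustrative):

```cpp
#include <vector>

// Stand-in for the real class: just the sparse index/value storage.
struct SparseStandIn {
  std::vector<unsigned int> indices;  // sorted nonzero indices
  std::vector<double> values;         // matching nonzero values
};

// Before: taking `other` by value copied both of its vectors on every call.
// A const reference (plus a plain reference for the in-place target) avoids that.
inline void sparsity_union(SparseStandIn &self, const SparseStandIn &other)
{
  // ... merge other.indices into self.indices as before ...
  (void)self;
  (void)other;
}
```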

RI: Use reverse iteration to do in-place sparsity operations rather than creating temporary merged_foo vectors.
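
A sketch of what RI could look like for a union-style operation, again on an illustrative stand-in rather than the actual class: the arrays are resized once and then filled from the back, so read positions are never overwritten before they're consumed and no temporary merged vector is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct SparseStandIn {
  std::vector<unsigned int> indices;  // sorted nonzero indices
  std::vector<double> values;         // matching nonzero values
};

// Merge other's sparsity pattern into self in place: count the new indices,
// grow once, then walk both patterns backwards writing into the tail.
inline void sparsity_union_in_place(SparseStandIn &self, const SparseStandIn &other)
{
  // Count how many of other's indices are not already in self.
  std::size_t extra = 0;
  for (std::size_t j = 0; j != other.indices.size(); ++j)
    if (!std::binary_search(self.indices.begin(), self.indices.end(),
                            other.indices[j]))
      ++extra;

  const std::size_t old_size = self.indices.size();
  self.indices.resize(old_size + extra);
  self.values.resize(old_size + extra);

  // Reverse iteration: i walks old self entries, j walks other entries,
  // k is the write position at the back of the enlarged arrays.
  std::ptrdiff_t i = static_cast<std::ptrdiff_t>(old_size) - 1;
  std::ptrdiff_t j = static_cast<std::ptrdiff_t>(other.indices.size()) - 1;
  std::ptrdiff_t k = static_cast<std::ptrdiff_t>(self.indices.size()) - 1;

  while (j >= 0)
  {
    if (i >= 0 && self.indices[i] > other.indices[j])
    {
      self.indices[k] = self.indices[i];
      self.values[k]  = self.values[i];
      --i;
    }
    else if (i >= 0 && self.indices[i] == other.indices[j])
    {
      self.indices[k] = self.indices[i];
      self.values[k]  = self.values[i];
      --i; --j;
    }
    else
    {
      self.indices[k] = other.indices[j];
      self.values[k]  = 0.0;  // the union adds the index; its value stays zero
      --j;
    }
    --k;
  }
  // Entries at positions <= i were already in their final place.
}
```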

RV: Add C++11 rvalue operations so we can steal doomed input arguments' allocations where available.
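
A minimal sketch of the RV idea, using the same stand-in names: the rvalue overload reuses the doomed operand's heap allocations instead of making fresh ones.

```cpp
#include <utility>
#include <vector>

struct SparseStandIn {
  std::vector<unsigned int> indices;
  std::vector<double> values;
};

// Hypothetical operator+: when the right-hand operand is a doomed temporary,
// steal its vectors for the result instead of allocating new ones.
SparseStandIn operator+(const SparseStandIn &a, SparseStandIn &&b)
{
  SparseStandIn result(std::move(b));   // reuse b's allocations
  // ... merge a's sparsity pattern and add a's values into result ...
  (void)a;
  return result;
}

// The copying overload still handles the lvalue-only case; a real overload
// set would also want (SparseStandIn &&, const SparseStandIn &) so chains
// like (a + b) + c reuse the left-hand temporary too.
SparseStandIn operator+(const SparseStandIn &a, const SparseStandIn &b);
```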

CA: Give a custom allocator to the underlying std::vector.
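
One possible shape for CA, sketched as a bump/arena allocator and assuming an allocator_traits-aware (C++11) standard library; this only shows the allocator plumbing, not a production-ready policy for resetting the arena or falling back when it fills up.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Minimal arena-style allocator sketch: carve allocations out of a
// thread-local buffer with a bump pointer, so vector growth in the sparsity
// operations stops hitting new/delete.  Freed memory is never reclaimed
// individually, so a real version needs explicit reset points.
template <typename T>
struct ArenaAllocator {
  using value_type = T;

  ArenaAllocator() = default;
  template <typename U>
  ArenaAllocator(const ArenaAllocator<U> &) {}

  T *allocate(std::size_t n) {
    alignas(std::max_align_t) static thread_local unsigned char buffer[1 << 16];
    static thread_local std::size_t offset = 0;
    // Round the bump pointer up to T's alignment.
    offset = (offset + alignof(T) - 1) / alignof(T) * alignof(T);
    const std::size_t bytes = n * sizeof(T);
    if (offset + bytes > sizeof(buffer))
      throw std::bad_alloc();          // arena exhausted; real code would fall back
    T *p = reinterpret_cast<T *>(buffer + offset);
    offset += bytes;
    return p;
  }

  void deallocate(T *, std::size_t) {}  // bump allocator: nothing to free
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T> &, const ArenaAllocator<U> &) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T> &, const ArenaAllocator<U> &) { return false; }

// Usage: the sparse class would hold these instead of plain std::vector.
using IndexVector = std::vector<unsigned int, ArenaAllocator<unsigned int>>;
using ValueVector = std::vector<double, ArenaAllocator<double>>;
```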

CC: Replace std::vector with a custom container that keeps the first O(N_nonzero) elements on the stack and only hits the heap for larger cases.
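
And a deliberately simplified sketch of the CC idea: the split inline/overflow layout below is only meant to illustrate the stack-first strategy; a real small-vector (llvm::SmallVector-style) would keep one contiguous buffer and relocate the inline entries on spill.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// The first N entries live in an inline array; the heap is only touched once
// the sparsity pattern grows past N.
template <typename T, std::size_t N>
class SmallStorage {
public:
  std::size_t size() const { return _size; }

  void push_back(const T &x) {
    if (_size < N)
      _stack[_size] = x;     // no allocation for small patterns
    else
      _overflow.push_back(x);
    ++_size;
  }

  T &operator[](std::size_t i) {
    return i < N ? _stack[i] : _overflow[i - N];
  }
  const T &operator[](std::size_t i) const {
    return i < N ? _stack[i] : _overflow[i - N];
  }

private:
  std::array<T, N> _stack{};  // inline storage, no heap
  std::vector<T> _overflow;   // only used beyond N entries
  std::size_t _size = 0;
};

// E.g. the sparse class could then hold:
//   SmallStorage<unsigned int, 16> indices;
//   SmallStorage<double, 16> values;
```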

ET: Use expression templates to postpone evaluations until we can do more of them at once without creating so many intermediate temporaries.
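
Finally, a toy illustration of the ET mechanism on a dense stand-in (`Vec` and `Sum` are made-up names), just to show how the laziness eliminates intermediate temporaries; a sparse version would additionally defer the index-set merging to that single final evaluation pass.

```cpp
#include <cstddef>
#include <vector>

// a + b + c builds a lightweight Sum<Sum<Vec,Vec>,Vec> object; no arithmetic
// or allocation happens until the final assignment, which runs one loop.
struct Vec {
  std::vector<double> data;

  // Assigning from any expression evaluates it element-by-element, once.
  template <typename Expr>
  Vec &operator=(const Expr &e) {
    data.resize(e.size());
    for (std::size_t i = 0; i != data.size(); ++i)
      data[i] = e[i];
    return *this;
  }

  std::size_t size() const { return data.size(); }
  double operator[](std::size_t i) const { return data[i]; }
};

template <typename L, typename R>
struct Sum {
  const L &l;
  const R &r;
  std::size_t size() const { return l.size(); }
  double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// A real version would SFINAE-restrict this to expression types only.
template <typename L, typename R>
Sum<L, R> operator+(const L &l, const R &r) { return Sum<L, R>{l, r}; }

// Usage:
//   Vec a, b, c, result;  // ...fill a, b, c to the same size...
//   result = a + b + c;   // one loop, zero temporary Vec allocations
```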
