[Discussion] Getting profiles of benchmark runs #3

@fluidnumerics-joe

Summary

Bulk runtimes are good. Understanding where a benchmark is spending most of its time is better.
I've looked at a few free, open-source tools:

  • cProfile (included with python) + snakeviz for getting hotspot profiles and visualizing the results
  • line_profiler for getting detailed line-by-line profiles
  • viztracer for getting trace and hotspot profiles

It'd be good to establish some practices we'd like to adopt to make comparisons of performance results a bit easier to understand; part of this is determining what tools we'll add to our toolkit and how we'll use them.

TLDR

I recommend using viztracer for general profiling aimed at identifying hotspots and understanding execution dependencies in code. Trace profiling visualized with Perfetto is quite nice. Perfetto can generate hotspot profiles on the fly, and its default "timeline" view of the code execution makes it easy to see call stack relationships and concurrency while also getting a feel for where the most time is spent during execution.

Profilers

cProfile + snakeviz

cProfile is a nice deterministic (tracing) profiler that is built into Python; it records every function call rather than sampling. It does not require any instrumentation in your code. Simply run the following to get a profile:

python -m cProfile -o output.prof /path/to/program.py

The problem with this is that a ton of other boilerplate gets captured in the profile, including import calls, etc. To get around this, you can enclose a section of the code you want to profile with cProfile calls to start and stop profiling, e.g.

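    import cProfile

    import numpy as np

    # pset, AdvectionEE, and verbose_progress come from the surrounding benchmark script
    pr = cProfile.Profile()
    pr.enable()  # start collecting just before the region of interest
    pset.execute(
        runtime=np.timedelta64(24, "h"),
        dt=np.timedelta64(60, "s"),
        pyfunc=AdvectionEE,
        verbose_progress=verbose_progress,
    )
    pr.disable()  # stop collecting so later code is excluded
    # Write the profile to file
    pr.dump_stats('pset-execute.prof')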

Visualization of the hotspot profile can be done with snakeviz; see the example in the NERSC documentation.
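For a quick text summary of a dumped profile without opening a viewer, the standard library `pstats` module can read the `.prof` file. The following is a minimal sketch, not the benchmark code itself: `busy_work`, `profile_region`, `top_hotspots`, and the file name `example-region.prof` are hypothetical stand-ins, and `busy_work` would be replaced by the real `pset.execute(...)` call.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the benchmark region (e.g. pset.execute)."""
    return sum(i * i for i in range(n))


def profile_region(path="example-region.prof"):
    """Profile only the region of interest and dump the stats to `path`."""
    pr = cProfile.Profile()
    pr.enable()
    busy_work(200_000)
    pr.disable()
    pr.dump_stats(path)
    return path


def top_hotspots(path, limit=10):
    """Return a text table of up to `limit` entries sorted by cumulative time."""
    buf = io.StringIO()
    stats = pstats.Stats(path, stream=buf)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


if __name__ == "__main__":
    print(top_hotspots(profile_region()))
```

Sorting by `"cumulative"` puts the top-level calls first, while `"tottime"` ranks functions by the time spent in their own bodies, which is often closer to what we mean by a hotspot kernel.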

The main issue I have with this is that some Python calls can run concurrently, which an aggregated hotspot profile does not show; additionally, deep call stacks can become quite confusing in the icicle or "sunburst" viewers in snakeviz.

line_profiler

line_profiler is useful once you have narrowed down the regions of code you want to focus on and need wall-times for each line. It adds a bit of execution overhead, so it is best applied to specific regions of code. I suspect this will shine when trying to optimize hotspot kernels, but it may not be beneficial for initial application profiling.
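As a rough illustration of that workflow, here is a sketch that wraps a single function with line_profiler's `LineProfiler` class and captures the per-line report as text. `kernel` and `profile_lines` are hypothetical names used only for this example, and the import is guarded so the script still runs (without profiling) when line_profiler is not installed.

```python
import io

try:
    from line_profiler import LineProfiler  # third-party: pip install line_profiler
except ImportError:  # fall back to running without line-level profiling
    LineProfiler = None


def kernel(n):
    """Hypothetical hotspot kernel standing in for a real benchmark function."""
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total


def profile_lines(func, *args, **kwargs):
    """Run `func` under line_profiler; return (result, report text or None)."""
    if LineProfiler is None:
        return func(*args, **kwargs), None
    lp = LineProfiler()
    wrapped = lp(func)  # registers func and returns a profiled wrapper
    result = wrapped(*args, **kwargs)
    buf = io.StringIO()
    lp.print_stats(stream=buf)  # per-line hits and timings as text
    return result, buf.getvalue()


if __name__ == "__main__":
    result, report = profile_lines(kernel, 100_000)
    print(report or "line_profiler is not installed; ran kernel without profiling")
```

Only the functions passed to the profiler are timed line by line, which is why this fits best after a hotspot profile has already pointed at a small number of candidates.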

viztracer

viztracer is a nice tracing profiler that collects detailed trace profiles during code execution. It requires no code instrumentation and can be used simply by running

viztracer /path/to/program.py

Profiles can then be viewed with vizviewer. Alternatively, viztracer has a VS Code extension that correlates lines of code with the graphical representation of the trace profile directly in VS Code. Under the hood, vizviewer uses Perfetto. By selecting a region of time in the trace view, you can quickly get a hotspot profile for just that region, which can really help us understand which kernels are occupying the most wall-time.
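If the full-program trace picks up too much import and setup noise, tracing can also be limited to one region, much like the cProfile example earlier. The sketch below uses viztracer's `VizTracer` class as a context manager. `trace_region` and the output file name `region-trace.json` are hypothetical, the lambda stands in for the real `pset.execute(...)` call, and the import is guarded so the script still runs untraced when viztracer is not installed.

```python
try:
    from viztracer import VizTracer  # third-party: pip install viztracer
except ImportError:  # fall back to running without tracing
    VizTracer = None


def trace_region(func, output_file="region-trace.json"):
    """Run `func` under VizTracer; return (result, trace path or None)."""
    if VizTracer is None:
        return func(), None
    # The trace is written to output_file when the with-block exits.
    with VizTracer(output_file=output_file, verbose=0):
        result = func()
    return result, output_file


if __name__ == "__main__":
    # Hypothetical workload; replace the lambda with the benchmark region.
    result, trace_path = trace_region(lambda: sum(i * i for i in range(100_000)))
    print(result, trace_path)
```

The resulting JSON file can be opened with `vizviewer region-trace.json` in the same way as a trace produced from the command line.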
