Summary
Bulk runtimes are good. Understanding where a benchmark is spending most of its time is better.
I've looked at a few free, open-source tools:
- cProfile (included with Python) + snakeviz, for getting hotspot profiles and visualizing the results
- line_profiler, for getting detailed line-by-line profiles
- viztracer, for getting trace and hotspot profiles
It'd be good to establish some practices we'd like to adopt to make comparisons of performance results a bit easier to understand; part of this is determining what tools we'll add to our toolkit and how we'll use them.
TLDR
I recommend using viztracer for general profiling aimed at identifying hotspots and understanding execution dependencies in code. Trace profiles visualized with Perfetto are quite nice. Perfetto can generate hotspot profiles on the fly, and its default "timeline" view of the execution makes it easy to see call stack relationships and concurrency while also giving a feel for where the most time is spent.
Profilers
cProfile + snakeviz
cProfile is a nice deterministic profiler that is built into Python; it records every function call rather than sampling the call stack. It does not require any instrumentation in the software. To get a profile, simply run
python -m cProfile -o output.prof /path/to/program.py
The problem with this is that a lot of unrelated work gets captured in the profile, including imports and other setup. To get around this, you can enclose just the section of code you want to profile in cProfile calls that start and stop profiling, e.g.
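import cProfile
import numpy as np

# pset, AdvectionEE, and verbose_progress are defined earlier in the benchmark script
pr = cProfile.Profile()
pr.enable()  # start collecting profile data
pset.execute(
    runtime=np.timedelta64(24, "h"),
    dt=np.timedelta64(60, "s"),
    pyfunc=AdvectionEE,
    verbose_progress=verbose_progress,
)
pr.disable()  # stop collecting profile data
# Write the profile to file
pr.dump_stats('pset-execute.prof')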
Visualization of the hotspot profile can be done with snakeviz. See the example in the NERSC documentation.
The main issue I have with this is that some Python calls can run concurrently, which a hotspot profile does not show; additionally, deep call stacks can become quite confusing in the icicle or "sunburst" viewers in snakeviz.
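For a quick text summary without opening a viewer, a dumped profile can also be read back with the standard-library pstats module. The following is a minimal sketch, not the benchmark itself: busy_work is a hypothetical stand-in for the call being profiled (e.g. pset.execute), and the output path is a temporary file chosen for illustration.

```python
import cProfile
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical stand-in workload; replace with the call you want to profile."""
    return sum(i * i for i in range(n))


def profile_to_file(path, n=100_000):
    """Profile only busy_work and write the raw stats to `path`."""
    # cProfile.Profile can be used as a context manager on Python 3.8+
    with cProfile.Profile() as pr:
        result = busy_work(n)
    pr.dump_stats(path)
    return result


prof_path = os.path.join(tempfile.mkdtemp(), "busy-work.prof")
result = profile_to_file(prof_path)

# Load the dumped profile and print the 10 entries with the largest cumulative time
stats = pstats.Stats(prof_path)
stats.sort_stats("cumulative").print_stats(10)
```

The same .prof file can still be passed to snakeviz afterwards; the text summary is just a faster first look.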
line_profiler
line_profiler is useful once you have narrowed down the regions of code you want to focus on and need wall-times for each line. It adds a bit of execution overhead, so it is best applied to specific regions of code. I suspect this will shine when trying to optimize hotspot kernels, but may not be beneficial for initial application profiling.
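As a rough illustration of that workflow, the sketch below wraps a single function with line_profiler's LineProfiler class and prints per-line timings. The kernel function is hypothetical and only stands in for a real hotspot; the import is guarded because line_profiler is a third-party package that may not be installed.

```python
try:
    from line_profiler import LineProfiler
except ImportError:  # line_profiler is third-party (pip install line_profiler)
    LineProfiler = None


def kernel(values):
    """Hypothetical hotspot kernel used only for illustration."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def run_with_line_profile(func, *args):
    """Run func(*args), printing line-by-line timings if line_profiler is available."""
    if LineProfiler is None:
        return func(*args)
    lp = LineProfiler()
    wrapped = lp(func)  # registers func and returns a profiled wrapper
    result = wrapped(*args)
    lp.print_stats()  # per-line hit counts and timings for func
    return result


kernel_result = run_with_line_profile(kernel, [1.0, 2.0, 3.0])
```

Only the wrapped function is timed line by line, which keeps the overhead confined to the kernel under study.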
viztracer
viztracer is a nice tracing profiler that collects detailed trace profiles during code execution. It requires no code instrumentation and can be used simply by running
viztracer /path/to/program.py
Profiles can be viewed with vizviewer. Alternatively, viztracer has a VS Code extension that correlates lines of code with the graphical representation of the trace profile directly in VS Code. Under the hood, vizviewer uses Perfetto. By selecting a region of time in the trace view, you can quickly get a hotspot profile for just that region, which can really help us understand which kernels are occupying the most wall-time.
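If tracing the whole program captures too much, viztracer can also be limited to a region of code from within Python. The sketch below uses the VizTracer class as a context manager; the step function, the number of steps, and the output path are hypothetical, and the import is guarded because viztracer is a third-party package that may not be installed.

```python
import contextlib
import os
import tempfile

try:
    from viztracer import VizTracer
except ImportError:  # viztracer is third-party (pip install viztracer)
    VizTracer = None


def step(n):
    """Hypothetical unit of work standing in for one model time step."""
    return sum(range(n))


def run_traced(output_file, n_steps=5):
    """Trace only the loop over steps; the trace is saved when the block exits."""
    if VizTracer is not None:
        ctx = VizTracer(output_file=output_file)
    else:
        ctx = contextlib.nullcontext()  # run untraced if viztracer is missing
    with ctx:
        return [step(1000 * (i + 1)) for i in range(n_steps)]


trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
totals = run_traced(trace_path)
```

The resulting JSON file can then be opened with vizviewer in the same way as a trace produced from the command line.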