-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
From profiling data it seems like there isn't much we can do to optimize the stack collapse sampler implementation. We could use Cython or wrap some C/Rust, but it would open the pandora box of cross-compilation and compatibility issues. I think that for the moment this native Python implementation is good enough, it takes on average ~3 μs to process a stack snapshot. As for the impact on performances... `performance_impact.py` reports an execution time with pylaprof lower than the native version. The only explanation is that the impact of pylaprof is almost null and that the lower execution times are caused by some arcane combination of other factors.
- Loading branch information
Giuseppe Lumia
committed
Nov 22, 2021
1 parent
a4ecb98
commit 21f9f7e
Showing
10 changed files
with
298 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,9 @@ dist | |
.coverage | ||
htmlcov | ||
|
||
# Benchmark stuff | ||
frames.dump | ||
|
||
# Node stuff | ||
node_modules | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
## Measure performance impact | ||
`performance_impact.py` is a utility to benchmark the impact of pylaprof on example's | ||
handler code. | ||
|
||
``` | ||
./performance_impact.py --iterations 10 --period 0.01 | ||
``` | ||
|
||
You can easily adapt it do measure the impact of pylaprof on any other function. | ||
|
||
|
||
## Profile and benchmark a sampler | ||
Sampler's `sample` execution time imposes a lower bound to the real period of | ||
profiler's sampling: it is useless to provide 10 μs (`0.00001`) as *period* parameter if | ||
`sample` takes 100 μs to run (the actual sampling period would be 110 μs). | ||
|
||
The smaller the period that is provided the more it matters how much our sampler | ||
actually takes to process the return value of `sys._current_frames`. | ||
|
||
`record_frames.py` and `process_frames.py` are two utility scripts to profile and | ||
benchmark the `sample` implementation of `StackCollapse`. | ||
|
||
To generate frames data that can later be fed to our sampler (this requires | ||
example's dummy API server to be running): | ||
``` | ||
./record_frames.py | ||
``` | ||
|
||
To profile the `sample` implementation of `StackCollapse` using the generated data (we | ||
could use pylaprof itself instead of py-spy for that but... let's avoid Inception): | ||
``` | ||
py-spy record -r 1000 -i -- ./process_frames.py | ||
``` | ||
|
||
And to benchmark it (replace 1000 with the number of iterations over the data | ||
set you want, the more the better): | ||
``` | ||
./process_frames.py --iterations 1000 | ||
``` | ||
|
||
Any other sampler implementation can be profiled and benchmarked in a similar way with | ||
smaller changes to `process_frames.py`. | ||
|
||
## Benchmark results | ||
Results of benchmarks on a i7-1165G7 @ 2.80GHz with 16GB of LPDDR4 4267 MHz ram. | ||
|
||
``` | ||
$ ./server.py 2>/dev/null & | ||
[1] 15751 | ||
$ ./record_frames.py | ||
$ ./process_frames.py --iterations 10000 | ||
Iterating 10000 times over 5266 frames. | ||
Performance stats for `sample` method of class `StackCollapse`: | ||
3.308478209440852e-06 +- 5.438494214420574e-07 seconds per call | ||
$ ./performance_impact.py --iterations 10 | ||
Iterating 10 times over example's handler w/wo pylaprof. | ||
Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})` (native): | ||
9.006643843650817 +- 0.5760573076093592 seconds per execution | ||
Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})` (pylaprof enabled with period 0.01): | ||
8.722303414344788 +- 0.11405732794170521 seconds per execution | ||
Performance impact of pylaprof: | ||
-0.2843404293060292 +- 0.587240236791848 seconds per execution | ||
``` | ||
|
||
The sampler takes ~3.3 μs to process a stack snapshot and the impact of pylaprof with a | ||
sampling period of 0.01 on handler's performance is indistinguishable from noise of | ||
other factors. | ||
|
||
Note: in those benchmarks we use a dummy storer that ignores the report file generated | ||
by the sampler, in a real scenario you need to take into account the time to store | ||
report's data on the filesystem or upload it on S3. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../examples/handler.py |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
#!/usr/bin/env python | ||
"""Benchmark pylaprof's impact on example's handler. | ||
""" | ||
|
||
import argparse | ||
import math | ||
import statistics | ||
import sys | ||
import time | ||
|
||
from handler import handler | ||
|
||
from pylaprof import Profiler, Storer | ||
|
||
|
||
class Null(Storer): | ||
"""Storer that doesn't do anything with the record file.""" | ||
|
||
def store(self, file): | ||
pass | ||
|
||
|
||
def main(): | ||
parser = argparse.ArgumentParser(sys.argv[0]) | ||
parser = argparse.ArgumentParser( | ||
description="benchmark pylaprof's impact on example's handler" | ||
) | ||
parser.add_argument( | ||
"--period", | ||
metavar="PERIOD", | ||
type=float, | ||
default=0.01, | ||
help="`period` parameter for pylaprof (default: 0.01)", | ||
) | ||
parser.add_argument( | ||
"--iterations", | ||
metavar="NUM", | ||
type=int, | ||
default=3, | ||
help="number of executions (default: 3)", | ||
) | ||
opts = parser.parse_args(sys.argv[1:]) | ||
|
||
print("Iterating", opts.iterations, "times over example's handler w/wo pylaprof.\n") | ||
|
||
durations_native = [] | ||
durations_pylaprof = [] | ||
for i in range(opts.iterations): | ||
start = time.time() | ||
with Profiler(period=opts.period, storer=Null()): | ||
handler({"dummy": "event"}, {"dummy": "context"}) | ||
durations_pylaprof.append(time.time() - start) | ||
|
||
start = time.time() | ||
handler({"dummy": "event"}, {"dummy": "context"}) | ||
durations_native.append(time.time() - start) | ||
|
||
print( | ||
'Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})`', | ||
"(native):\n\t", | ||
statistics.mean(durations_native), | ||
"+-", | ||
statistics.stdev(durations_native), | ||
"seconds per execution", | ||
) | ||
|
||
print( | ||
'Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})`', | ||
f"(pylaprof enabled with period {opts.period}):\n\t", | ||
statistics.mean(durations_pylaprof), | ||
"+-", | ||
statistics.stdev(durations_pylaprof), | ||
"seconds per execution", | ||
) | ||
|
||
# ~ Quick probability theory recap. | ||
# Let X and Y be two independent and identically distributed random variables, then: | ||
# | ||
# mean(X-Y) = mean(X) + mean(Y) | ||
# variance(X-Y) = variance(X) + variance(Y) | ||
# stdev(X) = sqrt(variance(X)) | ||
print( | ||
"\nPerformance impact of pylaprof:\n\t", | ||
statistics.mean(durations_pylaprof) - statistics.mean(durations_native), | ||
" +-", | ||
math.sqrt( | ||
statistics.variance(durations_pylaprof) | ||
+ statistics.variance(durations_native) | ||
), | ||
"seconds per execution", | ||
) | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
#!/usr/bin/env python | ||
|
||
import argparse | ||
import pickle | ||
import statistics | ||
import sys | ||
import time | ||
|
||
from record_frames import Code, Frame # noqa | ||
|
||
from pylaprof import StackCollapse | ||
|
||
|
||
def main(): | ||
parser = argparse.ArgumentParser(sys.argv[0]) | ||
parser = argparse.ArgumentParser( | ||
description="benchmark sampler's processing of data generated by `record_frames.py`" # noqa | ||
) | ||
parser.add_argument( | ||
"--iterations", | ||
metavar="NUM", | ||
type=int, | ||
default=100, | ||
help="number of iterations over frames data (default: 100)", | ||
) | ||
opts = parser.parse_args(sys.argv[1:]) | ||
|
||
sampler = StackCollapse() | ||
with open("frames.dump", "rb") as fp: | ||
frames = pickle.load(fp) | ||
|
||
print("Iterating", opts.iterations, "times over", len(frames), "frames.\n") | ||
durations = [] | ||
for i in range(opts.iterations): | ||
start = time.time() | ||
for frame in frames: | ||
sampler.sample(frame) | ||
durations.append(time.time() - start) | ||
|
||
# Normalize by number of stack frames (in this way we get how much it took on | ||
# average to process one single stack record) | ||
durations = [d / len(frames) for d in durations] | ||
|
||
print( | ||
"Performance stats for `sample` method of class", | ||
f"`{sampler.__class__.__name__}`:\n\t", | ||
statistics.mean(durations), | ||
"+-", | ||
statistics.stdev(durations), | ||
"seconds per call", | ||
) | ||
|
||
|
||
if __name__ == "__main__": | ||
main() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../pylaprof/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
#!/usr/bin/env python | ||
"""Script to dump frames of execution of example's handler. | ||
We need to define a `Code` and `Frame` class as pickle isn't able to dump Python's | ||
native `frame` and `code` classes. | ||
""" | ||
|
||
import pickle | ||
from io import BytesIO | ||
|
||
from handler import handler | ||
|
||
from pylaprof import FS, Profiler, Sampler | ||
|
||
|
||
class Code: | ||
def __init__(self, co_filename, co_name): | ||
self.co_filename = co_filename | ||
self.co_name = co_name | ||
|
||
|
||
class Frame: | ||
def __init__(self, f_code, f_lineno, f_back): | ||
self.f_code = f_code | ||
self.f_lineno = f_lineno | ||
self.f_back = f_back | ||
|
||
|
||
class Raw(Sampler): | ||
def __init__(self): | ||
self.data = [] | ||
|
||
def sample(self, frame): | ||
# Flatten frames into a list | ||
frames = [] | ||
while frame: | ||
frames.append(frame) | ||
frame = frame.f_back | ||
|
||
# Rebuild the stack with our custom `Frame` and `Code` instances | ||
frames = frames[::-1] | ||
for i in range(len(frames)): | ||
frame = frames[i] | ||
code = Code(frame.f_code.co_filename, frame.f_code.co_name) | ||
frame = Frame(code, frame.f_lineno, None) | ||
if i > 0: | ||
frame.f_back = frames[i - 1] | ||
frames[i] = frame | ||
|
||
# Record custom stack | ||
self.data.append(frames[-1]) | ||
|
||
def dump(self, storer): | ||
file = BytesIO() | ||
pickle.dump(self.data, file) | ||
file.seek(0) | ||
storer.store(file) | ||
|
||
|
||
if __name__ == "__main__": | ||
with Profiler(period=0.001, sampler=Raw(), storer=FS(path=lambda: "frames.dump")): | ||
handler({"dummy": "event"}, {"dummy": "context"}) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../examples/server.py |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
[tool.poetry] | ||
name = "pylaprof" | ||
version = "0.3.5" | ||
version = "0.4.0" | ||
description = "A Python sampling profiler for AWS Lambda functions (and not only)." | ||
authors = ["Giuseppe Lumia <[email protected]>"] | ||
license = "MIT" | ||
|