Add benchmark
From the profiling data it seems there isn't much we can do to optimize
the stack collapse sampler implementation. We could use Cython or wrap
some C/Rust, but that would open the Pandora's box of cross-compilation
and compatibility issues.

I think that for the moment this native Python implementation is good
enough: it takes on average ~3 μs to process a stack snapshot.

As for the performance impact... `performance_impact.py` reports a lower
execution time with pylaprof than without it. The only explanation is
that the impact of pylaprof is almost null and that the lower execution
times are caused by some arcane combination of other factors.
Giuseppe Lumia committed Nov 22, 2021
1 parent a4ecb98 commit 21f9f7e
Showing 10 changed files with 298 additions and 3 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -4,6 +4,9 @@ dist
.coverage
htmlcov

# Benchmark stuff
frames.dump

# Node stuff
node_modules

7 changes: 5 additions & 2 deletions TODO.md
@@ -20,12 +20,15 @@
- Add linters (flake8, black, isort, ~~bandit~~, ~~safety~~) <--- OK!
- Add integration tests <--- OK!
- Write a decent readme - show what we can do with pylaprof :)
  (note: add info about the performance impact and other considerations for
  running pylaprof in production, e.g. time required for the report's
  upload/storage)
- Set-up CI/CD (Github Actions, package upload on PyPI). <--- OK!

## Wave 4 - Let's get production ready
- Error handling and logging <--- OK!
- Tight loop optimization (maybe use Cython for that?)
- Measure and document performance impact
- ~~Tight loop optimization (maybe use Cython for that?)~~ <--- OK! (not
necessary)
- Measure performance impact <--- OK!

## Wave 5 - Consolidation
- Gather feedback and eventually introduce breaking changes
74 changes: 74 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,74 @@
## Measure performance impact
`performance_impact.py` is a utility to benchmark the impact of pylaprof on example's
handler code.

```
./performance_impact.py --iterations 10 --period 0.01
```

You can easily adapt it to measure the impact of pylaprof on any other function.


## Profile and benchmark a sampler
The execution time of a sampler's `sample` method imposes a lower bound on the
profiler's real sampling period: it is useless to provide 10 μs (`0.00001`) as the
*period* parameter if `sample` takes 100 μs to run (the actual sampling period would
be ~110 μs).

The smaller the provided period, the more the time our sampler takes to process
the return value of `sys._current_frames` matters.
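
For instance, a quick back-of-the-envelope check (illustrative numbers taken from
the example above, not measured by this script):

```
# The effective sampling period is roughly the requested period plus the
# time `sample` itself takes to run.
requested_period = 0.00001  # 10 μs
sample_cost = 0.0001  # 100 μs per `sample` call
effective_period = requested_period + sample_cost
print(f"effective period: {effective_period * 1e6:.0f} μs")  # -> 110 μs
```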

`record_frames.py` and `process_frames.py` are two utility scripts to profile and
benchmark the `sample` implementation of `StackCollapse`.

To generate frames data that can later be fed to our sampler (this requires
example's dummy API server to be running):
```
./record_frames.py
```

To profile the `sample` implementation of `StackCollapse` using the generated data (we
could use pylaprof itself instead of py-spy for that but... let's avoid Inception):
```
py-spy record -r 1000 -i -- ./process_frames.py
```

And to benchmark it (replace 1000 with the desired number of iterations over the
data set; the more the better):
```
./process_frames.py --iterations 1000
```

Any other sampler implementation can be profiled and benchmarked in a similar way
with minor changes to `process_frames.py`; see the sketch below.
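
For instance, a minimal sketch of an alternative sampler (the `FuncCounter` class is
hypothetical; it assumes only the `Sampler` interface that `record_frames.py`'s `Raw`
also relies on, i.e. a `sample(frame)` and a `dump(storer)` method):

```
from collections import Counter

from pylaprof import Sampler


class FuncCounter(Sampler):
    """Count how many times each function appears in the sampled stacks."""

    def __init__(self):
        self.counts = Counter()

    def sample(self, frame):
        # Walk the stack from the innermost frame to the root.
        while frame:
            self.counts[frame.f_code.co_name] += 1
            frame = frame.f_back

    def dump(self, storer):
        pass  # Not needed to benchmark `sample`.
```

Replacing `sampler = StackCollapse()` with `sampler = FuncCounter()` in
`process_frames.py` would be enough to benchmark it.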

## Benchmark results
Results of benchmarks on an i7-1165G7 @ 2.80 GHz with 16 GB of LPDDR4 4267 MHz RAM.

```
$ ./server.py 2>/dev/null &
[1] 15751
$ ./record_frames.py
$ ./process_frames.py --iterations 10000
Iterating 10000 times over 5266 frames.
Performance stats for `sample` method of class `StackCollapse`:
3.308478209440852e-06 +- 5.438494214420574e-07 seconds per call
$ ./performance_impact.py --iterations 10
Iterating 10 times over example's handler w/wo pylaprof.
Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})` (native):
9.006643843650817 +- 0.5760573076093592 seconds per execution
Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})` (pylaprof enabled with period 0.01):
8.722303414344788 +- 0.11405732794170521 seconds per execution
Performance impact of pylaprof:
-0.2843404293060292 +- 0.587240236791848 seconds per execution
```

The sampler takes ~3.3 μs to process a stack snapshot, and the impact of pylaprof
with a sampling period of 0.01 s on the handler's performance is indistinguishable
from the noise of other factors.

Note: in these benchmarks we use a dummy storer that ignores the report file
generated by the sampler; in a real scenario you also need to take into account the
time to store the report's data on the filesystem or upload it to S3.
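
A minimal sketch of how that cost could be measured (the `TimedFS` class and the
`report.txt` path are hypothetical; it assumes only the `Storer` interface used by
this benchmark's `Null` storer, i.e. a single `store(file)` method receiving a
file-like object):

```
import time

from pylaprof import Storer


class TimedFS(Storer):
    def store(self, file):
        # Assumes `file` yields bytes, as in `record_frames.py`'s `Raw.dump`.
        start = time.time()
        with open("report.txt", "wb") as fp:
            fp.write(file.read())
        print("report storage took", time.time() - start, "seconds")
```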
1 change: 1 addition & 0 deletions benchmark/handler.py
95 changes: 95 additions & 0 deletions benchmark/performance_impact.py
@@ -0,0 +1,95 @@
#!/usr/bin/env python
"""Benchmark pylaprof's impact on example's handler.
"""

import argparse
import math
import statistics
import sys
import time

from handler import handler

from pylaprof import Profiler, Storer


class Null(Storer):
    """Storer that doesn't do anything with the report file."""

    def store(self, file):
        pass


def main():
    parser = argparse.ArgumentParser(
        prog=sys.argv[0],
        description="benchmark pylaprof's impact on example's handler",
    )
    parser.add_argument(
        "--period",
        metavar="PERIOD",
        type=float,
        default=0.01,
        help="`period` parameter for pylaprof (default: 0.01)",
    )
    parser.add_argument(
        "--iterations",
        metavar="NUM",
        type=int,
        default=3,
        help="number of executions (default: 3)",
    )
    opts = parser.parse_args(sys.argv[1:])

print("Iterating", opts.iterations, "times over example's handler w/wo pylaprof.\n")

durations_native = []
durations_pylaprof = []
for i in range(opts.iterations):
start = time.time()
with Profiler(period=opts.period, storer=Null()):
handler({"dummy": "event"}, {"dummy": "context"})
durations_pylaprof.append(time.time() - start)

start = time.time()
handler({"dummy": "event"}, {"dummy": "context"})
durations_native.append(time.time() - start)

print(
'Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})`',
"(native):\n\t",
statistics.mean(durations_native),
"+-",
statistics.stdev(durations_native),
"seconds per execution",
)

print(
'Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})`',
f"(pylaprof enabled with period {opts.period}):\n\t",
statistics.mean(durations_pylaprof),
"+-",
statistics.stdev(durations_pylaprof),
"seconds per execution",
)

# ~ Quick probability theory recap.
# Let X and Y be two independent and identically distributed random variables, then:
#
# mean(X-Y) = mean(X) + mean(Y)
# variance(X-Y) = variance(X) + variance(Y)
# stdev(X) = sqrt(variance(X))
print(
"\nPerformance impact of pylaprof:\n\t",
statistics.mean(durations_pylaprof) - statistics.mean(durations_native),
" +-",
math.sqrt(
statistics.variance(durations_pylaprof)
+ statistics.variance(durations_native)
),
"seconds per execution",
)


if __name__ == "__main__":
    main()
55 changes: 55 additions & 0 deletions benchmark/process_frames.py
@@ -0,0 +1,55 @@
#!/usr/bin/env python

import argparse
import pickle
import statistics
import sys
import time

from record_frames import Code, Frame # noqa

from pylaprof import StackCollapse


def main():
    parser = argparse.ArgumentParser(
        prog=sys.argv[0],
        description="benchmark sampler's processing of data generated by `record_frames.py`",  # noqa
    )
    parser.add_argument(
        "--iterations",
        metavar="NUM",
        type=int,
        default=100,
        help="number of iterations over frames data (default: 100)",
    )
    opts = parser.parse_args(sys.argv[1:])

    sampler = StackCollapse()
    with open("frames.dump", "rb") as fp:
        frames = pickle.load(fp)

    print("Iterating", opts.iterations, "times over", len(frames), "frames.\n")
    durations = []
    for i in range(opts.iterations):
        start = time.time()
        for frame in frames:
            sampler.sample(frame)
        durations.append(time.time() - start)

    # Normalize by the number of sampled stacks (this gives the average time
    # it takes to process one single stack snapshot).
    durations = [d / len(frames) for d in durations]

    print(
        "Performance stats for `sample` method of class",
        f"`{sampler.__class__.__name__}`:\n\t",
        statistics.mean(durations),
        "+-",
        statistics.stdev(durations),
        "seconds per call",
    )


if __name__ == "__main__":
    main()
1 change: 1 addition & 0 deletions benchmark/pylaprof
62 changes: 62 additions & 0 deletions benchmark/record_frames.py
@@ -0,0 +1,62 @@
#!/usr/bin/env python
"""Script to dump frames of execution of example's handler.
We need to define a `Code` and `Frame` class as pickle isn't able to dump Python's
native `frame` and `code` classes.
"""

import pickle
from io import BytesIO

from handler import handler

from pylaprof import FS, Profiler, Sampler


class Code:
    def __init__(self, co_filename, co_name):
        self.co_filename = co_filename
        self.co_name = co_name


class Frame:
    def __init__(self, f_code, f_lineno, f_back):
        self.f_code = f_code
        self.f_lineno = f_lineno
        self.f_back = f_back


class Raw(Sampler):
    def __init__(self):
        self.data = []

    def sample(self, frame):
        # Flatten the stack into a list of frames.
        frames = []
        while frame:
            frames.append(frame)
            frame = frame.f_back

        # Rebuild the stack (outermost frame first) with our custom picklable
        # `Frame` and `Code` instances.
        frames = frames[::-1]
        for i in range(len(frames)):
            frame = frames[i]
            code = Code(frame.f_code.co_filename, frame.f_code.co_name)
            frame = Frame(code, frame.f_lineno, None)
            if i > 0:
                frame.f_back = frames[i - 1]
            frames[i] = frame

        # Record the innermost frame of the rebuilt stack.
        self.data.append(frames[-1])

    def dump(self, storer):
        file = BytesIO()
        pickle.dump(self.data, file)
        file.seek(0)
        storer.store(file)


if __name__ == "__main__":
    with Profiler(period=0.001, sampler=Raw(), storer=FS(path=lambda: "frames.dump")):
        handler({"dummy": "event"}, {"dummy": "context"})
1 change: 1 addition & 0 deletions benchmark/server.py
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "pylaprof"
version = "0.3.5"
version = "0.4.0"
description = "A Python sampling profiler for AWS Lambda functions (and not only)."
authors = ["Giuseppe Lumia <[email protected]>"]
license = "MIT"
