Add benchmark
From the profiling data it seems there isn't much we can do to optimize
the stack collapse sampler implementation. We could use Cython or wrap
some C/Rust, but that would open the Pandora's box of cross-compilation
and compatibility issues.

I think that for the moment this native Python implementation is good
enough: it takes on average ~3 μs to process a stack snapshot.

As for the performance impact... `performance_impact.py` reports a lower
execution time with pylaprof than without it. The only explanation is
that the impact of pylaprof is almost null and that the lower execution
times are caused by some arcane combination of other factors.
Giuseppe Lumia committed Nov 22, 2021
1 parent a4ecb98 commit 21f9f7e
Showing 10 changed files with 298 additions and 3 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -4,6 +4,9 @@ dist
.coverage
htmlcov

# Benchmark stuff
frames.dump

# Node stuff
node_modules

7 changes: 5 additions & 2 deletions TODO.md
@@ -20,12 +20,15 @@
- Add linters (flake8, black, isort, ~~bandit~~, ~~safety~~) <--- OK!
- Add integration tests <--- OK!
- Write a decent readme - show what we can do with pylaprof :)
  (note: add info about the performance impact and other considerations for
  running pylaprof in production, e.g. time required for the report's
  upload/storage)
- Set-up CI/CD (Github Actions, package upload on PyPI). <--- OK!

## Wave 4 - Let's get production ready
- Error handling and logging <--- OK!
- Tight loop optimization (maybe use Cython for that?)
- Measure and document performance impact
- ~~Tight loop optimization (maybe use Cython for that?)~~ <--- OK! (not
necessary)
- Measure performance impact <--- OK!

## Wave 5 - Consolidation
- Gather feedback and eventually introduce breaking changes
74 changes: 74 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,74 @@
## Measure performance impact
`performance_impact.py` is a utility to benchmark the impact of pylaprof on example's
handler code.

```
./performance_impact.py --iterations 10 --period 0.01
```

You can easily adapt it to measure the impact of pylaprof on any other function.


## Profile and benchmark a sampler
The execution time of a sampler's `sample` method imposes a lower bound on the
profiler's real sampling period: it is useless to provide 10 μs (`0.00001`) as the
*period* parameter if `sample` takes 100 μs to run (the actual sampling period would
be ~110 μs).

The smaller the provided period, the more the time our sampler takes to process
the return value of `sys._current_frames` matters.
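
For instance, a quick back-of-the-envelope check (illustrative numbers taken from
the example above, not measured by this script):

```
# The effective sampling period is roughly the requested period plus the
# time `sample` itself takes to run.
requested_period = 0.00001  # 10 μs
sample_cost = 0.0001  # 100 μs per `sample` call
effective_period = requested_period + sample_cost
print(f"effective period: {effective_period * 1e6:.0f} μs")  # -> 110 μs
```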

`record_frames.py` and `process_frames.py` are two utility scripts to profile and
benchmark the `sample` implementation of `StackCollapse`.

To generate frames data that can later be fed to our sampler (this requires
example's dummy API server to be running):
```
./record_frames.py
```

To profile the `sample` implementation of `StackCollapse` using the generated data (we
could use pylaprof itself instead of py-spy for that but... let's avoid Inception):
```
py-spy record -r 1000 -i -- ./process_frames.py
```

And to benchmark it (replace 1000 with the desired number of iterations over the
data set; the more the better):
```
./process_frames.py --iterations 1000
```

Any other sampler implementation can be profiled and benchmarked in a similar way
with minor changes to `process_frames.py`; see the sketch below.
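
For instance, a minimal sketch of an alternative sampler (the `FuncCounter` class is
hypothetical; it assumes only the `Sampler` interface that `record_frames.py`'s `Raw`
also relies on, i.e. a `sample(frame)` and a `dump(storer)` method):

```
from collections import Counter

from pylaprof import Sampler


class FuncCounter(Sampler):
    """Count how many times each function appears in the sampled stacks."""

    def __init__(self):
        self.counts = Counter()

    def sample(self, frame):
        # Walk the stack from the innermost frame to the root.
        while frame:
            self.counts[frame.f_code.co_name] += 1
            frame = frame.f_back

    def dump(self, storer):
        pass  # Not needed to benchmark `sample`.
```

Replacing `sampler = StackCollapse()` with `sampler = FuncCounter()` in
`process_frames.py` would be enough to benchmark it.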

## Benchmark results
Results of benchmarks on an i7-1165G7 @ 2.80 GHz with 16 GB of LPDDR4 4267 MHz RAM.

```
$ ./server.py 2>/dev/null &
[1] 15751
$ ./record_frames.py
$ ./process_frames.py --iterations 10000
Iterating 10000 times over 5266 frames.
Performance stats for `sample` method of class `StackCollapse`:
3.308478209440852e-06 +- 5.438494214420574e-07 seconds per call
$ ./performance_impact.py --iterations 10
Iterating 10 times over example's handler w/wo pylaprof.
Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})` (native):
9.006643843650817 +- 0.5760573076093592 seconds per execution
Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})` (pylaprof enabled with period 0.01):
8.722303414344788 +- 0.11405732794170521 seconds per execution
Performance impact of pylaprof:
-0.2843404293060292 +- 0.587240236791848 seconds per execution
```

The sampler takes ~3.3 μs to process a stack snapshot, and the impact of pylaprof
with a sampling period of 0.01 s on the handler's performance is indistinguishable
from the noise of other factors.

Note: in these benchmarks we use a dummy storer that ignores the report file
generated by the sampler; in a real scenario you also need to take into account the
time to store the report's data on the filesystem or upload it to S3.
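
A minimal sketch of how that cost could be measured (the `TimedFS` class and the
`report.txt` path are hypothetical; it assumes only the `Storer` interface used by
this benchmark's `Null` storer, i.e. a single `store(file)` method receiving a
file-like object):

```
import time

from pylaprof import Storer


class TimedFS(Storer):
    def store(self, file):
        # Assumes `file` yields bytes, as in `record_frames.py`'s `Raw.dump`.
        start = time.time()
        with open("report.txt", "wb") as fp:
            fp.write(file.read())
        print("report storage took", time.time() - start, "seconds")
```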
1 change: 1 addition & 0 deletions benchmark/handler.py
95 changes: 95 additions & 0 deletions benchmark/performance_impact.py
@@ -0,0 +1,95 @@
#!/usr/bin/env python
"""Benchmark pylaprof's impact on example's handler.
"""

import argparse
import math
import statistics
import sys
import time

from handler import handler

from pylaprof import Profiler, Storer


class Null(Storer):
    """Storer that doesn't do anything with the report file."""

    def store(self, file):
        pass


def main():
    parser = argparse.ArgumentParser(
        prog=sys.argv[0],
        description="benchmark pylaprof's impact on example's handler",
    )
    parser.add_argument(
        "--period",
        metavar="PERIOD",
        type=float,
        default=0.01,
        help="`period` parameter for pylaprof (default: 0.01)",
    )
    parser.add_argument(
        "--iterations",
        metavar="NUM",
        type=int,
        default=3,
        help="number of executions (default: 3)",
    )
    opts = parser.parse_args(sys.argv[1:])

print("Iterating", opts.iterations, "times over example's handler w/wo pylaprof.\n")

durations_native = []
durations_pylaprof = []
for i in range(opts.iterations):
start = time.time()
with Profiler(period=opts.period, storer=Null()):
handler({"dummy": "event"}, {"dummy": "context"})
durations_pylaprof.append(time.time() - start)

start = time.time()
handler({"dummy": "event"}, {"dummy": "context"})
durations_native.append(time.time() - start)

print(
'Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})`',
"(native):\n\t",
statistics.mean(durations_native),
"+-",
statistics.stdev(durations_native),
"seconds per execution",
)

print(
'Performance stats for `handler({"dummy": "event"}, {"dummy": "context"})`',
f"(pylaprof enabled with period {opts.period}):\n\t",
statistics.mean(durations_pylaprof),
"+-",
statistics.stdev(durations_pylaprof),
"seconds per execution",
)

# ~ Quick probability theory recap.
# Let X and Y be two independent and identically distributed random variables, then:
#
# mean(X-Y) = mean(X) + mean(Y)
# variance(X-Y) = variance(X) + variance(Y)
# stdev(X) = sqrt(variance(X))
print(
"\nPerformance impact of pylaprof:\n\t",
statistics.mean(durations_pylaprof) - statistics.mean(durations_native),
" +-",
math.sqrt(
statistics.variance(durations_pylaprof)
+ statistics.variance(durations_native)
),
"seconds per execution",
)


if __name__ == "__main__":
    main()
55 changes: 55 additions & 0 deletions benchmark/process_frames.py
@@ -0,0 +1,55 @@
#!/usr/bin/env python

import argparse
import pickle
import statistics
import sys
import time

from record_frames import Code, Frame # noqa

from pylaprof import StackCollapse


def main():
    parser = argparse.ArgumentParser(
        prog=sys.argv[0],
        description="benchmark sampler's processing of data generated by `record_frames.py`",  # noqa
    )
    parser.add_argument(
        "--iterations",
        metavar="NUM",
        type=int,
        default=100,
        help="number of iterations over frames data (default: 100)",
    )
    opts = parser.parse_args(sys.argv[1:])

    sampler = StackCollapse()
    with open("frames.dump", "rb") as fp:
        frames = pickle.load(fp)

    print("Iterating", opts.iterations, "times over", len(frames), "frames.\n")
    durations = []
    for i in range(opts.iterations):
        start = time.time()
        for frame in frames:
            sampler.sample(frame)
        durations.append(time.time() - start)

    # Normalize by the number of sampled stacks (this gives the average time
    # it takes to process one single stack snapshot).
    durations = [d / len(frames) for d in durations]

    print(
        "Performance stats for `sample` method of class",
        f"`{sampler.__class__.__name__}`:\n\t",
        statistics.mean(durations),
        "+-",
        statistics.stdev(durations),
        "seconds per call",
    )


if __name__ == "__main__":
    main()
1 change: 1 addition & 0 deletions benchmark/pylaprof
62 changes: 62 additions & 0 deletions benchmark/record_frames.py
@@ -0,0 +1,62 @@
#!/usr/bin/env python
"""Script to dump frames of execution of example's handler.
We need to define a `Code` and `Frame` class as pickle isn't able to dump Python's
native `frame` and `code` classes.
"""

import pickle
from io import BytesIO

from handler import handler

from pylaprof import FS, Profiler, Sampler


class Code:
    def __init__(self, co_filename, co_name):
        self.co_filename = co_filename
        self.co_name = co_name


class Frame:
    def __init__(self, f_code, f_lineno, f_back):
        self.f_code = f_code
        self.f_lineno = f_lineno
        self.f_back = f_back


class Raw(Sampler):
    def __init__(self):
        self.data = []

    def sample(self, frame):
        # Flatten the stack into a list of frames.
        frames = []
        while frame:
            frames.append(frame)
            frame = frame.f_back

        # Rebuild the stack (outermost frame first) with our custom picklable
        # `Frame` and `Code` instances.
        frames = frames[::-1]
        for i in range(len(frames)):
            frame = frames[i]
            code = Code(frame.f_code.co_filename, frame.f_code.co_name)
            frame = Frame(code, frame.f_lineno, None)
            if i > 0:
                frame.f_back = frames[i - 1]
            frames[i] = frame

        # Record the innermost frame of the rebuilt stack.
        self.data.append(frames[-1])

    def dump(self, storer):
        file = BytesIO()
        pickle.dump(self.data, file)
        file.seek(0)
        storer.store(file)


if __name__ == "__main__":
    with Profiler(period=0.001, sampler=Raw(), storer=FS(path=lambda: "frames.dump")):
        handler({"dummy": "event"}, {"dummy": "context"})
1 change: 1 addition & 0 deletions benchmark/server.py
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "pylaprof"
version = "0.3.5"
version = "0.4.0"
description = "A Python sampling profiler for AWS Lambda functions (and not only)."
authors = ["Giuseppe Lumia <[email protected]>"]
license = "MIT"
