This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 02209e0

committed Jun 23, 2025
update
1 parent 65be082 commit 02209e0

1 file changed

+89
-17
lines changed

‎docs/others/miscellaneous/osdi-talk.md

Lines changed: 89 additions & 17 deletions
@@ -8,41 +8,73 @@ Good morning! I'm **Yusheng Zheng** from UC Santa Cruz, presenting our OSDI '25
88

99
## [Slide 1] Extensions: A Concrete Example
1010

11+
> split to 2 slides
12+
1113
Extensions are everywhere in modern software. PostgreSQL has over 100 extensions for everything from geospatial data to time-series analytics. Nginx has evolved from a basic HTTP server into a versatile platform through its rich extension ecosystem. Emacs users install packages for language support and productivity tools. Vim has thousands of plugins for syntax highlighting and development workflows. Redis uses Lua scripts for custom data processing. Even browsers depend on extensions for ad blocking and developer tools.
1214

15+
> 2 is detail, the rest is just some images. too many examples.
16+
17+
> maybe new slide starts from here, around nginx example
18+
1319
Let me start with a concrete example to show you what extensions are and why we need them. Consider Nginx deployed as a reverse proxy. The original Nginx developers write the core server functionality. But different deployments need different behaviors. Some need firewalls to block malicious requests. Others need load balancers to distribute traffic. Many need monitoring for observability. Extensions solve this problem by allowing customization without modifying the original application source code.
1420

21+
> add why it's a good idea not to modify the original application source code? one sentence
22+
1523
Here's how the extension execution model works in Nginx. A developer defines new logic as Nginx modules or plugins, and associates each extension with specific locations in Nginx's request processing pipeline, called extension entries. When a user runs Nginx, the system loads both the core Nginx binary and the configured extensions. Each time an Nginx worker thread reaches an extension entry—like when processing an incoming HTTP request—the thread jumps to the associated extension. It executes the extension logic within Nginx's runtime context. Once the extension completes, the thread returns to Nginx's core processing at the point immediately after the extension entry.
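
The execution model described above can be sketched in a few lines. This is a toy model of the control flow only, not Nginx's real module API: the `Host` class, the `on_request` entry name, and the `firewall` function are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical types: a request is a plain dict, an extension is a callable.
Request = Dict[str, str]
Extension = Callable[[Request], None]


class Host:
    """Toy host application with named extension entries."""

    def __init__(self) -> None:
        self.entries: Dict[str, List[Extension]] = {}

    def attach(self, entry: str, extension: Extension) -> None:
        # Associate an extension with one entry in the processing pipeline.
        self.entries.setdefault(entry, []).append(extension)

    def run_entry(self, entry: str, request: Request) -> None:
        # The worker "jumps" to each attached extension, then returns here.
        for extension in self.entries.get(entry, []):
            extension(request)

    def handle(self, request: Request) -> str:
        self.run_entry("on_request", request)  # extension entry
        # Core processing resumes immediately after the entry.
        if request.get("blocked") == "yes":
            return "403"
        return "200"


def firewall(request: Request) -> None:
    # Example extension: flag requests whose path contains a traversal pattern.
    if "../" in request.get("path", ""):
        request["blocked"] = "yes"


host = Host()
host.attach("on_request", firewall)
```

With the firewall attached, `host.handle({"path": "/../etc/passwd"})` is rejected, while a host with no extensions attached serves the same request unchanged. The extension runs inside the host's own context, which is exactly why safety and isolation matter.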
1624

25+
> add a figure
26+
1727
## [Slide 2] Extension Problems and Requirements
1828

1929
However, Nginx extension systems face serious safety and performance challenges. Real-world incidents show the risks: In 2023, a malformed Lua plugin created an infinite loop inside Nginx, causing Bilibili's entire CDN to go down for hours. Apache's Lua module also suffered buffer overflows that crashed httpd. Redis scripts can enable remote code execution through stack overflows. These examples show that even mature plugin ecosystems can bring production services to a halt.
2030

31+
> shorter and around nginx, and add not just, like redis.
32+
33+
> add for example, many people using lua and wasm to .... that's too much to pay ...
34+
2135
Meanwhile, Nginx operators often disable WebAssembly or Lua-based extensions in production due to their persistent 10–15 percent throughput penalty on HTTP request processing. This creates a painful tension between safety, extensibility, and performance.
2236

37+
> put it in separate slide.
38+
2339
Nginx extension frameworks need three key features. First, fine-grained safety and interconnectedness trade-offs. Nginx extensions must interact with the web server by reading request headers and calling HTTP processing functions, but managers need to follow the principle of least privilege, granting only necessary permissions per extension. Second, isolation to protect Nginx extensions from core server bugs and vice versa. Third, efficiency with near-native speed execution, since Nginx extensions often run on critical paths like per-request processing where every millisecond matters for user experience.
2440

41+
> use the system diagram here with annotations to show the requirements and bugs. the first one is system diagram + issues, the second is system diagram + requirements.
42+
2543
## [Slide 3] State-of-the-Art Falls Short
2644

2745
Unfortunately, existing approaches cannot satisfy all requirements simultaneously. Dynamic loading achieves speed but provides no isolation or policies. Software Fault Isolation systems like WebAssembly deliver safety but carry 10–15 percent performance penalties. Subprocess isolation ensures separation but has untenable IPC overhead. Kernel eBPF uprobes offer isolation but trap into the kernel on every invocation, costing microseconds each time.
2846

47+
> bullet points and animation will make people easy to follow.
48+
2949
## [Slide 4] Contribution: EIM + bpftime
3050

51+
> reuse it as outline, tell people what you are talking about
52+
3153
We present a two-part solution. First, the Extension Interface Model (EIM) treats every extension capability as a named resource. We split the work into development time, where application developers declare possible capabilities, and deployment time, where extension managers choose minimal privilege sets following least privilege principles.
3254

33-
Second, bpftime is a new runtime that efficiently enforces EIM using three key techniques: offline eBPF verification for zero runtime safety checks, Intel Memory Protection Keys for fast domain switching, and concealed extension entries that eliminate overhead for unused hooks. Together, they provide kernel-grade safety with library-grade performance while maintaining 100% eBPF compatibility.
55+
Second, bpftime is a new runtime that efficiently enforces EIM using three key techniques: offline eBPF verification for zero runtime safety checks, Intel Memory Protection Keys for fast domain switching, and concealed extension entries that eliminate overhead for unused hooks. Together, they provide kernel-grade safety with library-grade performance while maintaining eBPF compatibility.
56+
57+
> add a evaluation sentence here.
58+
59+
> add some visualization/image
3460
3561
## [Slide 5] EIM: Extension Interface Model
3662

37-
To enable fine-grained control, we introduce the Extension Interface Model, or EIM. EIM treats extension capabilities as named resources with a two-phase specification approach.
63+
To enable fine-grained safety-interconnectedness trade-offs, we introduce the Extension Interface Model, or EIM. EIM treats extension capabilities as named resources with a two-phase specification approach.
3864

3965
Let me explain this using our Nginx example. In the extension ecosystem, we have four key roles. First, Nginx application developers write the core web server code. Second, extension developers create plugins like firewalls, load balancers, and monitoring tools. Third, the extension manager—typically a system administrator or DevOps engineer—decides which extensions to deploy and what privileges each should have. Finally, end users send HTTP requests that trigger both the host application and extensions.
4066

41-
EIM captures this separation of concerns through capabilities as resources. State access capabilities control reading and writing variables like request headers or connection counts. Function call capabilities govern invoking Nginx APIs like `nginx_time()` or `ngx_http_finalize_request()`, complete with pre- and post-conditions. Hardware resource capabilities limit CPU instructions and memory access patterns.
67+
EIM captures this separation of concerns through capabilities as resources.
4268

43-
The key insight is splitting specification into two phases. During development time, Nginx developers annotate their code to declare the universe of possible extension behaviors—what state could be accessed, which functions could be called, where extensions could hook. This creates a comprehensive capability manifest embedded in the binary.
4469

45-
At deployment time, the extension manager writes policies that grant minimal privilege sets to specific extensions. A monitoring extension might only read request data and call logging functions. A firewall extension needs both read and write access to modify responses. A load balancer requires network capabilities to contact upstream servers.
70+
<!-- State access capabilities control reading and writing variables like request headers or connection counts. Function call capabilities govern invoking Nginx APIs like `nginx_time()` or `ngx_http_finalize_request()`, complete with pre- and post-conditions. Hardware resource capabilities limit CPU instructions and memory access patterns. -->
71+
> cover them later
72+
73+
The key insight is splitting specification into two phases. During development time, Nginx developers annotate their code to declare the universe of possible extension behaviors—what state could be accessed, which functions could be called, where extensions could hook.
74+
75+
> make it a little shorter and not too much detail.
76+
77+
At deployment time, the extension manager writes policies that grant minimal privilege sets to specific extensions.
4678

4779
This separation means managers can refine security policies in production without touching application source code, enabling true least-privilege extension deployment.
4880

@@ -52,35 +84,68 @@ Now let me show you how EIM works in practice. During development time, Nginx de
5284

5385
These annotations are automatically extracted and compiled into the binary. This happens once during development and creates a complete map of what extensions could ever access. The key insight is that developers only declare possibilities—they don't decide what actually gets used.
5486

87+
> need to modify the image, maybe add something to nginx system diagram to show what the developer can annotate and can do. maybe not full image, just some annotations.
88+
5589
## [Slide 7] EIM Deployment-Time Specification
5690

57-
Here's where the magic happens. At deployment time, the system administrator writes simple policies that grant minimal privileges to each extension. An observability extension might only read request data and call logging functions. A firewall extension gets both read and write access to modify responses. A load balancer needs network capabilities to contact upstream servers.
91+
At deployment time, the system administrator writes simple policies that grant minimal privileges to each extension.
5892

59-
The beauty is that these policies live completely outside the application code. You can refine security settings in production without recompiling anything. This separation enables true least-privilege deployment while keeping the original application unchanged.
93+
An observability extension might only read request data and call logging functions. A firewall extension gets both read and write access to modify responses. A load balancer needs network capabilities to contact upstream servers.
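
The two-phase split can be illustrated with a small sketch. It is not EIM's actual specification syntax: the capability strings, `MANIFEST`, `make_policy`, and `allowed` are hypothetical names, and the model reduces capabilities to plain set membership.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Development time: the application developer declares every capability an
# extension could ever be granted. All capability names are hypothetical.
MANIFEST: FrozenSet[str] = frozenset({
    "read:request", "write:response", "call:log", "net:upstream",
})


@dataclass(frozen=True)
class Policy:
    extension: str
    granted: FrozenSet[str]


def make_policy(extension: str, *caps: str) -> Policy:
    # Deployment time: the manager may only grant what the manifest declares.
    unknown = set(caps) - MANIFEST
    if unknown:
        raise ValueError(f"not declared by the application: {sorted(unknown)}")
    return Policy(extension, frozenset(caps))


POLICIES: Dict[str, Policy] = {
    p.extension: p
    for p in (
        make_policy("observability", "read:request", "call:log"),
        make_policy("firewall", "read:request", "write:response"),
        make_policy("load_balancer", "net:upstream"),
    )
}


def allowed(extension: str, capability: str) -> bool:
    # Least privilege: anything not explicitly granted is denied.
    policy = POLICIES.get(extension)
    return policy is not None and capability in policy.granted
```

Here `allowed("observability", "write:response")` is false even though the application declares that capability, because the manager never granted it to that extension.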
94+
95+
> like 2 different extension entry example spec, we can show them.
96+
97+
These policies live completely outside the application code. You can refine security settings in production without recompiling anything. This separation enables true least-privilege deployment while keeping the original application unchanged.
98+
99+
> change figure. similar to the previous one, from system diagram add more.
60100
61101
## [Slide 8] EIM Summary
62102

63103
To summarize EIM, we've solved the fine-grained control problem through two innovations. First, we model every extension capability as a named resource that can be precisely granted or denied. Second, we separate development time concerns from deployment time policies.
64104

65-
This fills a critical gap. Existing frameworks either give you no control at all, or they bundle everything into coarse-grained categories. EIM lets you say "this monitoring extension can only read request headers and call logging functions" while "this firewall can read and modify response content"—all without changing a single line of application code.
105+
Existing frameworks either give you no control at all, or they bundle everything into coarse-grained categories. EIM lets you say "this monitoring extension can only read request headers and call logging functions" while "this firewall can read and modify response content"—all without changing a single line of application code.
66106

67-
The key breakthrough is treating safety and interconnectedness as independent dimensions that can be balanced precisely for each use case.
107+
The key idea is treating safety and interconnectedness as independent dimensions that can be balanced precisely for each use case.
108+
109+
> maybe we don't need this slide. We can show the contribution again. we can use highlight in contribution to replace empty title slide, so we show it multiple times across the slides to guide people.
68110
69111
## [Slide 9] bpftime: Why We Need a New Runtime
70112

113+
> introduce the idea of bpftime
114+
71115
Now you might ask, "Can't we just use existing frameworks to enforce EIM policies?" Unfortunately, no. Current frameworks make painful trade-offs that prevent efficient EIM enforcement. Software fault isolation like WebAssembly adds 10-15% runtime overhead. Subprocess isolation requires expensive context switches. Kernel eBPF uprobes trap into the kernel on every single function call.
72116

117+
> "we talk about previous work..."
118+
119+
bpftime is a userspace extension framework in eBPF...
120+
73121
We built bpftime specifically to enforce EIM efficiently while maintaining complete eBPF compatibility. This compatibility is crucial—it means existing eBPF tools work immediately with bpftime, and extensions can share data with kernel eBPF programs for comprehensive monitoring that spans both kernel and userspace.
74122

75-
## [Slide 10] bpftime Architecture
123+
> maybe change, shorter and just say compatibility and work with kernel ebpf.
124+
125+
> maybe shorter a little bit.
126+
127+
> 1. compatibility ebpf (verification for safety + ecosystem)
128+
> 2. binary rewriting (conceal extension entry)
129+
> 3. isolation (mpk)
76130
77-
Here's how bpftime works at a high level. We intercept eBPF system calls before they reach the kernel. Our loader converts EIM policies into bytecode assertions and feeds everything through the kernel's proven eBPF verifier for safety guarantees. After JIT compilation to native code, we use binary rewriting to patch trampolines into the target application only when extensions are actually loaded. At runtime, we flip memory protection keys to switch security domains and execute the native extension code directly.
131+
## [Slide 10] bpftime Overview
132+
133+
Here's how bpftime works at a high level.
134+
<!-- We intercept eBPF system calls before they reach the kernel. Our loader converts EIM policies into bytecode assertions and feeds everything through the kernel's proven eBPF verifier for safety guarantees. After JIT compilation to native code, we use binary rewriting to patch trampolines into the target application only when extensions are actually loaded. At runtime, we flip memory protection keys to switch security domains and execute the native extension code directly. -->
135+
136+
> "to ensure compatibility, we need to do something like this..."
137+
> " to ensure ..."
138+
> " we convert the eim into..."
139+
> " this enables us to reuse the verifier..."
140+
> each things you introduce match previous slide.
78141
79142
The key insight is reusing the existing eBPF ecosystem while adding just the minimal components needed for userspace deployment with EIM enforcement.
80143

144+
> we need to simplify this diagram. only the necessary parts.
145+
81146
## [Slide 11] bpftime: Key Challenges and Design
82147

83-
Now you might ask, "Can't we just expand existing extension frameworks to enforce EIM policies?" Unfortunately, that approach won't work. Existing frameworks provide safety and isolation through heavyweight operating-system isolation or software-fault isolation techniques like WebAssembly. These are already inefficient, imposing 10-15% overhead. Adding EIM enforcement on top would degrade their performance even further, making them unsuitable for production use.
148+
> no need this one, but introduce each in the overview with the figure.
84149
85150
So we designed bpftime as a new extension framework specifically for compiled applications. But ensuring eBPF compatibility presented a major challenge. The Linux eBPF ecosystem consists of tightly coupled components—compilers, runtime libraries, and the kernel—that are nearly impossible to disentangle. Prior user-level eBPF systems tried re-implementing the entire eBPF technology stack and ultimately failed to provide reasonable performance and compatibility.
86151

@@ -92,28 +157,35 @@ bpftime employs two key design constraints that work together. First, we use sep
92157

93158
To prove our approach works, we built six real-world applications. For security, we created an Nginx firewall that blocks malicious URLs in real time. For reliability, we built a Redis extension that closes the durability gap, so operators no longer have to choose between losing thousands of writes and taking a 6× performance hit. For performance, we accelerated FUSE file operations with in-process caching. For observability, we ported existing tools like DeepFlow, syscount, and sslsniff to demonstrate seamless eBPF compatibility.
94159

160+
> about the oss part, the code is open source since... and we have community and users... the things are done by and we pick ...
161+
95162
## [Slide 13] Performance Results: Nginx Firewall
96163

97-
Let me show you the performance impact. For our Nginx firewall, we compared different extension approaches under a realistic workload. Lua and WebAssembly extensions impose 11–12 percent throughput loss—that's significant overhead that many operators can't accept in production. Our bpftime implementation achieves the same security functionality with only 2 percent overhead. That's a 5× to 6× improvement over existing approaches.
164+
Let me show you the performance impact. For our Nginx firewall, we compared different extension approaches under a realistic workload. Lua and WebAssembly extensions impose 11–12 percent throughput loss—that's significant overhead that many operators can't accept in production. Our bpftime implementation achieves the same security functionality with only 2 percent overhead. That's a 5× to 6× improvement over existing approaches.
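
The "5× to 6×" figure follows directly from the overhead numbers quoted above. A quick check, where `overhead_ratio` is just a helper name for this sketch:

```python
def overhead_ratio(baseline_loss_pct: float, new_loss_pct: float) -> float:
    """How many times smaller the new throughput loss is than the baseline."""
    return baseline_loss_pct / new_loss_pct


# Lua/WebAssembly lose 11-12 percent; bpftime loses 2 percent.
low = overhead_ratio(11, 2)   # 5.5
high = overhead_ratio(12, 2)  # 6.0
```

Note that the ratio compares overheads, not absolute throughput: the claim is that the cost of adding the firewall shrinks by roughly that factor.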
98165

99-
This result matters because it crosses the threshold where extension overhead becomes acceptable in production environments.
166+
> "in this diagram, the more to the right, the better."
100167
101168
## [Slide 14] Performance Results: SSL Monitoring
102169

103170
For observability, consider sslsniff, which monitors encrypted TLS traffic—crucial for debugging production microservices. With kernel eBPF, this monitoring costs 28 percent throughput loss. That's prohibitive for production use. With bpftime, the same monitoring functionality costs only 7 percent overhead.
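
Another way to read these numbers is in terms of retained throughput rather than overhead. The sketch below uses only the two percentages quoted above; `retained` is a hypothetical helper name.

```python
def retained(loss_pct: float) -> float:
    """Fraction of baseline throughput left after a given percentage loss."""
    return 1 - loss_pct / 100


kernel_ebpf = retained(28)  # about 0.72 of the unmonitored baseline
bpftime = retained(7)       # about 0.93 of the unmonitored baseline

# Relative throughput of the monitored service: bpftime vs. kernel eBPF.
gain = bpftime / kernel_ebpf  # about 1.29
```

So under these figures, a service monitored through bpftime sustains roughly 29 percent more throughput than the same service monitored through kernel eBPF.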
104171

105-
This improvement makes encrypted traffic monitoring practical in performance-sensitive environments where every percentage point of overhead matters.
172+
> say in words and describe more about the figure.
173+
106174

107175
## [Slide 15] Take-Aways and Future Work
108176

109-
Let me close with three key takeaways. First, EIM provides the missing piece for fine-grained extension control—you can now specify precise least-privilege policies per extension entry without touching application source code. Second, bpftime shows that you don't have to choose between safety and performance. We achieve kernel-grade safety with library-grade performance using offline verification, hardware isolation, and concealed trampolines. Third, maintaining 100% eBPF compatibility means you can adopt our approach immediately without changing your existing workflows.
177+
> use the contribution slide again, to say summary....
110178
179+
Let me close with three key takeaways. First, EIM provides the missing piece for fine-grained extension control—you can now specify precise least-privilege policies per extension entry without touching application source code. Second, bpftime shows that you don't have to choose between safety and performance. We achieve kernel-grade safety with library-grade performance using offline verification, hardware isolation, and concealed trampolines. Third, maintaining 100% eBPF compatibility means you can adopt our approach immediately without changing your existing workflows.
180+
<!--
111181
Looking ahead, we're expanding bpftime to support GPU and ML workloads, broadening the scope of safe, efficient extension deployment beyond traditional systems programming.
112182
113-
However, current bpftime and EIM still have some limitations. First, EIM tools and policies are mainly for compiled applications, and we are working on supporting more languages. Also, you need to write the extension code in eBPF, which is not easy for some users.
183+
However, current bpftime and EIM still have some limitations. First, EIM tools and policies are mainly for compiled applications, and we are working on supporting more languages. Also, you need to write the extension code in eBPF, which is not easy for some users. -->
114184

115185
## [Slide 16] Thank You & Questions
116186

187+
> not a separate slide, merge it into contribution outline...
188+
117189
Thank you for your attention. **bpftime** is open-source under the MIT license at **github.com/eunomia-bpf/bpftime**. You can get started today by running it as a drop-in replacement for eBPF applications. We welcome your issues, pull requests, and collaboration. I'm happy to take your questions.
118190

119191
## **Complete Slide Deck (16 slides, 16:9)**
