Commit f9612c8: Refactor structure

Author: Wu Yu Wei
Parent: 23961c9

20 files changed (+355, -44 lines)

Cargo.toml (+10 -9)

    @@ -1,27 +1,28 @@
     [package]
    -name = "shifgrethor"
    +name = "elise"
     version = "0.1.0"
    -edition = "2018"
    -authors = ["Without Boats <[email protected]>"]
    +edition = "2021"
    +description = "A concurrent GC."
    +license = "Apache-2.0 OR MIT"

     [dependencies]
     pin-cell = "0.1.1"

     [dependencies.derive]
    -path = "src/lib/derive"
    +path = "crates/derive"
     version = "0.1.0"
    -package = "shifgrethor-derive"
    +package = "elise-derive"

     [dependencies.gc]
    -path = "src/lib/gc"
    +path = "crates/gc"
     version = "0.1.0"
    -package = "shifgrethor-gc"
    +package = "elise-gc"

     [dev-dependencies]
     env_logger = "0.5.13"

     [workspace]
     members = [
    -    "src/lib/derive",
    -    "src/lib/gc",
    +    "crates/derive",
    +    "crates/gc",
     ]

README.md (+309 -1)

    @@ -1 +1,309 @@
    -A very unsafe concurrent GC.

Added lines 1-309:

# Für Elise

## What is `elise`?

Für Elise (`elise` for short) is a concurrent garbage collection crate based on
shifgrethor.
## What kind of access does `elise` provide to data?

Some previous garbage collector APIs resolve some of the safety issues with
garbage collection by only allowing you to copy data out of them, rather than
allowing you to hold references directly into the managed memory. This is very
convenient for copying collectors, as they don't have to implement pinning.

Elise is not like those APIs. With elise, you can have direct references into
the managed heap. In fact, you can have arbitrary references between the stack,
the managed heap, and the unmanaged heap:

- Garbage collected objects can own data allocated in the unmanaged heap, and
  that data will be dropped when those objects are collected.
- Garbage collected objects can own references into the stack, and safe code is
  guaranteed never to read from those references after they have gone out of
  scope.
- You can store pointers to garbage collected objects in the heap or on the
  stack.

These properties combine transitively: for example, a GC'd object can own a
heap-allocated vector of references to objects on the stack which themselves
have GC'd objects inside them.

Note that, like all garbage collection in Rust (e.g. `Rc`), elise only
provides immutable access to the data it manages. See the section on interior
mutability below.
## What kind of garbage collector is `elise`?

Elise provides a garbage collector, but that is not what is interesting about
elise. The garbage collector here is a mark-and-sweep collector of the simplest
and least optimized possible variety. However, the API that makes it safe could
apply to much more performant garbage collectors, specifically ones with these
properties:

- This is an API for [tracing garbage collectors][tracing], not for other
  garbage collection techniques like reference counting.
- This is an API for [precise][precise] tracing collectors, not for
  conservative collectors like the Boehm GC.
- The API could be trivially adapted to support concurrent GCs, though the
  current implementation is not thread-safe.
- The API *can* support moving collectors as long as they implement a pinning
  mechanism. A moving collector that does not support pinning is incompatible
  with elise's API goals.
## What is the state of the project?

Code has been written, sometimes frantically. A few basic smoke tests have
checked that some things that should work do work correctly. No attempts at
proofs have been made. It likely has glaring bugs. It might segfault, ruin your
laundry, halt and catch fire, etc.

**You should not use this for anything that you depend on (e.g. "in
production")!** But if you want to play around with it for fun, by all means.

## What is `elise` going to be used for?

No idea! This is currently a research project.

## Why is it called `elise`?

"Für Elise" is a well-known composition by Beethoven. In
[Taiwan](https://www.youtube.com/watch?v=h7DPXpqp9e4), garbage trucks play it
as they make their rounds.
## How does `elise` work?

In brief, the kind of precise tracing garbage collector elise is designed for
works like this:

- All of the references from the unmanaged portion of memory (the stack and the
  heap, in our context) into the managed portion of memory are tracked. These
  are called *"roots."*
- From those roots, the collector *"traces"* through the graph of objects to
  find all of the objects that can still be accessed from those roots (and
  are therefore still "alive").

Our API needs to properly account for both rooting objects and tracing through
them to transitively rooted objects.
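
To make roots and tracing concrete, here is a toy mark-and-sweep collector written against plain standard-library types. It is a sketch under stated assumptions, not elise's implementation: the heap is a `Vec`, handles are indices, and roots are registered by hand in a set (`Heap`, `Object`, `ObjId` and `collect` are all hypothetical names). The mark phase is the tracing described above, and the sweep frees whatever was not reached.

```rust
use std::collections::HashSet;

/// Hypothetical handle into the toy heap: an index into `Heap::objects`.
pub type ObjId = usize;

/// Hypothetical managed object: a payload plus the handles it points to.
pub struct Object {
    pub value: i32,
    pub children: Vec<ObjId>,
}

/// A deliberately naive mark-and-sweep heap. Freed slots become `None`.
pub struct Heap {
    objects: Vec<Option<Object>>,
    roots: HashSet<ObjId>,
}

impl Heap {
    pub fn new() -> Self {
        Heap { objects: Vec::new(), roots: HashSet::new() }
    }

    pub fn alloc(&mut self, value: i32, children: Vec<ObjId>) -> ObjId {
        self.objects.push(Some(Object { value, children }));
        self.objects.len() - 1
    }

    pub fn root(&mut self, id: ObjId) {
        self.roots.insert(id);
    }

    pub fn unroot(&mut self, id: ObjId) {
        self.roots.remove(&id);
    }

    pub fn is_live(&self, id: ObjId) -> bool {
        matches!(self.objects.get(id), Some(Some(_)))
    }

    /// Marks everything reachable from the roots, then frees the rest.
    /// Returns the number of objects freed.
    pub fn collect(&mut self) -> usize {
        // Mark phase: trace the object graph from the roots with a worklist.
        let mut marked = vec![false; self.objects.len()];
        let mut worklist: Vec<ObjId> = self.roots.iter().copied().collect();
        while let Some(id) = worklist.pop() {
            if marked[id] {
                continue;
            }
            marked[id] = true;
            if let Some(obj) = &self.objects[id] {
                worklist.extend(obj.children.iter().copied());
            }
        }
        // Sweep phase: free every object the mark phase did not reach.
        let mut freed = 0;
        for (slot, live) in self.objects.iter_mut().zip(marked) {
            if !live && slot.is_some() {
                *slot = None;
                freed += 1;
            }
        }
        freed
    }
}
```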
### Rooting

Given our limitations (i.e. no language support and the existence of a dynamic,
unmanaged heap), we have to track our roots through some sort of intrusive
collection. As a result, our roots cannot be moved around.

Fortunately, we have recently made a lot of progress on supporting intrusive
data structures in Rust, thanks to the pinning API. The rooting layer sits on
top of an underlying pinning API, which we use to guarantee that roots are
dropped in a deterministic stack order.

Roots are created with a special macro called `letroot!`. Each root created
with this macro carries a special lifetime called `'root`, which is the
lifetime of the scope it is created in. You can use the `gc` method on a root
to begin garbage collecting some data:
```rust
// root: Root<'root>
letroot!(root);

let x: Gc<'root, i32> = root.gc(0);
```
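
The rooting paragraphs above say roots live in an intrusive collection, so they must never move. The following minimal sketch shows how pinning provides that guarantee using only the standard library; `RootSlot` and `my_letroot!` are hypothetical stand-ins, not elise's `Root` or `letroot!`. `PhantomPinned` makes the slot `!Unpin`, and `std::pin::pin!` pins it to the current stack frame, so safe code cannot move it and it is dropped when the enclosing scope ends.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical root slot. `PhantomPinned` makes it `!Unpin`, so once it is
/// pinned, safe code can no longer move it, and a collector could keep its
/// address in an intrusive list of roots.
pub struct RootSlot {
    pub id: u32,
    _pinned: PhantomPinned,
}

impl RootSlot {
    pub fn new(id: u32) -> Self {
        RootSlot { id, _pinned: PhantomPinned }
    }

    /// The address a collector would register; stable while the slot is pinned.
    pub fn addr(self: Pin<&Self>) -> *const RootSlot {
        self.get_ref() as *const RootSlot
    }
}

/// Hypothetical stand-in for `letroot!`: pins a fresh slot to the current
/// stack frame, so it is dropped when the enclosing scope ends.
macro_rules! my_letroot {
    ($name:ident, $id:expr) => {
        let $name = std::pin::pin!(RootSlot::new($id));
    };
}
```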
The `Gc` pointer is a copyable reference to the data which proves that the data
has been rooted. It carries the lifetime of the root, and therefore can't
outlive the root you used to create it.

In order to return Gc'd data from a function, you need to pass a root into the
function:
```rust
fn foo<'root>(root: Root<'root>) -> Gc<'root, i32> {
    // No trailing semicolon: the new `Gc` is the function's return value.
    root.gc(0)
}
```
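
The lifetime relationship described above can be imitated without any collector at all. In this hypothetical toy (not elise's API), `Root<T>` owns a single value and `gc` hands out a `Gc<'root, T>` that borrows the root, so `Gc` is `Copy`, dereferences to the data, and cannot outlive the root: keeping a `Gc` after its root is dropped is rejected by the borrow checker. One difference from the README: here `foo` takes `&'root Root<i32>` rather than a `Root<'root>` by value.

```rust
use std::cell::OnceCell;
use std::ops::Deref;

/// Hypothetical one-shot root: owns the storage that `Gc` pointers borrow.
pub struct Root<T> {
    slot: OnceCell<T>,
}

impl<T> Root<T> {
    pub fn new() -> Self {
        Root { slot: OnceCell::new() }
    }

    /// Stores `value` in this root and returns a pointer tied to `&'root self`.
    pub fn gc<'root>(&'root self, value: T) -> Gc<'root, T> {
        if self.slot.set(value).is_err() {
            panic!("this toy root can only hold one value");
        }
        Gc { ptr: self.slot.get().expect("slot was just filled") }
    }
}

/// Hypothetical copyable pointer that cannot outlive the root it came from.
pub struct Gc<'root, T> {
    ptr: &'root T,
}

// Manual impls so `Gc` is `Copy` even when `T` is not.
impl<'root, T> Clone for Gc<'root, T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<'root, T> Copy for Gc<'root, T> {}

impl<'root, T> Deref for Gc<'root, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.ptr
    }
}

/// Mirrors the README's `foo`: the caller supplies the root, so the
/// returned `Gc` may live exactly as long as the caller's root does.
pub fn foo<'root>(root: &'root Root<i32>) -> Gc<'root, i32> {
    root.gc(0)
}
```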
You can also use a root to reroot data that has already been rooted once,
extending its lifetime:
```rust
fn foo<'root1>(outer: Root<'root1>) -> Gc<'root1, i32> {
    // This root is only alive for the frame of this function call.
    //
    // inner: Root<'root2>
    letroot!(inner);
    let x: Gc<'root2, i32> = inner.gc(0);

    // But you can extend a Gc rooted only for this function using the outer root:
    let x: Gc<'root1, i32> = outer.reroot(x);
    x
}
```
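
The bookkeeping behind "roots are dropped in a deterministic stack order", and behind `reroot`, can be modelled with a plain shadow stack. This sketch uses hypothetical names (`RootStack`, `Root::set`, integer object ids) and runtime checks where elise relies on pinning and lifetimes; `foo` mirrors the reroot example above by storing the object in the caller's longer-lived root before its own root is popped.

```rust
use std::cell::RefCell;

/// Hypothetical shadow stack: one slot per live root, holding an object id.
pub struct RootStack {
    slots: RefCell<Vec<Option<usize>>>,
}

/// One root slot owned by a stack frame. Dropping it pops the slot.
pub struct Root<'s> {
    stack: &'s RootStack,
    index: usize,
}

impl RootStack {
    pub fn new() -> Self {
        RootStack { slots: RefCell::new(Vec::new()) }
    }

    /// Analogue of `letroot!`: push an empty slot for the current scope.
    pub fn letroot(&self) -> Root<'_> {
        let mut slots = self.slots.borrow_mut();
        slots.push(None);
        Root { stack: self, index: slots.len() - 1 }
    }

    /// The object ids a collector would start tracing from.
    pub fn roots(&self) -> Vec<usize> {
        self.slots.borrow().iter().flatten().copied().collect()
    }
}

impl Root<'_> {
    /// Roots `obj` in this slot (stands in for both `gc` and `reroot`).
    pub fn set(&self, obj: usize) {
        self.stack.slots.borrow_mut()[self.index] = Some(obj);
    }
}

impl Drop for Root<'_> {
    fn drop(&mut self) {
        let mut slots = self.stack.slots.borrow_mut();
        // Roots must be dropped in reverse order of creation, like stack frames.
        assert_eq!(slots.len(), self.index + 1, "root dropped out of stack order");
        slots.pop();
    }
}

/// Object 42 is first rooted in a frame-local root, then "rerooted" into
/// the caller's `outer` root so it survives after `inner` is popped.
pub fn foo(stack: &RootStack, outer: &Root<'_>) -> usize {
    let inner = stack.letroot();
    inner.set(42);
    outer.set(42);
    42
} // `inner` is popped here; `outer` still roots object 42.
```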
### Tracing

It's not enough to be able to root objects in the GC; you also need to be able
to trace from the root to other objects *transitively*. For example, you might
want a struct, stored in the GC, with fields pointing to other objects which
are also being garbage collected.

The problem that emerges is ensuring that you can only access transitively
rooted objects when you know they are actually being traced from a rooted
object. A few components enable us to solve this:

- First, to put a type in the garbage collector it must implement a trait which
  defines how to trace through it.
- Second, instead of only having a `Gc` type, we have a second type: `GcStore`.
- Third, derived accessors let us guarantee a safe API, as explained below.

The `Gc` type implements `Deref` and `Copy`; it functionally acts like a normal
reference, except that you can extend its lifetime by rerooting it. It does not
expose a safe API for constructing it: the only constructor is the unsafe
`Gc::rooted`, and to call it safely you must prove that the data will be rooted
for the lifetime `'root`.

The `GcStore` type is more like a `Box`, except that it does not implement
`Deref`. You can safely construct a `GcStore`, which will have `Box` semantics
until it is rooted: if you drop a `GcStore` without having rooted it first, it
will deallocate what you have put into it.

Finally, as part of the same derive which implements the traits necessary to
garbage collect your type, you can implement an accessor to transform your
`GcStore` fields into `Gc` fields. For example:
```rust
#[derive(GC)]
struct Foo<'root> {
    #[gc] bar: GcStore<'root, Bar>,
}
```
This generates the following method on `Foo`:
```rust
fn bar(self: Gc<'root, Foo<'_>>) -> Gc<'root, Bar>;
```
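
The shape of the generated accessor can be reproduced by hand. In this hypothetical sketch (not the derive's actual output), `GcStore` owns its value like a `Box` but has no `Deref`, so the only way to reach the `Bar` inside a `Foo` is to start from a `Gc` to the `Foo`. To keep to plain associated-function syntax, the accessor takes an explicit `this` parameter instead of a `self: Gc<..>` receiver.

```rust
use std::ops::Deref;

/// Hypothetical owning field type: like `Box`, but with no `Deref`,
/// so its contents cannot be reached without going through a `Gc`.
pub struct GcStore<T> {
    inner: Box<T>,
}

impl<T> GcStore<T> {
    pub fn new(value: T) -> Self {
        GcStore { inner: Box::new(value) }
    }
}

/// Hypothetical rooted pointer; here just a copyable shared reference.
pub struct Gc<'root, T> {
    ptr: &'root T,
}

impl<'root, T> Clone for Gc<'root, T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<'root, T> Copy for Gc<'root, T> {}

impl<'root, T> Deref for Gc<'root, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.ptr
    }
}

pub struct Bar(pub i32);

pub struct Foo {
    bar: GcStore<Bar>,
}

impl Foo {
    /// Hand-written accessor: a `Gc` to the parent is the only way to get a
    /// `Gc` to the field, and the result keeps the parent's `'root` lifetime.
    pub fn bar<'root>(this: Gc<'root, Foo>) -> Gc<'root, Bar> {
        let foo: &'root Foo = this.ptr;
        Gc { ptr: &*foo.bar.inner }
    }
}
```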
Because the derive also guarantees that this field is traced properly, if you
have a `Gc<Foo>`, it is safe to construct a `Gc<Bar>` from it.

This behavior is also implemented for several container types. For example, you
can transform a `Vec<GcStore<_>>` into a `Vec<Gc<_>>` in the same way:
```rust
#[derive(GC)]
struct Foo<'root> {
    #[gc] vec: Vec<GcStore<'root, Bar>>,
}

// Generates:
fn vec<'root>(self: Gc<'root, Self>) -> Vec<Gc<'root, Bar>>;
```
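
The first component listed above, a trait describing how to trace through a type, is what lets the collector be precise: it never has to guess which words are pointers. A minimal hand-written sketch follows; `Trace`, `Handle` and `Node` are hypothetical names, not elise's `GC` trait. The `Node` impl shows the kind of field-by-field code a derive can generate, with containers such as `Vec` and `Option` tracing their elements.

```rust
/// Hypothetical handle to a managed object.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Handle(pub usize);

/// Hypothetical tracing trait: report every managed handle directly
/// reachable from `self`, so the collector knows exactly where pointers are.
pub trait Trace {
    fn trace(&self, out: &mut Vec<Handle>);
}

impl Trace for Handle {
    fn trace(&self, out: &mut Vec<Handle>) {
        out.push(*self);
    }
}

// Plain data holds no handles, so tracing it is a no-op.
impl Trace for i32 {
    fn trace(&self, _out: &mut Vec<Handle>) {}
}

impl Trace for String {
    fn trace(&self, _out: &mut Vec<Handle>) {}
}

// Containers trace each element they hold.
impl<T: Trace> Trace for Vec<T> {
    fn trace(&self, out: &mut Vec<Handle>) {
        for item in self {
            item.trace(out);
        }
    }
}

impl<T: Trace> Trace for Option<T> {
    fn trace(&self, out: &mut Vec<Handle>) {
        if let Some(item) = self {
            item.trace(out);
        }
    }
}

/// A struct with managed fields.
pub struct Node {
    pub name: String,
    pub parent: Option<Handle>,
    pub children: Vec<Handle>,
}

// What a derive would plausibly generate: trace every field in turn.
impl Trace for Node {
    fn trace(&self, out: &mut Vec<Handle>) {
        self.name.trace(out);
        self.parent.trace(out);
        self.children.trace(out);
    }
}
```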
### Destructors

Destructors present a troubling problem for garbage collectors. Destructors are
safe because we can guarantee that they are run when the struct is dropped, but
something garbage collected will not actually be dropped (and its destructor
run) until much later. This can cause two problems:

* If the destructor accesses other Gc'd data, that data might have been freed
  earlier by the collector.
* If the destructor accesses data on the stack, that data might have been freed
  when the stack was popped before the collector ran.

As a result, the GC does not run destructors on its objects. Instead, it runs a
finalizer just before collecting each object. You can define what happens in
the finalizer by implementing the `Finalize` trait for your type and adding a
`#[gc(finalize)]` attribute to your struct:
```rust
#[derive(GC)]
#[gc(finalize)]
struct Foo;

impl elise::Finalize for Foo {
    fn finalize(&mut self) {
        println!("Collecting a Foo");
    }
}
```
Because `Finalize` does not give you a `Gc` pointer to your type, you cannot
access other `Gc` pointers (in other words, you cannot "prove rootedness",
because you are no longer rooted in the finalizer). However, this is not
sufficient to prevent you from accessing other non-owned data, like stack
references.

As a result, if your type contains any lifetimes other than `'root`, attempting
to implement a finalizer like this will fail. Instead, you will need to
implement an unsafe finalizer:
```rust
#[derive(GC)]
#[gc(unsafe_finalize)]
struct Foo<'a>(&'a i32);

unsafe impl<'a> elise::UnsafeFinalize for Foo<'a> {
    fn finalize(&mut self) {
        // Must not read through `self.0`: the borrowed data may already be gone.
        println!("Collecting a Foo");
    }
}
```
You must audit these finalizers and guarantee that they never read from any of
the borrowed references inside them; otherwise your code is unsound and
contains undefined behavior.
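
The ordering described in this section (finalize, then free, and never run code that could reach other managed objects) can be sketched as the sweep step of a toy heap. The `Finalize` trait here only mimics the shape of the README's trait, and `Tracked` and `sweep` are hypothetical: the finalizer receives `&mut self` and nothing else, so it can touch its own fields but is handed no reference to the heap or to other objects in it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical finalization hook, called just before an object is freed.
/// It only gets `&mut self`: no heap, no `Gc`, no other objects.
pub trait Finalize {
    fn finalize(&mut self) {}
}

/// Toy object that records its own finalization in a shared log.
pub struct Tracked {
    pub name: &'static str,
    pub log: Rc<RefCell<Vec<String>>>,
}

impl Finalize for Tracked {
    fn finalize(&mut self) {
        self.log.borrow_mut().push(format!("finalized {}", self.name));
    }
}

/// Sweep step of a toy heap: finalize, then free, every unmarked object.
pub fn sweep<T: Finalize>(heap: &mut [Option<T>], marked: &[bool]) {
    for (slot, &live) in heap.iter_mut().zip(marked) {
        if !live {
            if let Some(mut obj) = slot.take() {
                obj.finalize();
                // `obj` goes out of scope here, freeing its owned fields
                // only after its finalizer has run.
            }
        }
    }
}
```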
### Interior mutability

The final problem is interior mutability: you can only get a shared reference
to a GC'd object, so ideally you would be able to mutate things inside of it
using some form of interior mutability.

The unique problem has to do with tracing. Let's say you have a
`RefCell<Option<GcStore<i32>>>` inside of your type:
```rust
let x: Gc<RefCell<Option<GcStore<i32>>>>;

let moved: GcStore<i32> = x.borrow_mut().take().unwrap();

// The value behind `x` is now `None`. The `moved` variable is not being traced
// at all; it's entirely unrooted!

// Run the garbage collector. Because `moved` is unrooted, it will be
// collected. `moved` is now a dangling pointer.
elise::collect();

// Put the moved and dangling pointer back into `x`:
*x.borrow_mut() = Some(moved);

// Observe `x`, which now contains a dangling pointer. Segfault!
println!("{}", x);
```
We cannot allow you to move traced `GcStore` pointers around without some other
mechanism for rooting them.

For this reason, elise currently provides only partial support for interior
mutability:

* There is a separate trait called `NullTrace`, which indicates that tracing
  through this type is a no-op (i.e. it contains no Gc'd pointers). You are
  free to have `Cell` and `RefCell` types containing `NullTrace` data.
* `PinCell` is trace safe, because it does not allow you to move the data it
  gives you. If you can't move the data, you can't unroot it.

In other words, you are free to have normal interior mutability of anything
that doesn't contain a Gc pointer, and you can have partial interior mutability
(only pinned mutable references) for things that do contain Gc pointers.

Note that `PinCell` introduces some problems for copying collectors: it gives
you a `Pin<&mut T>`, and other code (e.g. async/await code) might rely on the
*memory* stability of pinned data (as opposed to semantic stability, which is
what we rely on).

It's an open problem to find new abstractions that allow data to be moved only
between traced memory locations, which would allow you to safely move Gc
pointers around.

[tracing]: https://en.wikipedia.org/wiki/Tracing_garbage_collection
[precise]: https://en.wikipedia.org/wiki/Tracing_garbage_collection#Precise_vs._conservative_and_internal_pointers
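
The `NullTrace` rule from the interior mutability section can be enforced with an ordinary trait bound. In this hypothetical sketch (not elise's code), `NullTrace` is an unsafe marker trait for types that contain no GC pointers, and `NullTraceCell` offers `RefCell`-style replacement only for such types, so the take, collect, put-back sequence from the dangling-pointer example fails to compile instead of crashing at runtime.

```rust
use std::cell::RefCell;

/// Hypothetical marker: tracing a value of this type is a no-op.
/// Safety: implementors must contain no GC pointers.
pub unsafe trait NullTrace {}

unsafe impl NullTrace for i32 {}
unsafe impl NullTrace for String {}
unsafe impl<T: NullTrace> NullTrace for Option<T> {}
unsafe impl<T: NullTrace> NullTrace for Vec<T> {}

/// Hypothetical GC pointer; deliberately does NOT implement `NullTrace`.
pub struct GcStore<T>(Box<T>);

/// Interior mutability only for `NullTrace` data, so a traced pointer can
/// never be moved out (and thereby unrooted) through this cell.
/// For example, `NullTraceCell<Option<GcStore<i32>>>` is rejected by the
/// compiler, because `GcStore<i32>` does not implement `NullTrace`.
pub struct NullTraceCell<T: NullTrace> {
    inner: RefCell<T>,
}

impl<T: NullTrace> NullTraceCell<T> {
    pub fn new(value: T) -> Self {
        NullTraceCell { inner: RefCell::new(value) }
    }

    /// Swaps in a new value and returns the old one.
    pub fn replace(&self, value: T) -> T {
        self.inner.replace(value)
    }

    pub fn get_cloned(&self) -> T
    where
        T: Clone,
    {
        self.inner.borrow().clone()
    }
}
```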

src/lib/derive/Cargo.toml → crates/derive/Cargo.toml (+4 -3)

    @@ -1,8 +1,9 @@
     [package]
    -name = "shifgrethor-derive"
    -edition = "2018"
    +name = "elise-derive"
    +edition = "2021"
     version = "0.1.0"
    -authors = ["Without Boats <[email protected]>"]
    +description = "Macros of Für Elise"
    +license = "Apache-2.0 OR MIT"

     [dependencies]
     quote = "0.6.8"

src/lib/derive/accessors.rs → crates/derive/accessors.rs (+2 -2)

    @@ -22,9 +22,9 @@ pub fn accessors(s: &Structure, gcs: &[&BindingInfo]) -> TokenStream {
             let ty: &Type = &b_ast.ty;

             quote! {
    -            #visibility fn #method<'__root>(self: &'__root shifgrethor::Gc<'__root, Self>) -> <#ty as shifgrethor::raw::Store<'__root>>::Accessor {
    +            #visibility fn #method<'__root>(self: &'__root elise::Gc<'__root, Self>) -> <#ty as elise::raw::Store<'__root>>::Accessor {
                     unsafe {
    -                    shifgrethor::raw::Store::rooted(&self.#field)
    +                    elise::raw::Store::rooted(&self.#field)
                     }
                 }
             }

src/lib/derive/lib.rs → crates/derive/lib.rs (+2 -2)

    @@ -42,9 +42,9 @@ fn gc_derive(s: synstructure::Structure) -> TokenStream {

     fn gc_impl(s: &synstructure::Structure) -> TokenStream {
         s.gen_impl(quote! {
    -        extern crate shifgrethor;
    +        extern crate elise;

    -        gen impl<'__root> shifgrethor::GC<'__root> for @Self {
    +        gen impl<'__root> elise::GC<'__root> for @Self {
             }
         })
     }
