Draft

232 #237

85 commits
9dd4673
Add label interner infrastructure
RAprogramm Sep 26, 2025
aa0776d
implement-label-interner-and-update-lib.rs
RAprogramm Sep 26, 2025
0eaed11
Add adaptive edge index and edge struct
RAprogramm Sep 26, 2025
86c541b
#230/add-edgeindex-and-edge-struct-implementation
RAprogramm Sep 26, 2025
8c504d4
Refine label interning helpers
RAprogramm Sep 26, 2025
b62f5dd
#232: Merge pull request #17 from RAprogramm/eye-of-ra/audit/add-help…
RAprogramm Sep 26, 2025
3300605
Use edge label interner for formatting and merges
RAprogramm Sep 26, 2025
18c38fa
#232: Merge pull request #18 from RAprogramm/eye-of-ra/audit/refactor…
RAprogramm Sep 26, 2025
7332003
Rebuild vertex indexes after load
RAprogramm Sep 26, 2025
b836ef0
#232: Merge pull request #19 from RAprogramm/eye-of-ra/audit/update-s…
RAprogramm Sep 26, 2025
03946ff
Switch Hex to shared storage
RAprogramm Sep 26, 2025
33d9b75
#232: Merge pull request #20 from RAprogramm/eye-of-ra/audit/modify-h…
RAprogramm Sep 26, 2025
fc03111
Improve GC edge cleanup
RAprogramm Sep 26, 2025
196b856
#232: Merge pull request #21 from RAprogramm/eye-of-ra/audit/refactor…
RAprogramm Sep 26, 2025
8f99fee
Update tests and add KidRef iterator coverage
RAprogramm Sep 26, 2025
b4d56e3
#232: Merge pull request #22 from RAprogramm/eye-of-ra/audit/update-a…
RAprogramm Sep 26, 2025
abdac4c
Document bench utilities
RAprogramm Sep 26, 2025
08e0855
#232: Merge pull request #23 from RAprogramm/eye-of-ra/audit/add-new-…
RAprogramm Sep 26, 2025
443a003
Document label interning and edge indexing internals
RAprogramm Sep 26, 2025
615d202
#232: Merge pull request #24 from RAprogramm/eye-of-ra/audit/update-d…
RAprogramm Sep 26, 2025
98d76d0
#232: handle trailing script comments
RAprogramm Sep 26, 2025
34997ad
#232: Merge pull request #25 from RAprogramm/eye-of-ra/audit/run-proj…
RAprogramm Sep 26, 2025
f1fb3a5
Fix release-only warnings in data and kids
RAprogramm Sep 26, 2025
18e31ce
#232: Merge pull request #26 from RAprogramm/eye-of-ra/audit/fix-unus…
RAprogramm Sep 26, 2025
28e5921
#232: reset script variables per deployment
RAprogramm Sep 26, 2025
5c045ec
#232: Merge pull request #27 from RAprogramm/eye-of-ra/audit/run-carg…
RAprogramm Sep 26, 2025
f6ea30c
Switch branches to SmallVec
RAprogramm Sep 26, 2025
769acfc
#232: Merge pull request #28 from RAprogramm/eye-of-ra/audit/replace-…
RAprogramm Sep 26, 2025
2307942
docs: expand benchmark workflow guide
RAprogramm Sep 26, 2025
b047006
#232: Merge pull request #29 from RAprogramm/eye-of-ra/audit/add-perf…
RAprogramm Sep 26, 2025
11e18b9
Refine label and script parsing
RAprogramm Sep 27, 2025
f7a3b33
#232: Merge pull request #30 from RAprogramm/eye-of-ra/audit/add-inpu…
RAprogramm Sep 27, 2025
a3cd1cf
Optimize label canonicalization
RAprogramm Sep 27, 2025
f0ee710
#232: Merge pull request #31 from RAprogramm/eye-of-ra/audit/introduc…
RAprogramm Sep 27, 2025
c81f8bc
Use FxHasher for label interner maps
RAprogramm Sep 27, 2025
dfedad8
#232: Merge pull request #32 from RAprogramm/eye-of-ra/audit/update-l…
RAprogramm Sep 27, 2025
9d9ce9e
Refactor label interner to use structured keys
RAprogramm Sep 27, 2025
6cc315a
#232: Merge pull request #33 from RAprogramm/eye-of-ra/audit/refactor…
RAprogramm Sep 27, 2025
5aa0cc9
Use LabelKey for label interning
RAprogramm Sep 27, 2025
fd2851e
Merge branch '232' into eye-of-ra/introduce-labelkey-newtype-and-upda…
RAprogramm Sep 27, 2025
b8acfde
#232: Merge pull request #34 from RAprogramm/eye-of-ra/introduce-labe…
RAprogramm Sep 27, 2025
74e3186
Optimize alpha label formatting
RAprogramm Sep 27, 2025
d9752f5
Merge branch '232' into eye-of-ra/fix-benchmark-baseline-error
RAprogramm Sep 27, 2025
9836eb1
#232: Merge pull request #35 from RAprogramm/eye-of-ra/fix-benchmark-…
RAprogramm Sep 27, 2025
1838f73
Refine label interning implementation
RAprogramm Sep 27, 2025
8a334a9
#232: Merge pull request #36 from RAprogramm/eye-of-ra/fix-labels.rs
RAprogramm Sep 27, 2025
27a83f8
Refine label interner key storage
RAprogramm Sep 27, 2025
7e95b10
#232: Merge pull request #37 from RAprogramm/eye-of-ra/fix-compilatio…
RAprogramm Sep 27, 2025
366e225
Add preinterned label binding fast path
RAprogramm Sep 27, 2025
5cb039e
#232: Merge pull request #38 from RAprogramm/eye-of-ra/extend-labelin…
RAprogramm Sep 27, 2025
4d63cf2
Improve edge index promotion
RAprogramm Sep 27, 2025
b5285b0
#232: Merge pull request #39 from RAprogramm/eye-of-ra/adjust-promoti…
RAprogramm Sep 27, 2025
3342070
Validate canonical labels for pre-interned bindings
RAprogramm Sep 28, 2025
603bd5c
#232: Merge pull request #40 from RAprogramm/eye-of-ra/extend-labelin…
RAprogramm Sep 28, 2025
40484bc
Optimize merge label bindings
RAprogramm Sep 28, 2025
3dec10c
#232: Merge pull request #41 from RAprogramm/eye-of-ra/update-edge-cl…
RAprogramm Sep 28, 2025
2b6c1e9
Canonicalize labels before binding
RAprogramm Sep 28, 2025
aecccf3
#232: Merge pull request #42 from RAprogramm/eye-of-ra/add-label-cano…
RAprogramm Sep 28, 2025
d4fc8aa
#232: fix bench config
RAprogramm Sep 28, 2025
b6e80f1
Use FxHasher for large edge index map
RAprogramm Sep 28, 2025
ef6abb4
#232: Merge pull request #43 from RAprogramm/eye-of-ra/update-edgeind…
RAprogramm Sep 28, 2025
87141f6
Improve bind edge updates and add regression tests
RAprogramm Sep 28, 2025
e4319f3
#232: Merge pull request #44 from RAprogramm/eye-of-ra/update-vertex-…
RAprogramm Sep 28, 2025
ebb6872
Make Sodg::bind fallible
RAprogramm Sep 28, 2025
829404a
#232: Merge pull request #45 from RAprogramm/eye-of-ra/change-bind-to…
RAprogramm Sep 28, 2025
6704ad4
Track edge slots in edge index
RAprogramm Sep 28, 2025
9f2b53f
#232: Merge pull request #46 from RAprogramm/eye-of-ra/refactor-edgei…
RAprogramm Sep 28, 2025
c53c0c0
Optimize label key representation and update benches
RAprogramm Sep 28, 2025
9f599a4
#232: Merge pull request #47 from RAprogramm/eye-of-ra/redesign-label…
RAprogramm Sep 28, 2025
5613db8
Refine label key hashing
RAprogramm Sep 29, 2025
9db7a31
#232: Merge pull request #48 from RAprogramm/eye-of-ra/redesign-label…
RAprogramm Sep 29, 2025
3e110df
Merge branch 'master' into 232
RAprogramm Sep 29, 2025
6c946f3
#232 fix cargo.lock error with anyhow version
RAprogramm Sep 29, 2025
01265d3
#232 resolve conflicts
RAprogramm Sep 29, 2025
ee1fe86
#232 fix tests
RAprogramm Sep 29, 2025
c2e5e94
Optimize static data retrieval cleanup
RAprogramm Sep 29, 2025
218da71
#232: Merge pull request #49 from RAprogramm/eye-of-ra/add-helper-to-…
RAprogramm Sep 29, 2025
1f19590
#232 fmt stable
RAprogramm Sep 29, 2025
cb4640c
Merge branch '232' of github.com:RAprogramm/sodg.rs into 232
RAprogramm Sep 29, 2025
e744954
Implement lazy vertex initialization
RAprogramm Sep 29, 2025
e74ee1c
#232: Merge pull request #50 from RAprogramm/eye-of-ra/refactor-sodg-…
RAprogramm Sep 29, 2025
0235c62
Optimize branch cleanup edge scanning
RAprogramm Sep 29, 2025
15e0aef
#232: Merge pull request #51 from RAprogramm/eye-of-ra/refactor-clean…
RAprogramm Sep 29, 2025
ff0dfc1
Merge branch '232' of github.com:RAprogramm/sodg.rs into 232
RAprogramm Sep 29, 2025
663c12a
#232 fmt stable
RAprogramm Sep 29, 2025

287 changes: 239 additions & 48 deletions Cargo.lock

Large diffs are not rendered by default.

13 changes: 12 additions & 1 deletion Cargo.toml
@@ -15,24 +15,28 @@ categories = ["data-structures", "memory-management"]

[features]
gc = []
callgrind = []

[dependencies]
anyhow = "1.0.100"
bincode = { version = "2.0.1", features = ["serde"] }
ctor = "0.5.0"
emap = { version = "0.0.13", features = ["serde"] }
hex = "0.4.3"
itoa = "1.0.11"
itertools = "0.14.0"
lazy_static = "1.5.0"
libc = "0.2.176"
log = "0.4.28"
micromap = { version = "0.1.0", features = ["serde"] }
microstack = { version = "0.0.7", features = ["serde"] }
smallvec = { version = "1.15.1", features = ["serde"] }
arrayvec = "0.7.6"
nohash-hasher = "0.2.0"
openssl = { version = "0.10.73", features = ["vendored"] }
regex = "1.11.2"
rstest = "0.26.1"
rustc-hash = "2.1.1"
serde_bytes = "0.11.15"
serde = { version = "1.0.226", features = ["derive"] }
simple_logger = "5.0.0"
sxd-document = "0.3.2"
@@ -42,13 +46,20 @@ xml-builder = "0.5.4"

[dev-dependencies]
criterion = "0.7.0"
fsutils = "0.1.7"
iai-callgrind = "0.16.1"
predicates = "3.1.3"
tempfile = "3.23.0"

[[bench]]
name = "bench"
harness = false

[[bench]]
name = "edge_index"
harness = false
required-features = ["callgrind"]

[lints.clippy]
all = "warn"
pedantic = "warn"

112 changes: 103 additions & 9 deletions README.md
@@ -19,33 +19,45 @@
right after the data it contains is read _and_ no other vertices
transitively point to it.

The current implementation keeps runtime overhead low by interning edge labels
and indexing them through a hybrid small-map/hash-map structure.
Vertex payloads are stored in a `Hex` helper that keeps tiny blobs inline
and promotes larger values to reference-counted slices, so cloning a graph
snapshot stays cheap.
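
The interning and hybrid-index ideas can be illustrated with a small,
dependency-free sketch. It is not the crate's code: `LabelInterner`,
`EdgeIndex`, and the promotion threshold `SMALL_LIMIT` are hypothetical names.
Each distinct label gets a dense `u32` id once, and a vertex's outgoing edges
live in a linearly scanned vector until it holds `SMALL_LIMIT` entries, after
which the next new label promotes them to a hash map.

```rust
use std::collections::HashMap;

/// Hypothetical interner: maps each distinct label to a dense `u32` id.
#[derive(Default)]
pub struct LabelInterner {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl LabelInterner {
    /// Returns the id for `label`, allocating the next free id on first use.
    pub fn intern(&mut self, label: &str) -> u32 {
        if let Some(&id) = self.ids.get(label) {
            return id;
        }
        // Ids are dense indexes into `names`; overflow is ignored for brevity.
        let id = self.names.len() as u32;
        self.names.push(label.to_owned());
        self.ids.insert(label.to_owned(), id);
        id
    }

    /// Resolves an id back to its label text.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.names.get(id as usize).map(String::as_str)
    }
}

/// Hypothetical promotion threshold for the per-vertex index.
pub const SMALL_LIMIT: usize = 8;

/// Hypothetical per-vertex index from label id to destination vertex.
pub enum EdgeIndex {
    /// Few edges: a linear scan over a short vector.
    Small(Vec<(u32, usize)>),
    /// Many edges: hashed lookup.
    Large(HashMap<u32, usize>),
}

impl EdgeIndex {
    pub fn new() -> Self {
        EdgeIndex::Small(Vec::new())
    }

    /// Inserts or overwrites the edge for `label`, promoting the index to a
    /// hash map once the vector already holds `SMALL_LIMIT` entries.
    pub fn insert(&mut self, label: u32, dest: usize) {
        if let EdgeIndex::Small(v) = self {
            if let Some(pos) = v.iter().position(|&(l, _)| l == label) {
                v[pos].1 = dest;
                return;
            }
            if v.len() < SMALL_LIMIT {
                v.push((label, dest));
                return;
            }
            // Promote: move all existing entries into a hash map.
            let map: HashMap<u32, usize> = v.drain(..).collect();
            *self = EdgeIndex::Large(map);
        }
        if let EdgeIndex::Large(m) = self {
            m.insert(label, dest);
        }
    }

    /// Looks up the destination reached through `label`, if any.
    pub fn get(&self, label: u32) -> Option<usize> {
        match self {
            EdgeIndex::Small(v) => {
                v.iter().find(|&&(l, _)| l == label).map(|&(_, d)| d)
            }
            EdgeIndex::Large(m) => m.get(&label).copied(),
        }
    }

    /// Reports whether the index has been promoted to a hash map.
    pub fn is_promoted(&self) -> bool {
        matches!(self, EdgeIndex::Large(_))
    }
}
```

The threshold trades a cheap linear scan for low-degree vertices against
hashed lookups for high-degree ones. The commit history mentions `FxHasher`
for the real maps; the sketch uses the standard `HashMap` to stay
self-contained, and it leaves out the `Hex` inline/shared payload storage.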

Here is how you can create a di-graph:

```rust
use sodg::Sodg;
use sodg::Hex;
let mut g = Sodg::empty(256);
use std::str::FromStr as _;
use sodg::{Hex, Label, Sodg};
let mut g: Sodg<16> = Sodg::empty(256);
g.add(0); // add a vertex no.0
g.add(1); // add a vertex no.1
g.bind(0, 1, "foo"); // connect v0 to v1 with label "foo"
g.bind(0, 1, Label::from_str("foo").unwrap()).unwrap(); // v0 -> v1 via "foo"
g.bind(0, 1, Label::from_str("bar").unwrap()).unwrap(); // v0 -> v1 via "bar"
g.put(1, &Hex::from_str_bytes("Hello, world!")); // attach data to v1
```

Then, you can find a vertex by the label of an edge departing
from another vertex:

```rust
let id = g.kid(0, "foo");
let id = g.kid(0, Label::from_str("foo").unwrap()).unwrap();
assert_eq!(1, id);
```

Then, you can find all kids of a vertex:

```rust
let kids: Vec<(String, String, usize)> = g.kids(0);
assert_eq!("foo", kids[0].0);
assert_eq!("bar", kids[0].1);
assert_eq!(1, kids[0].2);
let mut kids = g.kids(0);
let first = kids.next().unwrap();
assert_eq!("foo", first.label().to_string());
assert_eq!(1, first.destination());
let second = kids.next().unwrap();
assert_eq!("bar", second.label().to_string());
let label_id = first.label_id();
assert!(label_id > 0);
```

Then, you can read the data of a vertex:
@@ -62,6 +74,14 @@
println!("{:?}", g);
```

Multi-hop traversals remain ergonomic thanks to [`Sodg::find`], which walks a
dot-separated path and returns the final vertex if each edge exists:

```rust
assert_eq!(Some(1), g.find(0, "foo"));
assert_eq!(None, g.find(0, "foo.baz"));
```
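
A dot-separated lookup can be pictured as a fold over the path segments,
where each segment is a single-hop kid lookup and a missing edge
short-circuits to `None`. The sketch below is a toy model with a hypothetical
`ToyGraph` keyed by `(vertex, label)`; it does not use the `sodg` API, and the
real `Sodg::find` may treat edge cases such as empty segments differently.

```rust
use std::collections::HashMap;

/// Hypothetical toy graph: (source vertex, edge label) -> destination.
#[derive(Default)]
pub struct ToyGraph {
    edges: HashMap<(usize, String), usize>,
}

impl ToyGraph {
    /// Adds or replaces the edge `from --label--> to`.
    pub fn bind(&mut self, from: usize, to: usize, label: &str) {
        self.edges.insert((from, label.to_owned()), to);
    }

    /// Single-hop lookup, analogous in spirit to `kid()`.
    /// (Allocating the key per lookup keeps the toy simple, not fast.)
    pub fn kid(&self, from: usize, label: &str) -> Option<usize> {
        self.edges.get(&(from, label.to_owned())).copied()
    }

    /// Multi-hop lookup: resolves each `.`-separated segment in turn and
    /// short-circuits to `None` at the first missing edge.
    pub fn find(&self, start: usize, path: &str) -> Option<usize> {
        path.split('.')
            .try_fold(start, |vertex, label| self.kid(vertex, label))
    }
}
```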

Using `merge()`, you can merge two graphs together, provided they are trees.

Using `save()` and `load()`, you can serialize and deserialize the graph.
@@ -75,6 +95,80 @@

Read [the documentation](https://docs.rs/sodg/latest/sodg/).

## Benchmarks

The project ships two complementary benchmarking harnesses:

* A [Criterion](https://github.com/bheisler/criterion.rs) suite in
`benches/bench.rs` that measures wall-clock performance of vertex management,
edge insertion/removal/lookups, and multi-segment `find()` traversals across
different out-degrees.
* An [`iai-callgrind`](https://github.com/gungraun/gungraun) harness in
`benches/edge_index.rs` (guarded by the `callgrind` feature) that collects
Valgrind statistics for the same scenarios.

Criterion only requires a stable Rust toolchain. Gnuplot is optional; when it is
not installed, Criterion falls back to the bundled Plotters backend for report
generation.

### Running Criterion locally

1. Build and run all Criterion benchmarks:

   ```bash
   cargo bench --bench bench
   ```

2. Inspect the generated reports under
`target/criterion/<benchmark>/<measurement>/report/index.html`. They include
plots, summary statistics, and raw sample data.

### Comparing against a saved baseline

Criterion can persist previous results to highlight regressions and
improvements:

1. Check out the branch or commit that should become the reference point (for
example `master`) and save a baseline:

   ```bash
   cargo bench --bench bench -- --save-baseline master
   ```

   The snapshot is stored inside `target/criterion` and can be reused across
   branches as long as the directory is kept intact.

2. Switch back to your working branch and compare the current code against the
saved numbers:

   ```bash
   cargo bench --bench bench -- --baseline master
   ```

3. Examine the console output or open the HTML reports to see per-benchmark
percentage changes. Criterion reports the change in measured time, so
negative percentages indicate improvements (faster code) and positive ones
signal regressions (slower code).
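
The sign convention follows directly from the arithmetic. The helpers below
are hypothetical (`percent_change` and `verdict` are not Criterion APIs) and
work on single point estimates; Criterion itself bootstraps the change,
tests it for statistical significance, and reports changes inside its noise
threshold (1% by default) as noise.

```rust
/// Relative change of `new_ns` against `baseline_ns`, in percent.
/// Negative values mean the new code is faster.
pub fn percent_change(baseline_ns: f64, new_ns: f64) -> f64 {
    (new_ns - baseline_ns) / baseline_ns * 100.0
}

/// Hypothetical classification with a symmetric noise band in percent;
/// significance testing is deliberately omitted.
pub fn verdict(change_pct: f64, noise_pct: f64) -> &'static str {
    if change_pct.abs() <= noise_pct {
        "within noise"
    } else if change_pct < 0.0 {
        "improved"
    } else {
        "regressed"
    }
}
```

For example, a baseline of 200 ns and a new estimate of 150 ns give a
change of -25%, which `verdict` classifies as an improvement.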

To focus on a single benchmark group, pass its name after a double dash, e.g.
`cargo bench --bench bench -- find_multi_segment`.

### Running the Callgrind harness

The Callgrind harness provides instruction- and cache-level metrics:

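1. Install Valgrind (`apt install valgrind` on Debian/Ubuntu). Gungraun
supports Linux and other platforms with working Valgrind ports; Windows is
not supported. The harness also needs the `iai-callgrind-runner` binary,
whose version must match the `iai-callgrind` dev-dependency, e.g.
`cargo install iai-callgrind-runner --version 0.16.1`.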
2. Enable the `callgrind` feature and execute the harness:

   ```bash
   cargo bench --features callgrind --bench edge_index
   ```

3. Review the generated profiles under `target/iai` with tools like
`kcachegrind`, `callgrind_annotate`, or Gungraun’s HTML report. This makes it
easy to inspect hot paths and validate that optimizations change instruction
counts in the intended way.

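The `edge_index` target declares `required-features = ["callgrind"]`, so a
plain `cargo bench` skips it, and the binary prints a short message and exits
immediately if it is built without the feature. This lets the harness stay in
the repository without adding a hard dependency on Valgrind.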
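
A minimal sketch of such a guard, assuming it is implemented with `cfg`
attributes; `harness_mode` and its message are hypothetical and not taken
from `benches/edge_index.rs`. Cargo sets `feature = "callgrind"` only when the
feature is enabled, so any other build compiles the fallback.

```rust
/// Entry point compiled only when the `callgrind` feature is enabled.
#[cfg(feature = "callgrind")]
pub fn harness_mode() -> &'static str {
    // A real harness would run its iai-callgrind benchmark groups here.
    "callgrind"
}

/// Fallback compiled when the feature is disabled: nothing is measured.
#[cfg(not(feature = "callgrind"))]
pub fn harness_mode() -> &'static str {
    "skipped: enable the `callgrind` feature to run this harness"
}
```

A `main` built on this would print the returned message and return, so the
fallback never invokes Valgrind.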

## How to Contribute

First, install [Rust](https://www.rust-lang.org/tools/install) and then: