Skip to content

Edge Label Interning and Hybrid Edge Index #232

@RAprogramm

Description

@RAprogramm

Proposal: Edge Label Interning and Hybrid Edge Index

This issue proposes a set of functional changes to improve performance and memory efficiency in sodg.rs without altering the external API.


  • 1. Edge Label Interning

Introduce a new module labels.rs with a simple label interner.

Types

  • type LabelId = u32

  • LabelInterner with methods:

    • fn get_or_intern(&mut self, s: &str) -> LabelId
    • fn get(&self, s: &str) -> Option<LabelId>
    • fn resolve(&self, id: LabelId) -> Option<&str>

Details

  • Backed by two HashMaps: String -> LabelId and LabelId -> String.
  • Start IDs from 1. Reserve 0 for "not found".
  • No thread-safety required; interner belongs to Sodg.
  • All edges should store LabelId instead of String.

  • 2. Hybrid Edge Index per Vertex

Introduce a new module edge_index.rs.

use micromap::Map as MicroMap;

pub enum EdgeIndex {
    Small(MicroMap<LabelId, u32>),
    Large(std::collections::HashMap<LabelId, u32>),
}

Constant

  • SMALL_THRESHOLD: usize = 32

Methods

  • new() -> Self (starts as Small)

  • len(&self) -> usize

  • get(&self, label: LabelId) -> Option<u32>

  • insert(&mut self, label: LabelId, to: u32)

    • Use Small until length exceeds threshold, then migrate to Large.
  • remove(&mut self, label: LabelId) -> Option<u32>

Integration into Vertex

  • Add field index: EdgeIndex.
  • Keep Vec<Edge> for iteration/serialization.
  • Update Edge to store label: LabelId instead of String.
  • Ensure edges and index stay in sync.

  • 3. Integration into Sodg
  • Add labels: LabelInterner field to Sodg.
  • On edge creation (bind, etc.), convert &str to LabelId via get_or_intern.
  • On lookup (kid, etc.), convert &str to LabelId via get.
  • Update locator parsing to operate on &str slices directly (e.g., split('.')), avoiding temporary Vec<String>. Resolve LabelId per segment during traversal.

  • 4. Reducing Data Allocations
  • For data type like Hex, replace heavy clones with Arc<[u8]> inside an enum variant.

    • Clones become O(1).
    • Serialization must remain intact.
  • In gc:

    • Avoid cloning full Vertex.
    • Collect child_ids: Vec<u32> from an immutable borrow, then mutate graph afterward.

  • 5. Tests
  • Update tests: internal edges use LabelId, but external API still accepts &str.

  • Add new tests:

    • kid() lookup on vertices with degree 1, 31, and 33 (validate Small→Large transition).
    • Edge removal in both Small and Large variants.
    • find() with multi-segment paths.
    • Serialization/deserialization with LabelInterner (ensure indices rebuild correctly).

Expected Outcomes

  • Faster edge lookups on small-degree vertices via micromap.
  • Reduced memory overhead from interning labels.
  • O(1) clones for vertex data payloads.
  • No breaking changes to the public API.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions