A repository for synthetic subgraph-detection benchmarking and PNA baselines on directed multigraphs.
This repository includes a synthetic subgraph-detection dataset for benchmarking graph models on the pattern-detection task. The graphs and labels are generated following the pseudocode and configurations described in Provably Powerful Graph Neural Networks for Directed Multigraphs (Egressy et al.).
- Three splits: `train`, `val`, `test`
- Saved as PyTorch tensors under `./data/`:
  - `train.pt`, `val.pt`, `test.pt`: objects with node-level labels
  - `y_sums.csv`: per-split counts of positive labels per sub-task
- Per-split label percentages and their mean across splits are stored under `./results/metrics/`, which is useful for sanity-checking against the label marginals reported in the paper
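The per-task counts and percentages described above can be illustrated with a small sketch. It assumes the node labels of a split are available as rows of 11 binary values whose column order matches the sub-task list in this README; neither the column order nor the helper names (`label_sums`, `label_percentages`, `mean_across_splits`) are taken from the repository.

```python
from statistics import mean

# Assumption: label columns follow the sub-task order listed in this README.
TASKS = [
    "deg_in > 3", "deg_out > 3", "fan_in > 3", "fan_out > 3",
    "cycle2", "cycle3", "cycle4", "cycle5", "cycle6",
    "scatter_gather", "biclique",
]


def label_sums(labels):
    """Count positive (1) labels per sub-task; `labels` has one row per node."""
    sums = [0] * len(TASKS)
    for row in labels:
        for j, value in enumerate(row):
            sums[j] += int(value)
    return dict(zip(TASKS, sums))


def label_percentages(labels):
    """Percentage of nodes that are positive for each sub-task."""
    n = len(labels)
    return {task: 100.0 * count / n for task, count in label_sums(labels).items()}


def mean_across_splits(per_split):
    """Average per-split percentages task by task (e.g. over train/val/test)."""
    return {task: mean(pcts[task] for pcts in per_split.values()) for task in TASKS}


# Usage with two tiny made-up splits of 2 nodes each.
toy = {
    "train": [[1] + [0] * 10, [0] * 11],
    "val": [[1] * 11, [0] * 11],
}
per_split = {name: label_percentages(rows) for name, rows in toy.items()}
print(mean_across_splits(per_split)["deg_in > 3"])
```

If the saved labels turn out to be a tensor `y` of shape `[num_nodes, 11]` (an assumption about the `.pt` format), `y.sum(dim=0)` gives the same per-task counts in one call.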
Each node is labeled for the presence of each of the following patterns (11 sub-tasks):
- `deg_in > 3`
- `deg_out > 3`
- `fan_in > 3`
- `fan_out > 3`
- `cycle2`
- `cycle3`
- `cycle4`
- `cycle5`
- `cycle6`
- `scatter_gather`
- `biclique`
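As a rough illustration of what the simpler labels mean, the sketch below computes the degree, fan, and directed-cycle labels from a `(src, dst)` edge list. It assumes that `deg_*` counts edges (parallel edges included) while `fan_*` counts distinct neighbours, and that `cycleK` marks nodes lying on a simple directed cycle with exactly K edges; these are interpretations, not the generator's code. `scatter_gather` and `biclique` are left out because their exact definitions follow the paper.

```python
THRESHOLD = 3  # the "> 3" in deg_in > 3, fan_out > 3, etc.


def degree_fan_labels(num_nodes, edges):
    """deg_*/fan_* labels for a directed multigraph of (src, dst) pairs.

    Assumption: deg counts edges (parallel edges included), while fan
    counts distinct neighbours.
    """
    deg_in = [0] * num_nodes
    deg_out = [0] * num_nodes
    in_nbrs = [set() for _ in range(num_nodes)]
    out_nbrs = [set() for _ in range(num_nodes)]
    for u, v in edges:
        deg_out[u] += 1
        deg_in[v] += 1
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    return {
        "deg_in": [int(d > THRESHOLD) for d in deg_in],
        "deg_out": [int(d > THRESHOLD) for d in deg_out],
        "fan_in": [int(len(s) > THRESHOLD) for s in in_nbrs],
        "fan_out": [int(len(s) > THRESHOLD) for s in out_nbrs],
    }


def cycle_labels(num_nodes, edges, k):
    """1 if a node lies on a simple directed cycle with exactly k edges."""
    succ = [set() for _ in range(num_nodes)]
    for u, v in edges:
        if u != v:  # self-loops never form a cycle of length >= 2
            succ[u].add(v)  # parallel edges collapse to one successor
    labels = [0] * num_nodes
    for start in range(num_nodes):
        # Depth-first search over simple paths that begin at `start`.
        stack = [(start, (start,))]
        while stack and not labels[start]:
            node, path = stack.pop()
            for nxt in succ[node]:
                if len(path) == k and nxt == start:
                    labels[start] = 1  # closed a cycle of k nodes / k edges
                    break
                if len(path) < k and nxt not in path:
                    stack.append((nxt, path + (nxt,)))
    return labels


edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 1)] + [(4, 0)] * 4
print(cycle_labels(5, edges, 2))  # [1, 1, 0, 0, 0]
print(degree_fan_labels(5, edges)["deg_out"])  # [0, 0, 0, 0, 1]
```

Node 4 shows why the two label families differ under this interpretation: four parallel edges give it `deg_out > 3` but only one distinct successor, so `fan_out > 3` stays 0. The bounded DFS is exponential in `k`, which is acceptable only because `k <= 6` and the graphs are sparse.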
Graph instances are reproducible. A single `BASE_SEED` deterministically derives a distinct seed for each split (train/val/test), which ensures:
- different graphs for the three splits within a run,
- identical graphs across runs that use the same `BASE_SEED`.
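One way to derive per-split seeds from a single base seed is sketched below. It hashes the base seed together with the split name; this is an illustrative assumption, and the derivation actually used in `scripts/generate_synthetic.py` may differ. The names `derive_split_seed` and `split_seeds` and the value `42` are hypothetical.

```python
import hashlib
import random

SPLITS = ("train", "val", "test")


def derive_split_seed(base_seed: int, split: str) -> int:
    """Map (BASE_SEED, split name) to a 32-bit seed, deterministically."""
    digest = hashlib.sha256(f"{base_seed}:{split}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def split_seeds(base_seed: int) -> dict:
    """One derived seed per split."""
    return {split: derive_split_seed(base_seed, split) for split in SPLITS}


BASE_SEED = 42  # hypothetical value
seeds = split_seeds(BASE_SEED)
rngs = {split: random.Random(seed) for split, seed in seeds.items()}

# Same BASE_SEED -> same per-split seeds -> same graphs on every run.
assert split_seeds(BASE_SEED) == seeds
```

Hashing avoids a pitfall of simple offsets such as `BASE_SEED + i`, where the val seed for `BASE_SEED = 0` would equal the train seed for `BASE_SEED = 1`. NumPy's `np.random.SeedSequence(BASE_SEED).spawn(3)` is an alternative that also produces independent child seeds.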
The default config (see the generator script `scripts/generate_synthetic.py`) follows the paper’s setup:
- Nodes: n = 8192
- Average degree: d = 6
- Radius parameter: r = 11.1
- Directed multigraphs (for directed cycles)
- Generator: “chordal” / random-circulant-like
- One connected component per split (prevents data leakage between splits)
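To give an intuition for "random-circulant-like", the toy generator below places nodes on a ring and only connects nodes whose ring distance is bounded by the radius. It is a hypothetical sketch and does not reproduce the paper's pseudocode: it assumes d counts in- plus out-edges (so there are n·d/2 directed edges) and draws each offset uniformly from 1..ceil(r) in a random direction. It does not enforce connectivity.

```python
import math
import random


def random_circulant_multigraph(n, avg_degree, radius, seed=0):
    """Toy random-circulant-like directed multigraph on a ring of n nodes.

    Hypothetical sketch, not the paper's pseudocode. Assumptions:
    - round(n * avg_degree / 2) directed edges, so the mean total
      (in + out) degree is about avg_degree;
    - each edge starts at a uniformly random node and ends at a ring
      offset drawn uniformly from 1..ceil(radius), in a random direction.
    Parallel edges are kept, so the result can be a multigraph.
    """
    rng = random.Random(seed)
    max_offset = math.ceil(radius)
    edges = []
    for _ in range(round(n * avg_degree / 2)):
        src = rng.randrange(n)
        offset = rng.randint(1, max_offset) * rng.choice((-1, 1))
        edges.append((src, (src + offset) % n))
    return edges


edges = random_circulant_multigraph(n=8192, avg_degree=6, radius=11.1, seed=0)
print(len(edges))  # 24576 directed edges
```

Keeping edges local on the ring is what makes short directed cycles common enough to label, while the random offsets keep the structure irregular.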
From the repo root:
```bash
# Generate graphs and labels
python3 -m scripts.generate_synthetic
```

After this step, you’ll find `train.pt`, `val.pt`, `test.pt`, and `y_sums.csv` under `./data/`, and `label_percentages.csv` under `./results/metrics/`.
The PNA baseline follows the model architecture described in Principal Neighbourhood Aggregation for Graph Nets (Corso et al.).
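The core of PNA is to combine several aggregators (mean, min, max, std) with degree scalers S(d, α) = (log(d + 1) / δ)^α for α ∈ {0, 1, -1} (identity, amplification, attenuation), where δ is the average of log(d + 1) over the training graph. The sketch below shows only that aggregation stage for a single node with scalar neighbour features; the message and update MLPs and towers are omitted, and the function name is hypothetical.

```python
import math


def pna_aggregate(neighbor_feats, avg_log_degree, eps=1e-5):
    """PNA-style aggregation for one node with scalar neighbour features.

    Returns 4 aggregators x 3 scalers = 12 values, ordered as
    [identity block, amplification block, attenuation block], each block
    holding [mean, min, max, std].
    """
    d = len(neighbor_feats)
    if d == 0:
        return [0.0] * 12  # assumption: isolated nodes get zeros
    mu = sum(neighbor_feats) / d
    var = sum((x - mu) ** 2 for x in neighbor_feats) / d
    aggregates = [mu, min(neighbor_feats), max(neighbor_feats), math.sqrt(var + eps)]
    s = math.log(d + 1) / avg_log_degree  # amplification scaler (alpha = 1)
    scalers = [1.0, s, 1.0 / s]  # identity, amplification, attenuation
    return [a * c for c in scalers for a in aggregates]


# A node with two neighbours; delta is chosen so that the scaler equals 2.
out = pna_aggregate([1.0, 3.0], avg_log_degree=math.log(3) / 2)
print(out)
```

If the repo builds on PyTorch Geometric (not confirmed here), `torch_geometric.nn.PNAConv` implements this layer with `aggregators`, `scalers`, and a `deg` histogram argument from which δ is computed.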
From the repo root:
```bash
# Train and test the PNA model on the generated graph data
python3 -m scripts.train_pna_baseline
```

The enhanced variant trains the PNA model with reverse message passing, which uses a heterogeneous graph and ego IDs to detect the fraud patterns listed in the 11 label sub-tasks above.
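The two ingredients of the enhanced model can be sketched on plain edge lists. Reverse message passing adds a second edge type holding each edge flipped, so a node aggregates from its in-neighbours and its out-neighbours with separate parameters; ego IDs add a binary feature marking the node being classified. This assumes mini-batch training in which each batch's target nodes are flagged; the edge-type names `fwd`/`rev` and all function names are hypothetical.

```python
def add_reverse_edges(edges):
    """Two edge types (hypothetical names): 'fwd' = u -> v, 'rev' = v -> u."""
    return {"fwd": list(edges), "rev": [(v, u) for u, v in edges]}


def ego_ids(num_nodes, target_nodes):
    """Binary node feature: 1 for the nodes whose labels are being predicted."""
    flags = [0] * num_nodes
    for node in target_nodes:
        flags[node] = 1
    return flags


def aggregate_by_type(num_nodes, typed_edges, feats):
    """Sum source features at each destination, kept separate per edge type.

    With 'fwd' a node collects from its in-neighbours; with 'rev' it
    collects from its out-neighbours. A heterogeneous GNN layer would
    apply separate weights per edge type before combining the results.
    """
    out = {}
    for etype, edges in typed_edges.items():
        acc = [0.0] * num_nodes
        for src, dst in edges:
            acc[dst] += feats[src]
        out[etype] = acc
    return out


typed = add_reverse_edges([(0, 1), (1, 2)])
x = [1.0, 10.0, 100.0]
print(aggregate_by_type(3, typed, x))
# {'fwd': [0.0, 1.0, 10.0], 'rev': [10.0, 100.0, 0.0]}
print(ego_ids(3, [1]))  # [0, 1, 0]
```

Without the reverse edge type, node 0 above would never see node 1's features, which makes out-direction patterns such as `fan_out > 3` hard to detect; the ego flag lets a node recognise its own features when a message returns to it around a cycle.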
From the repo root:
```bash
# Train and test the enhanced PNA model on the generated graph data
python3 -m scripts.train_pna_reverse_mp_with_ego
```

To tune the PNA hyperparameters, run from the repo root:

```bash
python3 -m scripts.pna_hyperparameter_tuning
```