LabLink Paper Figures

This repository contains figures and analysis code for the LabLink paper.

About LabLink

LabLink is a dynamic VM allocation and management system for computational research workflows. It enables researchers to dynamically provision and manage virtual machines for computational tasks in the cloud.

Related Repositories

lablink - Core LabLink system with allocator and client services
lablink-template - Infrastructure-as-code template for deploying LabLink
sleap-lablink - SLEAP-specific deployment example

Repository Structure

lablink-paper-figures/
├── data/               # Raw and processed data
│   ├── raw/           # Original data files
│   └── processed/     # Processed data ready for plotting
├── figures/           # Generated figures for the paper
│   ├── main/         # Main text figures
│   └── supplementary/ # Supplementary figures
├── notebooks/         # Jupyter notebooks for analysis
├── scripts/          # Python scripts for data processing and plotting
│   ├── analysis/     # Data analysis scripts
│   └── plotting/     # Figure generation scripts
├── src/              # Source code for reusable modules
└── tests/            # Unit tests

Setup

Prerequisites

Python 3.10+
uv - Fast Python package installer and resolver

Installation

# Clone the repository
git clone https://github.com/talmolab/lablink-paper-figures.git
cd lablink-paper-figures

# Install dependencies (creates .venv automatically)
uv sync

# Activate the virtual environment
# On Unix/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

Development Setup

For development work (includes Jupyter, testing, and linting tools):

uv sync --all-extras

Running Jupyter

uv run jupyter lab

Usage

Generating Architecture Diagrams

This repository includes comprehensive architecture diagram generation for LabLink. The diagrams are automatically generated from Terraform infrastructure-as-code configurations to ensure accuracy.

Prerequisites

To generate diagrams, you need access to:

Infrastructure Terraform: ../lablink-template/lablink-infrastructure
(Optional) Client VM Terraform: ../lablink/packages/allocator/src/lablink_allocator_service/terraform

Note: The Terraform icon for the VM provisioning diagram is included in assets/icons/terraform.svg. If this file is missing, the diagram will use a generic placeholder icon.

Font Size Presets

Diagrams support three font size presets for different output contexts:

paper (default): 14pt fonts for publications and papers
- Node labels: 14pt
- Edge labels: 14pt
- Title: 32pt
- Spacing: nodesep=0.6, ranksep=0.8
poster: 20pt fonts for poster presentations
- Node labels: 20pt (43% larger)
- Edge labels: 20pt
- Title: 48pt
- Spacing: nodesep=0.9, ranksep=1.2 (50% more space to prevent overlap)
presentation: 16pt fonts for slides
- Node labels: 16pt
- Edge labels: 16pt
- Title: 40pt
- Spacing: nodesep=0.75, ranksep=1.0

Use the --fontsize-preset flag to select a preset:

# Generate diagrams for poster presentation
uv run python scripts/plotting/generate_architecture_diagram.py \
  --terraform-dir ../lablink-template/lablink-infrastructure \
  --diagram-type all-essential \
  --fontsize-preset poster

Diagram Types

Priority 1 (Essential - for main paper):

main - System overview with Admin → Allocator → VMs → Logs
vm-provisioning - VM provisioning & lifecycle with 3-phase startup
crd-connection - CRD connection via PostgreSQL LISTEN/NOTIFY (unique architectural innovation)
logging-pipeline - CloudWatch → Lambda → Allocator logging flow

Priority 2 (Supplementary):

cicd-workflow - GitHub Actions CI/CD pipeline (7 workflows)
api-architecture - Flask API endpoints (22 endpoints with auth)
network-flow-enhanced - Network topology with ports & protocols
monitoring - VM status & health monitoring services
data-collection - SSH → Docker → rsync data export flow

Legacy diagrams:

detailed - Complete infrastructure with all resources
network-flow - Basic network routing diagram

Quick Start

Generate all essential diagrams:

uv run python scripts/plotting/generate_architecture_diagram.py \
  --terraform-dir ../lablink-template/lablink-infrastructure \
  --output-dir figures/main \
  --diagram-type all-essential

Generate all supplementary diagrams:

uv run python scripts/plotting/generate_architecture_diagram.py \
  --terraform-dir ../lablink-template/lablink-infrastructure \
  --output-dir figures/supplementary \
  --diagram-type all-supplementary

Generate a specific diagram:

uv run python scripts/plotting/generate_architecture_diagram.py \
  --terraform-dir ../lablink-template/lablink-infrastructure \
  --output-dir figures/main \
  --diagram-type crd-connection \
  --format png \
  --dpi 300

Environment Variables

You can set environment variables to avoid repeated CLI arguments:

export LABLINK_TERRAFORM_DIR=../lablink-template/lablink-infrastructure
export LABLINK_CLIENT_VM_TERRAFORM_DIR=../lablink/packages/allocator/src/lablink_allocator_service/terraform

uv run python scripts/plotting/generate_architecture_diagram.py --diagram-type all-essential

Output Formats

Supported formats: png (default), svg, pdf

PNG: Best for papers (300 DPI default)
SVG: Vector format for presentations
PDF: Vector format for LaTeX papers

Output Structure

Diagrams are generated in timestamped run folders for version control:

figures/
└── run_20251113_170838/
    ├── main/
    │   ├── lablink-architecture.png
    │   ├── lablink-vm-provisioning.png
    │   ├── lablink-crd-connection.png
    │   ├── lablink-logging-pipeline.png
    │   └── diagram_metadata.txt
    └── supplementary/
        ├── lablink-cicd-workflow.png
        ├── lablink-api-architecture.png
        └── ...

Each run folder contains both main/ and supplementary/ subdirectories based on which diagrams were generated. This structure makes it easy to:

Track which diagrams were generated together
Compare different versions
Roll back to previous diagram sets

Important: Output Directory Workflow

Always generate to figures/main/ or figures/supplementary/:

# Correct - generates to figures/main/ and overwrites existing files
uv run python scripts/plotting/generate_architecture_diagram.py \
  --terraform-dir ../lablink-template/lablink-infrastructure \
  --output-dir figures/main \
  --diagram-type all-essential

DO NOT use custom output directories like test_output/ or review_figures/ - these create clutter and are not tracked by git. The script will create timestamped run folders automatically (e.g., figures/run_20251115_130350/) which are gitignored for temporary review.

DO NOT use --no-timestamp-runs - this flag is only for special cases and can cause confusion about which files are current.

Tracked files:

figures/main/ - Final production versions (committed to git)
figures/supplementary/ - Supplementary diagrams (committed to git)

Gitignored:

figures/run_*/ - Timestamped run folders (temporary)
review_figures/, test_output/ - Any custom output directories

Generating Scientific Software Complexity Figure

This repository includes tools to analyze and visualize dependency growth in scientific Python packages over time (2000-2025), demonstrating the increasing complexity of scientific software and motivating LabLink's value proposition for reproducible computational environments.

Overview

The software complexity analysis tracks 8-10 representative scientific Python packages:

Core Scientific Stack: NumPy, SciPy, matplotlib, pandas (2001-2010 era)
Machine Learning: TensorFlow, PyTorch, scikit-learn (2015+ deep learning era)
Domain/Workflow: AstroPy, Jupyter (domain-specific and notebook workflows)

Data sources:

conda-forge for compiled packages (captures system dependencies like BLAS, compilers)
PyPI for pure-Python packages (simpler distributions)
GitHub for validation

Quick Start

Step 1: Collect dependency data (requires internet, ~20-40 minutes):

cd scripts/analysis

# Collect data for all packages
uv run python collect_dependency_data.py

# Optional: use GitHub token for higher rate limits
uv run python collect_dependency_data.py --github-token YOUR_TOKEN

Step 2: Process data into clean CSV:

uv run python process_dependency_data.py

Step 3: Generate figures:

cd ../plotting

# Generate paper format (default: 14pt fonts, 300 DPI)
uv run python plot_software_complexity.py --format paper

# Generate poster format (20pt fonts for readability at distance)
uv run python plot_software_complexity.py --format poster

# Generate presentation format (16pt fonts for slides)
uv run python plot_software_complexity.py --format presentation

Output

The script generates:

Main figure: figures/main/software_complexity_over_time.{png,pdf} - All packages over time with trend lines
Category comparison: figures/main/software_complexity_by_category.{png,pdf} - Faceted by package category
Metadata: figures/main/software_complexity_metadata.txt - Generation details and data sources

Configuration

Format presets:

paper: 14pt fonts, 300 DPI, 6.5" width (two-column journal)
poster: 20pt fonts, 300 DPI, 12" width (conference poster)
presentation: 16pt fonts, 150 DPI, 10"x7.5" (slide deck)

Advanced options:

# Generate specific figures
uv run python plot_software_complexity.py --figures main

# Use custom data file
uv run python plot_software_complexity.py --data-file custom_data.csv

# Custom output directory
uv run python plot_software_complexity.py --output-dir figures/supplementary

How It Works

Data collection: Fetches historical dependency metadata from conda-forge feedstocks and PyPI API
Processing: Aggregates data, resolves conflicts (GitHub > conda-forge > PyPI), filters low-quality data
Visualization: Creates connected scatter plots with LOESS trend lines showing dependency growth

Why conda-forge for scientific packages? conda-forge metadata includes complete system-level dependencies (BLAS, LAPACK, compilers) that PyPI omits. For example:

PyPI SciPy 1.7.0: requires_dist: ["numpy>=1.16.5"] (1 dependency)
conda-forge SciPy 1.7.0: numpy, python, blas, libopenblas, libgfortran (5+ dependencies)

This provides accurate complexity metrics that reflect what users actually experience.

See data/software_complexity_README.md for detailed documentation.

Generating SLEAP Dependency Network Graph

This repository includes tools to generate dependency network visualizations for the SLEAP software package, demonstrating the complexity of modern computational research software to motivate LabLink's value proposition.

Overview

The dependency graph visualization shows:

Nodes: Python packages (SLEAP and all transitive dependencies)
Edges: Dependency relationships (directed arrows)
Visual encoding:
- Node size scaled by degree centrality (hub packages are larger)
- Node color by package category (ML/scientific/data/visualization/utilities)
- Labels shown for high-degree packages only
Statistics: Total packages, dependencies, and top most-connected packages

Quick Start

Generate dependency graph using local SLEAP installation:

uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --sleap-source C:/repos/sleap \
  --preset paper \
  --output-dir figures/main

Generate using GitHub URL (no local installation required):

uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --sleap-source https://github.com/talmolab/sleap/blob/develop/pyproject.toml \
  --preset poster \
  --format svg

Use environment variable for convenience:

export SLEAP_PATH=C:/repos/sleap  # On Windows: set SLEAP_PATH=C:/repos/sleap
uv run python scripts/plotting/generate_sleap_dependency_graph.py

Configuration Options

Presets (matching repository conventions):

paper (default): 14pt fonts, 12x10" figure, 300 DPI - for publications
poster: 20pt fonts, 18x15" figure, 300 DPI - for poster presentations

Output formats:

png (default): Raster image at 300 DPI
svg: Vector graphics for presentations
pdf: Vector graphics for LaTeX papers

Advanced options:

# Limit dependency depth (faster, smaller graph)
uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --max-depth 2 \
  --label-threshold 3

# Exclude optional dependencies
uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --exclude-optional

# Force refresh cached data
uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --force-refresh

# Generate supplementary degree distribution plot
uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --show-distribution

# Use different layout algorithm
uv run python scripts/plotting/generate_sleap_dependency_graph.py \
  --layout kamada_kawai

Output

The script generates:

sleap-dependency-graph.png (or .svg, .pdf) - Main dependency network visualization
data/processed/sleap_dependencies.json - Cached dependency data (speeds up subsequent runs)

Optional with --show-distribution:

sleap-dependency-distribution.png - Degree distribution plots showing power-law characteristics

How It Works

Dependency extraction: Parses SLEAP's pyproject.toml to get direct dependencies
Transitive resolution: Recursively fetches dependency metadata from PyPI JSON API
Graph construction: Builds NetworkX directed graph with package categorization
Layout computation: Uses force-directed spring layout to emphasize hub packages
Visualization: Renders publication-quality figure with matplotlib/seaborn

Note: Initial run takes 1-2 minutes to fetch PyPI metadata for ~100-200 packages. Subsequent runs use cached data and complete in ~5 seconds.

Generating GPU Cost Trends Visualization

This repository includes tools to visualize GPU pricing trends for scientific computing, demonstrating the sustained high cost of GPU hardware that motivates LabLink's cloud-based GPU access model.

Overview

The GPU cost trends visualization shows:

Price trends over time: Launch MSRP of professional and consumer GPUs from 2006-2025
Professional GPUs: Tesla, A100, H100, V100, P100 families (datacenter-class hardware)
Consumer GPUs: RTX/GTX series with ≥5 TFLOPS (SLEAP-compatible cards)
Optional price-performance subplot: FLOP/s per dollar showing efficiency improvements (~2.5 year doubling time)

The visualization uses the Epoch AI Machine Learning Hardware Database, a peer-reviewed dataset with 160+ GPUs covering ML-relevant hardware from 2006-2025.

Data Setup

Before generating figures, download the Epoch AI dataset:

Visit https://epoch.ai/data/machine-learning-hardware
Click "Download Data" to get the CSV file
Extract the ZIP and place ml_hardware.csv in data/raw/gpu_prices/

See data/raw/gpu_prices/README.md for detailed instructions.

Quick Start

Generate paper format figure (default):

uv run python scripts/plotting/plot_gpu_cost_trends.py

Generate poster format:

uv run python scripts/plotting/plot_gpu_cost_trends.py --preset poster

Generate with price-performance subplot:

uv run python scripts/plotting/plot_gpu_cost_trends.py --include-performance

Configuration Options

Presets (matching repository conventions):

paper (default): 14pt fonts, 6.5"x5" figure, 300 DPI - for two-column journals
poster: 20pt fonts, 12"x9" figure, 300 DPI - for poster presentations
presentation: 16pt fonts, 10"x7.5" figure, 150 DPI - for slide decks

Output formats:

both (default): Generates both PNG and PDF versions
png: Raster image at 300 DPI
pdf: Vector graphics for LaTeX papers

Advanced options:

# Custom output path
uv run python scripts/plotting/plot_gpu_cost_trends.py \
  --output figures/supplementary/gpu_analysis.png

# Custom data path (if dataset is in different location)
uv run python scripts/plotting/plot_gpu_cost_trends.py \
  --data /path/to/ml_hardware.csv

# Verbose logging for debugging
uv run python scripts/plotting/plot_gpu_cost_trends.py --verbose

Output

The script generates:

gpu_cost_trends.png (and/or .pdf) - Main price trends visualization
gpu_cost_trends-metadata.txt - Data source, statistics, and generation details

With --include-performance:

Creates 1x2 subplot grid with both price trends and price-performance evolution

Data Source

Epoch AI Machine Learning Hardware Database

URL: https://epoch.ai/data/machine-learning-hardware
License: CC BY 4.0 (free for academic and commercial use)
Coverage: 160+ GPUs from 2006-2025 with pricing and performance metrics
Citation: Epoch AI (2024), 'Data on Machine Learning Hardware'

GPU Selection Criteria

The visualization filters to ML-relevant GPUs:

Professional: Tesla, A100, H100, V100, P100, A6000 families
Consumer: RTX/GTX series with FP32 performance ≥ 5.0 TFLOPS
Excluded: Mobile GPUs, laptop variants, low-performance gaming cards

This filtering ensures the visualization shows hardware actually used for scientific computing and pose estimation workloads like SLEAP.

Example Use Cases

For LabLink paper motivation:

# Generate both formats for paper figure
uv run python scripts/plotting/plot_gpu_cost_trends.py \
  --preset paper \
  --output figures/main/gpu_cost_trends.png \
  --format both

For poster/presentation:

# Larger fonts and price-performance comparison
uv run python scripts/plotting/plot_gpu_cost_trends.py \
  --preset poster \
  --include-performance \
  --format pdf

Note: First run validates data quality (minimum 50 GPUs, date range 2006+, >70% with pricing). If validation fails, check that you've downloaded the correct Epoch AI dataset.

Development

Running Tests

uv run pytest

Code Formatting

uv run ruff check .
uv run ruff format .

Citation

If you use this work, please cite:

TODO: Add citation

License

See LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
.claude		.claude
.serena		.serena
analysis		analysis
assets		assets
data		data
figures		figures
notebooks		notebooks
openspec		openspec
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
BUG_ANALYSIS_AND_FIX_PLAN.md		BUG_ANALYSIS_AND_FIX_PLAN.md
BUG_VISUALIZATION.txt		BUG_VISUALIZATION.txt
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

License

talmolab/lablink-paper-figures

Folders and files

Latest commit

History

Repository files navigation

LabLink Paper Figures

About LabLink

Related Repositories

Repository Structure

Setup

Prerequisites

Installation

Development Setup

Running Jupyter

Usage

Generating Architecture Diagrams

Prerequisites

Font Size Presets

Diagram Types

Quick Start

Environment Variables

Output Formats

Output Structure

Important: Output Directory Workflow

Generating Scientific Software Complexity Figure

Overview

Quick Start

Output

Configuration

How It Works

Generating SLEAP Dependency Network Graph

Overview

Quick Start

Configuration Options

Output

How It Works

Generating GPU Cost Trends Visualization

Overview

Data Setup

Quick Start

Configuration Options

Output

Data Source

GPU Selection Criteria

Example Use Cases

Development

Running Tests

Code Formatting

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages