catalystneuro/dandi-access-vis

DANDI Access Visualization Tools

This repository contains tools for visualizing DANDI data access patterns geographically, including choropleth maps and scatter plots of download volume by country and region.

The source data comes from https://github.com/dandi/access-summaries. By default, the scripts expect the access-summaries repo to be cloned next to this repo; use the --data-path option to point to a different location.

Features

  • Choropleth Maps: Country-level data visualization with color-coded regions
  • Subdivision Choropleth Maps: Admin-1 (state/province) level visualization using GADM data
  • Scatter Maps: Region-level visualization with proportional point sizes
  • Multiple Dandiset Support: Process specific dandisets or combinations of dandisets
  • Flexible Data Paths: Configure custom data directory locations
  • Centralized Styling: Consistent color schemes across all visualizations
  • Publication Quality: High-resolution SVG and PDF outputs
  • Flexible Scaling: Linear and logarithmic scale options
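To illustrate why the logarithmic option helps when download totals span several orders of magnitude, here is a minimal sketch of linear versus logarithmic normalization. It is not taken from this repository: the function name normalize and the sample totals are hypothetical, and the scripts may well rely on matplotlib's own normalization classes (such as matplotlib.colors.LogNorm) instead.

```python
import math


def normalize(value, vmin, vmax, log_scale=False):
    """Map value into [0, 1] for a colormap, linearly or logarithmically."""
    if log_scale:
        # Log scaling requires strictly positive inputs.
        value, vmin, vmax = (math.log10(x) for x in (value, vmin, vmax))
    fraction = (value - vmin) / (vmax - vmin)
    # Clamp so out-of-range values saturate at the ends of the colormap.
    return min(1.0, max(0.0, fraction))


# Hypothetical download totals in bytes: 1 GB, 1 TB, 1 PB.
totals = [1e9, 1e12, 1e15]
linear = [normalize(t, 1e9, 1e15) for t in totals]
logged = [normalize(t, 1e9, 1e15, log_scale=True) for t in totals]
```

On the linear scale the 1 TB entry lands almost at the bottom of the color range, while on the logarithmic scale it sits at the midpoint, which is why a log scale keeps smaller countries visible next to the largest consumers.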

Installation

Navigate to the visualization directory and install dependencies:

cd visualization
pip install -r requirements.txt

Dependencies

  • pandas: Data manipulation
  • matplotlib: Plotting framework
  • numpy: Numerical operations
  • cartopy: Geographic projections and mapping
  • pyyaml: YAML configuration file parsing

Usage

Basic Usage

All Dandisets (Default)

# Process all available dandisets
python create_choropleth.py --log-scale

Creates: Global country-level visualization showing 7.69 PB across 117 countries

Global Choropleth Map

Global DANDI downloads by country (logarithmic scale) - showing the Netherlands and the US as top consumers

Single Dandiset

# Process specific dandiset with automatic filename
python create_choropleth.py --dandiset 000026 --log-scale

Creates: Focused view of single dandiset (114.23 TB across 44 countries)

Single Dandiset Choropleth

Dandiset 000026 downloads by country - the US and the Netherlands dominate usage

Multiple Dandisets

# Process multiple specific dandisets
python create_scatter_map.py --dandiset 000026,000409,000488

Creates: Regional scatter plot showing precise geographic distribution

Multi-Dandiset Scatter Map

Combined regional view of 3 dandisets - points show both location and download volume with color/size coding

All Dandisets (Regional View)

# Process all available dandisets as scatter plot
python create_scatter_map.py

Creates: Comprehensive regional scatter plot showing global access patterns (655 regions across 470 dandisets)

Global Scatter Map

Global DANDI regional access patterns - comprehensive view of all dandisets showing worldwide download distribution

Subdivision Choropleth (State/Province-level)

# Create subdivision-level choropleth with the color scale capped at 10 TB
python create_subdivision_choropleth.py --log-scale --max 10TB

Creates: High-resolution state/province-level choropleth using GADM administrative boundaries

Subdivision Choropleth Map

Subdivision-level visualization showing downloads at admin-1 granularity (states, provinces, regions) - uses GADM 4.1 data for accurate boundary matching

Temporal Analysis

# Show downloads over time with top dandisets
python create_temporal_chart.py

Creates: Cumulative stacked area chart showing growth of DANDI downloads (4.49 PB across 469 dandisets, 2021-2025)

Temporal Chart

DANDI cumulative downloads over time - stacked visualization showing growth of top 10 dandisets individually with others grouped as "Other"

Command Reference

Choropleth Maps (Country-level)

python create_choropleth.py [options]

Options:
  --log-scale, -l          Use logarithmic scale (recommended for wide ranges)
  --output, -o FILE        Output filename (default: output/choropleth_map.svg)
  --data-path, -d PATH     Data directory (default: ../access-summaries/content)
  --dandiset DANDISETS     Comma-separated dandiset IDs (default: all)
  --help                   Show help message
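As a rough illustration of how the options listed above could be declared, the following sketch uses argparse from the standard library. The flag names and defaults are copied from the option list; the helper names parse_dandisets and build_parser are hypothetical, and the actual script may be structured differently.

```python
import argparse


def parse_dandisets(text):
    """Split '000026,000409' into ['000026', '000409'], ignoring blanks."""
    return [part.strip() for part in text.split(",") if part.strip()]


def build_parser():
    # Hypothetical reconstruction of the CLI; --help is added by argparse itself.
    parser = argparse.ArgumentParser(description="Country-level choropleth (sketch)")
    parser.add_argument("--log-scale", "-l", action="store_true",
                        help="Use logarithmic scale")
    parser.add_argument("--output", "-o", default="output/choropleth_map.svg",
                        help="Output filename")
    parser.add_argument("--data-path", "-d", default="../access-summaries/content",
                        help="Data directory")
    # None is used here to mean "all dandisets" (an assumption of this sketch).
    parser.add_argument("--dandiset", type=parse_dandisets, default=None,
                        help="Comma-separated dandiset IDs")
    return parser


args = build_parser().parse_args(["--dandiset", "000026,000409", "-l"])
```

Parsing the comma-separated value inside the type callable means the rest of the program always receives a list, whether one ID or several were given.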

Scatter Maps (Region-level)

python create_scatter_map.py [options]

Options:
  --output, -o FILE        Output filename (default: output/scatter_map.svg)
  --data-path, -d PATH     Data directory (default: ../access-summaries/content)
  --dandiset DANDISETS     Comma-separated dandiset IDs (default: all)
  --help                   Show help message

Subdivision Choropleth Maps (State/Province-level)

python create_subdivision_choropleth.py [options]

Options:
  --log-scale, -l          Use logarithmic scale (recommended for wide ranges)
  --output, -o FILE        Output filename (default: output/subdivision_choropleth.svg)
  --data-path, -d PATH     Data directory (default: ../access-summaries/content)
  --dandiset DANDISETS     Comma-separated dandiset IDs (default: all)
  --max, -m VALUE          Maximum value for color scale (e.g., '10TB', '500GB')
  --dry-run                Test matching logic without generating plot
  --help                   Show help message
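The --max option takes human-readable sizes such as '10TB' or '500GB'. One way such a value could be converted to bytes is sketched below. The function name parse_size is hypothetical, and the sketch assumes decimal (SI) units, where 1 TB is 10**12 bytes; the script itself may use binary units or a different parser.

```python
import re

# Assumption: decimal (SI) units, so 1 TB = 10**12 bytes.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def parse_size(text):
    """Convert a string like '10TB' or '500 GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGTP]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"Unrecognized size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit.upper()]


cap_bytes = parse_size("10TB")
```

Rejecting malformed input with a ValueError, rather than silently ignoring it, avoids plotting with an unintended color scale cap.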

Temporal Charts (Time-series)

python create_temporal_chart.py [options]

Options:
  --output, -o FILE        Output filename (default: output/temporal_chart.svg)
  --data-path, -d PATH     Data directory (default: ../access-summaries/content)
  --dandiset DANDISETS     Comma-separated dandiset IDs (default: all)
  --top-n, -n NUMBER       Number of top dandisets to show individually (default: 10)
  --help                   Show help message
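The --top-n behavior (top dandisets shown individually, the rest grouped as "Other") can be sketched as follows. This is a simplified illustration on total volumes only, not the script's actual code; the function name group_top_n and the sample numbers are made up.

```python
def group_top_n(totals, top_n=10):
    """Keep the top_n largest entries and sum the remainder under 'Other'."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grouped = dict(ranked[:top_n])
    other = sum(value for _, value in ranked[top_n:])
    if other:
        # Only add the bucket when something was actually grouped.
        grouped["Other"] = other
    return grouped


# Made-up download volumes per dandiset (arbitrary units).
sample = {"000026": 50, "000409": 30, "000488": 15, "000003": 5}
result = group_top_n(sample, top_n=2)
```

A real time-series version would apply the same ranking once over the whole period and then accumulate each group's downloads per time step before stacking.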

Output Files

The scripts generate:

  • SVG files: Vector format for publications (300 DPI equivalent)
  • PDF files: Alternative format for presentations
  • Console output: Summary statistics and processing information
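The summary statistics quoted in this README (for example "114.23 TB") are byte counts rendered in human-readable units. A minimal sketch of such formatting is shown below; the function name format_bytes is hypothetical, and the use of decimal (SI) units with two decimal places is an assumption rather than a confirmed detail of the scripts.

```python
def format_bytes(num_bytes):
    """Render a byte count using decimal units, e.g. 114.23e12 -> '114.23 TB'."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1000,
        # or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


summary = format_bytes(114_230_000_000_000)
```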
