Automating the acquisition, metadata extraction and consolidation of publicly available 3D electron microscopy datasets

helensilva14/3d-electron-data-project

3D Electron Microscopy - Data Acquisition and Preparation

This project automates the acquisition, metadata extraction, and consolidation of publicly available 3D electron microscopy datasets, with the future goal of enabling efficient, block-wise access for AI/ML pipelines.

Project Goals

  1. Automated Data Download: Robust scripts to download diverse 3D electron microscopy datasets from multiple sources.
  2. Metadata Extraction & Consolidation: Identify, extract, and harmonize relevant metadata (e.g., attrs, chunks) from each dataset, providing a unified and queryable view (see METADATA_SUMMARY.md).
  3. AI/ML Pipeline Data Access Design: Outline and prototype a strategy for block-wise access to large 3D image datasets for scalable AI/ML workflows (see DATA_ACCESS_DESIGN.md).
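
As a rough illustration of the block-wise access idea in goal 3, the sketch below enumerates index slices that tile a 3D volume in fixed-size blocks. It is not taken from DATA_ACCESS_DESIGN.md: the function name iter_blocks, the volume shape, and the block shape are all hypothetical, chosen only to show the pattern.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def iter_blocks(
    shape: Sequence[int], block_shape: Sequence[int]
) -> Iterator[Tuple[slice, ...]]:
    """Yield one tuple of slices per block, covering `shape` without overlap.

    Blocks on the far edge of each axis are clipped to the volume bounds.
    """
    if len(shape) != len(block_shape):
        raise ValueError("shape and block_shape must have the same length")
    if any(b <= 0 for b in block_shape):
        raise ValueError("block sizes must be positive")

    # Start offsets along each axis, e.g. 0, 64, 128, ...
    starts_per_axis = [range(0, dim, blk) for dim, blk in zip(shape, block_shape)]
    for starts in product(*starts_per_axis):
        yield tuple(
            slice(start, min(start + blk, dim))
            for start, blk, dim in zip(starts, block_shape, shape)
        )


# Hypothetical (z, y, x) volume and block sizes, for illustration only.
blocks = list(iter_blocks((100, 256, 256), (64, 128, 128)))
```

Each yielded tuple can be used directly as an index into any array-like object that supports slicing (for example `volume[block]`), so only one block needs to be in memory at a time. Aligning block_shape with the chunk shape of the underlying store would likely avoid reading the same chunk twice.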

Datasets

The following publicly available 3D electron microscopy datasets are targeted:

  1. EMPIAR-11759: https://www.ebi.ac.uk/empiar/EMPIAR-11759/
  2. EPFL-Hippocampus: https://www.epfl.ch/labs/cvlab/data/data-em/
  3. Hemibrain-NG: https://tinyurl.com/hemibrain-ng (Note: only a randomly positioned 1000x1000x1000 voxel crop is downloaded for this dataset.)
  4. JRC-MUS-NACC: https://openorganelle.janelia.org/datasets/jrc_mus-nacc-2
  5. U2OS-Chromatin: https://idr.openmicroscopy.org/webclient/img_detail/9846137/?dataset=10740
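
The random crop mentioned for Hemibrain-NG could be selected along the following lines. This is a minimal sketch, not the project's actual code: the helper name random_crop_bounds is hypothetical, and the volume shape used in the example is a made-up placeholder rather than the real Hemibrain extent.

```python
import random
from typing import Optional, Sequence, Tuple


def random_crop_bounds(
    volume_shape: Sequence[int],
    crop_shape: Sequence[int] = (1000, 1000, 1000),
    seed: Optional[int] = None,
) -> Tuple[Tuple[int, int], ...]:
    """Return (start, stop) bounds per axis for a crop that fits in the volume."""
    rng = random.Random(seed)  # a fixed seed makes the crop reproducible
    bounds = []
    for dim, crop in zip(volume_shape, crop_shape):
        if crop > dim:
            raise ValueError(f"crop size {crop} exceeds volume size {dim}")
        start = rng.randint(0, dim - crop)  # inclusive on both ends
        bounds.append((start, start + crop))
    return tuple(bounds)


# Placeholder volume shape (assumption, not the real dataset size).
crop = random_crop_bounds((30000, 40000, 35000), seed=42)
```

The resulting bounds can be turned into slices and passed to whichever reader exposes the volume; recording the seed or the bounds alongside the download keeps the crop reproducible.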

Tools & Dependencies

  • Python 3.12
  • DVC (optional, for data versioning)
  • tifffile for handling TIFF files
  • cloud-volume for Neuroglancer data
  • zarr for scalable array storage
  • pyDM3reader for DM3 files
  • requests, ftplib for downloads
  • pandas for summary tables
  • See requirements.txt for the full list
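
To make the metadata consolidation step more concrete, here is a small sketch of mapping differently named fields from several sources onto one set of columns. The alias table, the dataset names, and all metadata values are invented for illustration; the project's real field names may differ.

```python
from typing import Any, Dict, List

# Hypothetical aliases: unified column name -> keys a source might use.
ALIASES = {
    "shape": ("shape", "dimensions", "size"),
    "dtype": ("dtype", "data_type"),
    "chunks": ("chunks", "chunk_size"),
}


def harmonize(dataset: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Map one dataset's raw metadata onto the unified column names.

    Fields that a source does not provide are filled with None.
    """
    row: Dict[str, Any] = {"dataset": dataset}
    for column, candidates in ALIASES.items():
        row[column] = next((raw[key] for key in candidates if key in raw), None)
    return row


# Invented example records, standing in for extracted metadata.
raw_metadata = {
    "example-zarr": {"shape": [512, 512, 512], "dtype": "uint8", "chunks": [64, 64, 64]},
    "example-tiff": {"dimensions": [165, 768, 1024], "data_type": "uint8"},
}

rows: List[Dict[str, Any]] = [
    harmonize(name, meta) for name, meta in raw_metadata.items()
]
```

Because every row has the same keys, the list can be passed straight to `pandas.DataFrame(rows)` to build a summary table.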

Installation & Usage

git clone https://github.com/helensilva14/3d-electron-data-project.git
cd 3d-electron-data-project

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt

Run the main pipeline (downloads data, extracts metadata, consolidates):

python3 src/main.py

Outputs are saved in the outputs/ and docs/ directories.

License

This project is licensed under the Apache-2.0 License.
