
andrew-s28/analysis-template


Scientific Analysis Code Template

This repository serves as a template for organizing and storing scientific analysis code in a structured and reproducible manner, with a particular focus on supporting Python notebook version control.

Purpose

This template was primarily designed because I got sick of dealing with a million different Python notebooks, none of which were being version controlled (who hasn't had analysis-updated-new_data_v3.ipynb?). By providing a template directory structure and commit workflow, future analyses will:

  • Maintain organized scientific analysis code
  • Promote reproducibility in research
  • Include version control for Python notebooks
  • Simplify dependency management
  • Encourage clean, consistent code

The primary advantage of this specific workflow over other scientific analysis packaging templates is the use of pre-commit hooks. The only thing you have to do in this workflow is commit to Git - the pre-commit hooks block any commit containing unformatted or unconverted files. No additional command-line instructions to learn - just use Git!

Getting Started

To use this template:

  1. Click the "Use this template" button on GitHub and select "Create a new repository".
  2. Clone the created repository onto your local machine.
  3. Add your analysis code and commit changes, following the instructions below.

Committing Python Notebooks

Finding a good solution to version controlling Python notebooks was the primary motivation for making this template. The workflow used here depends on using pre-commit with a jupytext hook, along with ruff for linting and formatting the produced Python files.

Follow these steps for version controlling your Python notebooks in this repository:

  1. Create Python notebooks in the notebooks/ directory (such as notebooks/example.ipynb).

  2. pre-commit install

    Install the pre-commit hooks (if pre-commit is not installed, try uv tool install pre-commit or pip install pre-commit). This only needs to be done once per clone.

  3. git add notebooks/example.ipynb && git commit -m 'adds new notebook'

    This commit should fail, but the jupytext hook generates a Python file (notebooks/python/example.py).

  4. git add notebooks/example.ipynb notebooks/python/example.py && git commit -m 'adds new notebook'

    Stage and commit the new Python file along with the updated notebook; this time the commit should succeed. You may want to double-check the files after step 3 to ensure they were generated and formatted correctly.

  5. Make changes to notebooks/example.ipynb and repeat from step 3 (with a new commit message!).
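To spot notebooks that still need a converted script, a small check like the following can help. This is a minimal sketch, not part of the template: it assumes the naming convention used in the steps above (notebooks/example.ipynb pairs with notebooks/python/example.py), and the function name unpaired_notebooks is hypothetical. It only checks that a script exists, not that it is up to date, so the pre-commit hooks remain the real safeguard.

```python
from pathlib import Path


def unpaired_notebooks(notebooks_dir: Path) -> list[Path]:
    """Return notebooks in notebooks_dir that have no matching script in notebooks_dir/python/."""
    missing = []
    # Non-recursive glob, so .ipynb_checkpoints/ copies are not picked up.
    for nb in sorted(notebooks_dir.glob("*.ipynb")):
        # Assumed pairing: notebooks/<name>.ipynb -> notebooks/python/<name>.py
        script = notebooks_dir / "python" / f"{nb.stem}.py"
        if not script.exists():
            missing.append(nb)
    return missing


if __name__ == "__main__":
    for nb in unpaired_notebooks(Path("notebooks")):
        print(f"No jupytext script for {nb}; commit it again to let the hook generate one.")
```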

Although editing them should work the same way, it is recommended not to edit the Python scripts in notebooks/python/ directly. If you need standalone Python scripts (for example, for writing shared functions), place them in the scripts/ directory.
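As an illustration of a shared function kept in scripts/, here is a hypothetical module; the file name scripts/helpers.py and the function anomalies are made up for this example and are not part of the template.

```python
"""scripts/helpers.py: a hypothetical module of functions shared across notebooks."""

from statistics import fmean


def anomalies(values: list[float]) -> list[float]:
    """Return each value minus the mean of all values."""
    if not values:
        raise ValueError("values must not be empty")
    mean = fmean(values)
    return [v - mean for v in values]
```

A notebook could then import it (for example with from helpers import anomalies), but the exact import path depends on how pyproject.toml packages the scripts directory, so check that file before relying on it.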

Managing Dependencies with uv

In order to run your Python notebooks, you will need some third-party libraries. uv is an excellent tool for managing these dependencies; for installation and usage details, see the uv docs. Note that pyproject.toml is set up to require python>=3.12, so change that if you have a dependency that requires an older Python version. It also includes ipykernel as a dependency, which is required for running Python notebooks.

  1. uv sync

    Initialize a .venv with the dependencies from pyproject.toml installed (from here it should be synced automatically). Make sure to select this virtual environment when running Python notebooks!

  2. uv pip install -e .

    The scripts directory is set up to be included as an editable package in this project; to install it, run this command from the root directory. You can then import from any module in the scripts subdirectory.

  3. uv add numpy

    From here, add additional dependencies as needed. uv will automatically update your virtual environment.

Directory Structure

analysis-template/
├── data/                  # Raw and processed data (consider using .gitignore for large files)
├── notebooks/             # Primary analysis Python notebooks
│   ├── functions/         # Python functions stored in .py files for use in notebooks
│   └── python/            # Python scripts generated by jupytext for version control
├── scripts/               # Independent Python scripts (not converted)
├── manuscript/            # Figures and files for manuscripts
├── presentation/          # Figures and files for presentations
├── misc/                  # Miscellaneous files
├── pyproject.toml         # Project specifications and dependency control (recommend using uv)
└── README.md              # Project overview and instructions

Directories are initialized with a .gitkeep file so they can be committed to version control "empty". You can remove these files when other files are added to the directory or ignore them entirely.
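Once real files land in a directory, its .gitkeep is no longer needed to keep that directory in version control. A minimal sketch for finding those leftover placeholders follows; the function name removable_gitkeeps is hypothetical, and because it walks every subdirectory (including .venv, if present) it may be slow in a large checkout.

```python
from pathlib import Path


def removable_gitkeeps(root: Path) -> list[Path]:
    """Return .gitkeep files whose directory also holds another file or subdirectory."""
    removable = []
    for keep in sorted(root.rglob(".gitkeep")):
        # Any other entry (file or subdirectory) keeps the directory tracked by Git.
        siblings = [p for p in keep.parent.iterdir() if p.name != ".gitkeep"]
        if siblings:
            removable.append(keep)
    return removable
```

Running for p in removable_gitkeeps(Path(".")): p.unlink() from the repository root would delete them; reviewing the returned list first is a good idea.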

Contributions

If you think something could be improved, open an issue - I would love to discuss better ways to manage analysis code! Though do keep in mind that this repository was made primarily for my own use cases, so if we disagree, I might recommend cloning it and making changes in your local repository.

License

This template is licensed under the MIT License. Please give credit if you find this template useful :).

About

A template for structuring scientific analysis code, focused on Python notebooks.
