This repository serves as a template for organizing and storing scientific analysis code in a structured and reproducible manner, with a particular focus on supporting Python notebook version control.
This template was primarily designed because I got sick of dealing with a million different Python notebooks, none of which were being version controlled (who hasn't had analysis-updated-new_data_v3.ipynb?). By providing a template directory structure and commit workflow, future analyses will:
- Maintain organized scientific analysis code
- Promote reproducibility in research
- Include version control for Python notebooks
- Simplify dependency management
- Encourage clean, consistent code
The primary advantage of this workflow over other scientific analysis packaging templates is the use of pre-commit hooks. The only thing you have to do is commit to Git: the pre-commit hooks block any commit that contains unformatted or unconverted files. No additional command-line instructions to learn - just use Git!
To use this template:
- Click the "Use this template" button in GitHub and select "Create a new repository".
- Clone the created repository onto your local machine.
- Add your analysis code and commit changes, following the instructions below.
Finding a good solution to version controlling Python notebooks was the primary motivation for making this template. The workflow used here depends on using pre-commit with a jupytext hook, along with ruff for linting and formatting the produced Python files.
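To make the idea concrete, here is a simplified, standard-library-only sketch of what converting a notebook to a plain Python file involves. It is not jupytext's implementation, and the `# %%` cell markers assume the "percent" script format, which may differ from what this template's hook is configured to produce. The function name `notebook_to_percent_script` is hypothetical.

```python
import json


def notebook_to_percent_script(notebook_json: str) -> str:
    """Convert notebook JSON text to a percent-style script (simplified sketch)."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook["cells"]:
        source = cell["source"]
        # The notebook format stores cell source as a string or a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        if cell["cell_type"] == "code":
            chunks.append("# %%\n" + text)
        elif cell["cell_type"] == "markdown":
            # Markdown cells become comment lines so the file stays valid Python.
            commented = "\n".join(
                f"# {line}" if line else "#" for line in text.split("\n")
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook with one markdown cell and one code cell.
example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Load data"]},
            {"cell_type": "code", "source": ["x = 1\n", "print(x)"]},
        ]
    }
)
print(notebook_to_percent_script(example))
```

The resulting text contains only cell sources, with no outputs or execution metadata, which is why it diffs far more cleanly in Git than the raw `.ipynb` JSON.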
Follow these steps for version controlling your Python notebooks in this repository:
- Create Python notebooks in the `notebooks/` directory (such as `notebooks/example.ipynb`).
- `pre-commit install`: Install the pre-commit hooks (if pre-commit is not installed, try `uv tool install pre-commit` or `pip install pre-commit`).
- `git add notebooks/example.ipynb && git commit -m 'adds new notebook'`: This command should fail, but it generates a Python file using jupytext.
- `git add notebooks/example.ipynb notebooks/python/example.py && git commit -m 'adds new notebook'`: Add and commit the new Python file and the updated notebook; this time the command should succeed. You may want to check the files after the failed commit to ensure they were generated and formatted correctly.
- Make changes to `notebooks/example.ipynb` and repeat from the first `git add` step (with a new commit message!).
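When double-checking the generated files, a small helper like the following can list notebooks that do not yet have a paired script. This is a sketch that assumes the layout described above (notebooks in `notebooks/`, generated scripts in `notebooks/python/` with the same file stem). The name `find_unpaired_notebooks` is hypothetical and not part of this template.

```python
from pathlib import Path


def find_unpaired_notebooks(root: Path) -> list[Path]:
    """Return notebooks under root/notebooks with no script in root/notebooks/python."""
    notebooks_dir = root / "notebooks"
    scripts_dir = notebooks_dir / "python"
    unpaired = []
    # Only top-level notebooks are checked; nested folders are ignored here.
    for notebook in sorted(notebooks_dir.glob("*.ipynb")):
        paired_script = scripts_dir / f"{notebook.stem}.py"
        if not paired_script.is_file():
            unpaired.append(notebook)
    return unpaired
```

Calling `find_unpaired_notebooks(Path.cwd())` from the repository root returns an empty list when every notebook has its generated counterpart.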
Though it should work the same, it is recommended not to edit the Python scripts in notebooks/python/ directly. If you need to use Python scripts (for example, for writing shared functions), place these in the scripts/ directory.
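As an illustration, a shared helper kept in scripts/ might look like the sketch below. The module name `cleaning.py` and the function `drop_missing` are hypothetical, and the import path shown in the comment is an assumption that depends on how the package is configured in pyproject.toml.

```python
# Contents of a hypothetical scripts/cleaning.py


def drop_missing(values):
    """Return the values with None entries removed, preserving order."""
    return [value for value in values if value is not None]


# In a notebook cell, after the scripts package has been installed,
# the import might look like this (assumed path):
#     from scripts.cleaning import drop_missing
print(drop_missing([1.2, None, 3.4]))  # prints [1.2, 3.4]
```

Keeping helpers like this in one place means several notebooks can reuse them without copying cells between files.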
To run your Python notebooks, you will need some third-party libraries. uv is an excellent tool for managing these dependencies. For more details on installation and usage, see the uv docs. Note that pyproject.toml is set up to require python>=3.12, so make sure to change that if you have a dependency that requires an older Python version. The project already includes ipykernel as a dependency, which is required for running Python notebooks.
- `uv sync`: Initialize a `.venv` with the dependencies from `pyproject.toml` installed (from here it should be synced automatically). Make sure to select this virtual environment when running Python notebooks!
- `uv pip install -e .`: The `scripts` directory is set up to be included as an editable package in this project; to install it, run this command from the root directory. You can then import from any modules in the `scripts` subdirectory.
- `uv add numpy`: From here, add additional dependencies as needed. `uv` will automatically update your virtual environment.
analysis-template/
├── data/ # Raw and processed data (consider using .gitignore for large files)
├── notebooks/ # Primary analysis Python notebooks
│ ├── functions/ # Python functions stored in .py files for use in notebooks
│ └── python/ # Python scripts generated by jupytext for version control
├── scripts/ # Independent Python scripts (not converted)
├── manuscript/ # Figures and files for manuscripts
├── presentation/ # Figures and files for presentations
├── misc/ # Miscellaneous files
├── pyproject.toml # Project specifications and dependency control (recommend using uv)
└── README.md # Project overview and instructions
Directories are initialized with a .gitkeep file so they can be committed to version control "empty". You can remove these files when other files are added to the directory or ignore them entirely.
If you think something could be improved, open an issue - I would love to discuss better ways to manage analysis code! Though do keep in mind that this repository was made primarily for my own use cases, so if we disagree, I might recommend cloning it and making changes in your local repository.
This template is licensed under the MIT License. Please give credit if you find this template useful :).