|
| 1 | +--- |
| 2 | +slug: "uv-for-scientists" |
| 3 | +title: "uv: A Guide for Scientists" |
| 4 | +date: "2025-07-17" |
| 5 | +tags: ["tech"] |
| 6 | +description: "uv is the package manager that has taken the Python community by storm. It's also a fantastic tool for supporting reproducible science." |
| 7 | +--- |
| 8 | + |
| 9 | +[uv](https://docs.astral.sh/uv/) has quickly become the de facto package manager for Python development, and for good reason. Before uv, Python package management was a headache and one of the weak points of the ecosystem. Now, developers [have been convinced](https://www.datacamp.com/tutorial/python-uv) by uv's speed, dependency handling, and general ease of use. But I want to talk about one use case, scripting, in which uv can be a game-changer for scientists, researchers, and scientific reproducibility more generally. |
| 10 | + |
| 11 | +## Scripting and Reproducibility |
| 12 | + |
| 13 | +Researchers might write a Python script for any number of reasons: downloading data, running a model, or just doing some quick analysis. The most useful of these scripts are often passed down through labs, with students being handed scripts written by their (perhaps no longer available) predecessors. Scripts are often added to, posted publicly, and passed between labs with little to no documentation. However, speaking from experience, getting these scripts to run can be a difficult task, made all the more difficult by a poor package management system. |
| 14 | + |
| 15 | +Consider a script that was written just last spring - one would imagine it is still working well! Within the script is the code snippet: |
| 16 | + |
| 17 | +```python showLineNumbers |
| 18 | +# divide.py |
| 19 | + |
| 20 | +import numpy as np |
| 21 | + |
| 22 | +denominator = np.arange(10) |
| 23 | + |
| 24 | +for denom in denominator: |
| 25 | +    if denom == 0: |
| 26 | +        value = np.infty |
| 27 | +    else: |
| 28 | +        value = 1 / denom |
| 29 | +``` |
| 30 | + |
| 31 | +This script cleverly avoids a division-by-zero warning by explicitly handling the case of `denom == 0`. However, the script was written with `numpy==1.24` installed. When I am given the script to run, I naively run `pip install numpy` in my current environment, which installs a NumPy 2.x release. Now, the script fails with ``AttributeError: `np.infty` was removed in the NumPy 2.0 release. Use `np.inf` instead.`` |
| 32 | + |
| 33 | +Fortunately for me, this is a contrived example with a quick and easy fix and a helpful error message, but it is hopefully easy to imagine how bugs like this can get out of hand quickly. And, in any case, it still takes the most valuable resource for all researchers - time - to fix the script. Worst of all, it means the script is potentially not reproducible! |
| 34 | + |
| 35 | +## Scripting with uv |
| 36 | + |
| 37 | +Now, let's take advantage of one of my favorite features of uv: [running scripts with dependencies](https://docs.astral.sh/uv/guides/scripts/). We will start by running the following command, which declares a specific version of `numpy` to use with the `divide.py` script: |
| 38 | + |
| 39 | +```bash |
| 40 | +uv add --script divide.py 'numpy==1.24' |
| 41 | +``` |
| 42 | + |
| 43 | +uv has now modified our `divide.py` script to include [PEP 723](https://peps.python.org/pep-0723/) inline metadata at the top of the file: |
| 44 | + |
| 45 | +```python showLineNumbers |
| 46 | +# /// script |
| 47 | +# requires-python = "<3.11" |
| 48 | +# dependencies = [ |
| 49 | +#     "numpy==1.24", |
| 50 | +# ] |
| 51 | +# /// |
| 52 | + |
| 53 | +# divide.py |
| 54 | +``` |
| 55 | + |
| 56 | +The script can then be run with uv as: |
| 57 | + |
| 58 | +```bash |
| 59 | +uv run divide.py |
| 60 | +``` |
| 61 | + |
| 62 | +which runs without error! Behind the scenes, uv creates an isolated virtual environment for this specific script and keeps it in sync with any changes to the declared dependencies or Python version. Since uv is **really** fast, you won't even notice. |
| 63 | + |
| 64 | +The **key takeaway** here is that the only thing researchers will have to do, once this metadata is added, is *share their scripts as normal*. Then, anyone with uv installed can run the script and uv will take care of the package management and environment setup for this script. |
| 65 | + |
| 66 | +We can take this one step further by adding an `exclude-newer` field to the script metadata, which limits dependency resolution to package versions published before a given date, such as the day the script was written. This is really useful since it means you often don't even have to pin the version of each dependency. Now, the entire script looks like this: |
| 67 | + |
| 68 | +```python showLineNumbers |
| 69 | +# /// script |
| 70 | +# requires-python = "<3.11" |
| 71 | +# dependencies = [ |
| 72 | +#     "numpy==1.24", |
| 73 | +# ] |
| 74 | +# [tool.uv] |
| 75 | +# exclude-newer = "2024-05-01T00:00:00Z" |
| 76 | +# /// |
| 77 | +# divide.py |
| 78 | + |
| 79 | +import numpy as np |
| 80 | + |
| 81 | +denominator = np.arange(10) |
| 82 | + |
| 83 | +for denom in denominator: |
| 84 | +    if denom == 0: |
| 85 | +        value = np.infty |
| 86 | +    else: |
| 87 | +        value = 1 / denom |
| 88 | +``` |
| 89 | + |
| 90 | +Simply including this inline metadata at the top of a script lets anyone with uv installed run it with the same dependencies, with no manual environment setup. And because the `requires-python` and `dependencies` fields are standardized by PEP 723 rather than specific to uv (only the `[tool.uv]` table is uv-specific), the script should keep running with whatever PEP 723-compliant tool comes next! |
| 91 | + |
| 92 | +If you want a real-world example, consider checking out my Ocean Observatories Initiative [nitrate data download script](https://github.com/andrew-s28/ooi-profiler-nitrate-retriever/blob/main/scripts/retrieve_profiler_data.py) or my [velocity, nitrate, and wind analysis scripts](https://github.com/andrew-s28/shelf-nitrate-response-to-upwelling/tree/main/scripts), both of which support my research on the [shelf nitrate response to upwelling](http://localhost:3000/projects/posts/shelf-nitrate-response-to-upwelling). |