Commit 6297f0f

Merge pull request #164 from csdms/egp/add-envs-lesson
Add intro lesson on package managers
2 parents ae6fe13 + 8008b88 commit 6297f0f

1 file changed: +174 -0 lines changed

lessons/conda/environments.ipynb

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Python environments and package management\n",
8+
"\n",
9+
"Python was first released in 1991, and the current major version, Python 3, was first released in 2008. This means we have 30+ years of development to lean on, including a robust ecosystem of packages for scientific computing. But not all packages play nicely with one another. In particular, packages have *dependencies*, which are *other packages* they rely on (and those usually have dependencies of their own, and so forth). When we want to update something or install something new, what happens if that package brings dependencies that conflict with our other packages? What if I have some code that works with a certain set of packages, but I want to share it with my advisor, who doesn't have everything installed already? What if I return to some software I wrote a year ago, but I've updated, installed, or uninstalled packages since then?\n",
10+
"\n",
11+
"This is why we need tools for *package management*.\n",
12+
"\n",
13+
"Objectives:\n",
14+
"In this lesson, you'll learn...\n",
15+
"- What a package manager is, and why it's useful,\n",
16+
"- How to organize Python packages into virtual environments, and\n",
17+
"- How to set up a new environment using *conda*, *venv*, or *poetry*."
18+
]
19+
},
20+
{
21+
"cell_type": "markdown",
22+
"metadata": {},
23+
"source": [
24+
"# What do we need in a package manager?\n",
25+
"### 1. Python environments\n",
26+
"An environment is an isolated group of packages and dependencies **that all work together**. We can define environments in plain text files by listing the exact version of Python and of each package we want to use. Environments are ephemeral: create them, use them, break them, throw them out, and then create them again (as long as you still have the text file with the environment specification).\n",
27+
"\n",
28+
"### 2. Package installation\n",
29+
"We could install all of our packages by hand, by going to each repository individually, picking a version, and downloading that software. But why do all of the hard work ourselves? A good package manager lets us specify the constraints we know (e.g., I want *numpy* 2.X) and makes reasonable choices in the absence of any specific requirement. It should fetch all of the necessary dependencies for our packages. And it should resolve conflicts that arise between different versions of packages, or at the very least flag those conflicts and ask us to weigh in.\n",
30+
"\n",
31+
"### 3. Reproducibility\n",
32+
"Our environments are ephemeral, so our package manager needs to be able to save a specification file, share it with others, and make new environments from files that others give us. Each package manager has its own opinions about how the specification file should be formatted, but the underlying functionality is the same. In general, **reproducibility** requires our package manager to behave in expected, predictable ways."
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"metadata": {},
38+
"source": [
39+
"# Conda"
40+
]
41+
},
42+
{
43+
"cell_type": "markdown",
44+
"metadata": {},
45+
"source": [
46+
"### Downloading conda\n",
47+
"Ironically, there are lots of different versions, or *distributions*, of the conda software. If you are going to use conda, **we highly recommend the [miniforge distribution](https://github.com/conda-forge/miniforge)**. It's about as minimal as a conda distribution can get, and it connects by default to a *channel* (a package repository) called **conda-forge**, which has most of the packages you'll ever need for scientific computing.\n",
48+
"\n",
49+
"Note: *mamba* is a reimplementation of conda in C++ that is often faster. The two accept the same commands almost all of the time, so pick whichever one you like.\n",
50+
"\n",
51+
"### Making a new conda environment\n",
52+
"If conda is not enabled in your shell, you may need to run the following and then open a new shell:\n",
53+
"```\n",
54+
"conda init\n",
55+
"```\n",
56+
"Once you have conda installed, you can make a new environment with:\n",
57+
"```\n",
58+
"conda create --name my_project\n",
59+
"```\n",
60+
"Now, run\n",
61+
"```\n",
62+
"conda env list\n",
63+
"```\n",
64+
"to see all of your environments. It should return a list that includes base and my_project (or whatever you called your new environment). Before doing anything else, let's *activate* our new environment, using:\n",
65+
"```\n",
66+
"conda activate my_project\n",
67+
"```\n",
68+
"Now, you should see (my_project) at the left side of your shell prompt, before the $. This indicates that our new conda environment is activated and ready to go.\n",
69+
"\n",
70+
"**Important note: do not install packages into base.**\n",
71+
"\n",
72+
"**Seriously, don't do it.** The base environment holds conda itself, so breaking base can break conda for every project.\n",
73+
"\n",
74+
"### Installing packages\n",
75+
"To install a new package, let's say *numpy*, run:\n",
76+
"```\n",
77+
"conda install numpy\n",
78+
"```\n",
79+
"and then follow the prompts. Conda will attempt to *solve* your dependency tree, which is a fancy way of saying that it will make sure there aren't any conflicts. If you want a specific version, you can instead run:\n",
80+
"```\n",
81+
"conda install numpy=2.0\n",
82+
"```\n",
83+
"for numpy 2.0 (conda treats a single = as a fuzzy match, so any 2.0.x release qualifies), or:\n",
84+
"```\n",
85+
"conda install \"numpy>=2.0\"\n",
86+
"```\n",
87+
"for *at least* numpy 2.0, if you're okay with more recent versions. The quotes keep the shell from interpreting > as output redirection. You can combine =, <, and > (for example, \"numpy>=1.26,<2.0\") to pin a range of versions.\n",
88+
"\n",
89+
"To list all of the packages in your current environment, run:\n",
90+
"```\n",
91+
"conda list\n",
92+
"```\n",
93+
"And to update packages, run:\n",
94+
"```\n",
95+
"conda update foo\n",
96+
"```\n",
97+
"where *foo* can be a package name, or even *conda* or *python*.\n",
98+
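"\n",
"Because environments are ephemeral, it can help to see how one is thrown away. A minimal sketch, assuming the environment is still called my_project (the example name used above) and that you no longer need it:\n",
"```\n",
"# my_project is the illustrative environment name from earlier\n",
"conda deactivate\n",
"conda env remove --name my_project\n",
"```\n",
"The first command leaves the active environment, and the second deletes it from disk.\n",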
"\n",
99+
"### Using environment files\n",
100+
"The last thing we need is a way to save, share, and rebuild our environments. Conda uses YAML files to specify environments. To export your current environment to a YAML file, you can use:\n",
101+
"```\n",
102+
"conda env export > environment.yaml\n",
103+
"```\n",
104+
"which basically just exports a list of the current packages and versions in your environment to a text file. Making a new environment from a file is very similar to what we did above, with one addition:\n",
105+
"```\n",
106+
"conda env create --file environment.yaml\n",
107+
"```\n",
108+
"By default, the new environment takes its name from the name field inside the YAML file. You can also use the --name flag to call it whatever you want."
109+
]
110+
},
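{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Example environment file\n",
"To make the file format concrete, here is a rough sketch of a short, hand-written environment.yaml. The environment name, package choices, and version pins are assumptions made up for this example. A real export will usually list many more packages:\n",
"```\n",
"# Illustrative only: the name, packages, and versions are made-up examples\n",
"name: my_project\n",
"channels:\n",
"  - conda-forge\n",
"dependencies:\n",
"  - python=3.12\n",
"  - numpy>=2.0\n",
"```\n",
"The name field sets the environment's name, channels says where conda should look for packages, and dependencies lists the packages along with any version constraints."
]
},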
111+
{
112+
"cell_type": "markdown",
113+
"metadata": {},
114+
"source": [
115+
"# Venv"
116+
]
117+
},
118+
{
119+
"cell_type": "markdown",
120+
"metadata": {},
121+
"source": [
122+
"We can also make ourselves a DIY conda in base Python using a combination of *venv* and *pip*. To make a new virtual environment in Python, run:\n",
123+
"```\n",
124+
"python3 -m venv [FOLDER]\n",
125+
"```\n",
126+
"where [FOLDER] is the directory where your environment will be installed. This should be a sub-directory of your project directory if you are following the convention of one environment per project. It's common to name this folder .venv so it is hidden by default and doesn't conflict with other directories. If you named it .venv, then when you run\n",
127+
"```\n",
128+
"source .venv/bin/activate\n",
129+
"```\n",
130+
"an indicator should appear at the left of your shell prompt with (.venv) (or whatever the name of your venv is). On Windows, the activation script lives at .venv\\Scripts\\activate instead.\n",
131+
"\n",
132+
"The default installation tool in Python is *pip*. We want to be careful with Python and pip, because your operating system probably has its own default Python installation. If we call pip from the wrong location, we might accidentally install packages into the system Python directory (which is annoying but not catastrophic). Check which pip you are using with:\n",
133+
"```\n",
134+
"which pip\n",
135+
"```\n",
136+
"You should see a path to the pip installed in your venv directory. If not, make sure to activate the venv. Next, we can install packages using:\n",
137+
"```\n",
138+
"pip install numpy\n",
139+
"```\n",
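140+
"or whatever other package we want. Similar to conda, we can use numpy==2.0 or >, <, >=, and <=. Notice that the exact version is denoted with '==' instead of '='. When a specifier contains < or >, wrap it in quotes (for example, \"numpy>=2.0\") so the shell doesn't treat the symbol as redirection.\n",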
141+
"\n",
142+
"Pip also gives us an easy way to save environment files. Try:\n",
143+
"```\n",
144+
"pip freeze > requirements.txt\n",
145+
"```\n",
146+
"Requirements files serve the same purpose as environment.yaml files, with a simpler format: one package and version per line. Unlike a conda environment file, they don't record the Python version itself. You can also install packages from a requirements file using:\n",
147+
"```\n",
148+
"pip install -r requirements.txt\n",
149+
"```"
150+
]
151+
},
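{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison with conda's YAML format, a requirements file is just a list of pinned packages, one per line. The sketch below is illustrative: the packages and version numbers are assumptions chosen for the example, and a real pip freeze will list whatever is actually installed in your venv, including dependencies:\n",
"```\n",
"# Illustrative only: example packages and versions\n",
"numpy==2.0.0\n",
"matplotlib==3.9.0\n",
"```\n",
"Each line uses the same == syntax that pip accepts on the command line."
]
},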
152+
{
153+
"cell_type": "markdown",
154+
"metadata": {},
155+
"source": [
156+
"# Poetry"
157+
]
158+
},
159+
{
160+
"cell_type": "markdown",
161+
"metadata": {},
162+
"source": [
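163+
"Right now, my favorite package manager is [poetry](https://python-poetry.org/docs/basic-usage/). It does the same things as conda or venv for Python packages, but also has some more advanced functionality built in, such as a lock file that records exact versions and commands for building and publishing your own packages."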
164+
]
165+
}
166+
],
167+
"metadata": {
168+
"language_info": {
169+
"name": "python"
170+
}
171+
},
172+
"nbformat": 4,
173+
"nbformat_minor": 2
174+
}
