
Commit e5d25a6

Merge branch 'main' into greg/map-old-sleap-config-files-to-new
2 parents: 64b53fc + 5c3a38d

39 files changed: +2225 additions, -1822 deletions

.DS_Store

6 KB (binary file not shown)

.dockerignore

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+README.md
+docs/
+*.egg-info/
+
+
+# Test artifacts
+tests/
+*.pytest_cache/
+
+*.ruff_cache
+codecov.yml
+
+# Git files
+.github/
+.gitignore

.github/workflows/ci.yml

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: 3.9
+          python-version: 3.11

       - name: Install dependencies
         run: |
@@ -57,7 +57,7 @@ jobs:
       fail-fast: false
       matrix:
         os: ["ubuntu-latest", "windows-latest", "macos-14"]
-        python: [3.9]
+        python: [3.11]
       include:
         # Default values
         - env_file: environment_cpu.yml

Dockerfile

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+## Docker image for remote development
+
+# Directly from a cuda built image.
+FROM nvidia/cuda:12.6.1-base-ubuntu24.04
+
+LABEL maintainer="Divya Seshadri Murali <[email protected]>"
+
+USER root
+
+RUN apt-get update && apt-get install -y --no-install-recommends build-essential openssh-server
+
+
+# use tini instead of init: useful esp. when using multi-processing, ssh, zombie processes
+ENV TINI_VERSION v0.19.0
+ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
+RUN chmod +x /tini
+# /tini -- python app.py
+ENTRYPOINT ["/tini", "--"]
+
+RUN mkdir /var/run/sshd
+RUN echo 'root:root' | chpasswd
+RUN sed -i 's/#*PermitRootLogin prohibit-password/PermitRootLogin yes/g' /etc/ssh/sshd_config
+
+# SSH login fix. Otherwise user is kicked off after login
+RUN sed -i 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' /etc/pam.d/sshd
+
+# ENV NOTVISIBLE="in users profile"
+# RUN echo "export VISIBLE=now" >> /etc/profile
+
+EXPOSE 22
+CMD ["/usr/sbin/sshd", "-D"]
+
+
+# Install all necessary packages and remove apt cache.
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends \
+    build-essential \
+    openssh-server \
+    wget \
+    curl \
+    git \
+    screen \
+    ffmpeg && \
+    rm -rf /var/lib/apt/lists/*
+
+
+# Install Miniforge
+RUN curl -fsSL --compressed https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -o "Miniforge3-Linux-x86_64.sh" && \
+    chmod +x "Miniforge3-Linux-x86_64.sh" && \
+    bash "Miniforge3-Linux-x86_64.sh" -b -p "/root/miniforge3" && \
+    rm "Miniforge3-Linux-x86_64.sh" && \
+    /root/miniforge3/bin/conda init bash && \
+    /root/miniforge3/bin/conda clean --all -y
+
+# Add conda to path to create new env
+ENV PATH "/root/miniforge3/bin:$PATH"
+
+
+# install conda env
+RUN mkdir sleap-nn/
+WORKDIR sleap-nn
+COPY . ./sleap-nn
+RUN mamba env create -f ./sleap-nn/environment.yml
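The new Dockerfile can be exercised with the usual build/run workflow. A minimal sketch, assuming the repo root as build context; the image tag `sleap-nn:dev`, container name, and host port are illustrative choices, not from the commit:

```shell
# Build the remote-development image (tag name is an assumption).
docker build -t sleap-nn:dev .

# Run with GPU access and SSH published on a host port.
# The image starts sshd under tini; root login uses the password
# "root" set via chpasswd in the Dockerfile.
docker run --gpus all -d -p 2222:22 --name sleap-nn-dev sleap-nn:dev

# Connect from the host for remote development:
ssh -p 2222 root@localhost
```

Note that because the `ENTRYPOINT`/`CMD` pair launches `sshd -D` in the foreground, the container stays alive purely as an SSH target; tini reaps any zombie processes spawned by SSH sessions.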

docs/config.md

Lines changed: 6 additions & 4 deletions
@@ -17,13 +17,13 @@ The config file has three main sections:
 - `val_labels_path`: (str) Path to validation data (`.slp` file)
 - `test_file_path`: (str) Path to test dataset (`.slp` file or `.mp4` file). *Note*: This is used only with CLI to get evaluation on test set after training is completed.
 - `user_instances_only`: (bool) `True` if only user labeled instances should be used for training. If `False`, both user labeled and predicted instances would be used. *Default*: `True`.
-- `data_pipeline_fw`: (str) Framework to create the data loaders. One of [`litdata`, `torch_dataset`, `torch_dataset_np_chunks`].
+- `data_pipeline_fw`: (str) Framework to create the data loaders. One of [`litdata`, `torch_dataset`, `torch_dataset_cache_img_memory`, `torch_dataset_cache_img_disk`].
   *Default*: `"torch_dataset"`.
-- `np_chunks_path`: (str) Path to save `.npz` chunks created with `torch_dataset_np_chunks` data pipeline framework. If `None`, the path provided in `trainer_config.save_ckpt` is used (else working dir is used). The `train_chunks` and `val_chunks` dirs are created inside this path. *Default*: `None`.
+- `cache_img_path`: (str) Path to save `.jpg` images created with `torch_dataset_cache_img_disk` data pipeline framework. If `None`, the path provided in `trainer_config.save_ckpt` is used (else working dir is used). The `train_imgs` and `val_imgs` dirs are created inside this path. *Default*: `None`.
 - `litdata_chunks_path`: (str) Path to save `.bin` files created with `litdata` data pipeline framework. If `None`, the path provided in `trainer_config.save_ckpt` is used (else working dir is used). The `train_chunks` and `val_chunks` dirs are created inside this path. *Default*: `None`.
-- `use_existing_chunks`: (bool) Use existing train and val chunks in the `np_chunks_path` or `chunks_path` for `torch_dataset_np_chunks` or `litdata` frameworks. If `True`, the `np_chunks_path` (or `chunks_path`) should have `train_chunks` and `val_chunks` dirs. *Default*: `False`.
+- `use_existing_imgs`: (bool) Use existing train and val images/chunks in the `cache_img_path` or `litdata_chunks_path` for `torch_dataset_cache_img_disk` or `litdata` frameworks. If `True`, the `cache_img_path` (or `litdata_chunks_path`) should have `train_imgs` and `val_imgs` dirs. *Default*: `False`.
 - `chunk_size`: (int) Size of each chunk (in MB). *Default*: `100`.
-- `delete_chunks_after_training`: (bool) If `False`, the chunks (numpy or litdata chunks) are retained after training. Else, the chunks are deleted. *Default*: `True`.
+- `delete_cache_imgs_after_training`: (bool) If `False`, the images (torch_dataset_cache_img_disk or litdata chunks) are retained after training. Else, the files are deleted. *Default*: `True`.
   #TODO: change in inference ckpts
 - `preprocessing`:
   - `is_rgb`: (bool) True if the image has 3 channels (RGB image). If input has only one
@@ -164,6 +164,8 @@ The config file has three main sections:
 - `save_last`: (bool) When True, saves a last.ckpt whenever a checkpoint file gets saved. On a local filesystem, this will be a symbolic link, and otherwise a copy of the checkpoint file. This allows accessing the latest checkpoint in a deterministic manner. *Default*: `False`.
 - `trainer_devices`: (int) Number of devices to train on (int), which devices to train on (list or str), or "auto" to select automatically. *Default*: `"auto"`.
 - `trainer_accelerator`: (str) One of the ("cpu", "gpu", "tpu", "ipu", "auto"). "auto" recognises the machine the model is running on and chooses the appropriate accelerator for the `Trainer` to be connected to. *Default*: `"auto"`.
+- `profiler`: (str) Profiler for pytorch Trainer. One of ["advanced", "passthrough", "pytorch", "simple"]. *Default*: `None`.
+- `trainer_strategy`: (str) Training strategy, one of ["auto", "ddp", "fsdp", "ddp_find_unused_parameters_false", "ddp_find_unused_parameters_true", ...]. This supports any training strategy that is supported by `lightning.Trainer`. *Default*: `"auto"`.
 - `enable_progress_bar`: (bool) When True, enables printing the logs during training.
   *Default*: `False`.
 - `steps_per_epoch`: (int) Minimum number of iterations in a single epoch. (Useful if model is trained with very few data points). Refer `limit_train_batches` parameter of Torch `Trainer`. If `None`, the number of iterations depends on the number of samples in the train dataset.
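The renamed keys documented above slot into a config file along these lines; a hedged sketch only, with illustrative paths and values that are not taken from the commit:

```yaml
data_config:
  data_pipeline_fw: torch_dataset_cache_img_disk
  cache_img_path: /tmp/sleap_cache        # train_imgs/ and val_imgs/ dirs created here
  use_existing_imgs: false                # set true to reuse a previously built cache
  delete_cache_imgs_after_training: true  # remove the .jpg cache when training ends

trainer_config:
  profiler: simple                        # one of advanced/passthrough/pytorch/simple
  trainer_strategy: auto                  # any strategy lightning.Trainer accepts
```

The main migration hazard is that old configs using `np_chunks_path`, `use_existing_chunks`, or `delete_chunks_after_training` must be rewritten to the new `cache_img_*` names, since the `torch_dataset_np_chunks` framework value no longer appears in the documented options.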

docs/config_bottomup.yaml

Lines changed: 2 additions & 0 deletions
@@ -93,6 +93,8 @@ trainer_config:
   save_last: true
   trainer_devices: 1
   trainer_accelerator: cpu
+  profiler: "simple"
+  trainer_strategy: auto
   enable_progress_bar: false
   steps_per_epoch: null
   max_epochs: 50

environment.yml

Lines changed: 2 additions & 2 deletions
@@ -6,10 +6,10 @@ channels:
   - conda-forge

 dependencies:
-  - python=3.9
+  - python=3.11
   - pytorch-cuda=11.8
   - numpy
-  - sleap-io
+  - sleap-io>=0.2.0
   - pydantic
   - lightning
   - cudnn

environment_cpu.yml

Lines changed: 2 additions & 2 deletions
@@ -5,9 +5,9 @@ channels:
   - conda-forge

 dependencies:
-  - python=3.9
+  - python=3.11
   - numpy
-  - sleap-io
+  - sleap-io>=0.2.0
   - pytorch
   - pydantic
   - lightning

environment_mac.yml

Lines changed: 2 additions & 2 deletions
@@ -5,9 +5,9 @@ channels:
   - conda-forge

 dependencies:
-  - python=3.9
+  - python=3.11
   - numpy
-  - sleap-io
+  - sleap-io>=0.2.0
   - pydantic
   - lightning
   - pytorch

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -12,11 +12,11 @@ authors = [
     {name = "Talmo Pereira", email = "[email protected]"}
 ]
 description = "Neural network backend for training and inference for animal pose estimation."
-requires-python = ">=3.9"
+requires-python = ">=3.11"
 keywords = ["sleap", "pose estimation", "deep learning", "neural networks", "computer vision", "animal behavior"]
 license = {text = "BSD-3-Clause"}
 classifiers = [
-    "Programming Language :: Python :: 3.9"
+    "Programming Language :: Python :: 3.11"
 ]
 dependencies = [
     "torch",
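The bump from `>=3.9` to `>=3.11` means older interpreters are refused at install time. A small illustrative helper (not part of the commit; the function name and plain tuple comparison are simplifying assumptions, since pip actually evaluates the full `requires-python` specifier) shows the version check this effectively amounts to:

```python
REQUIRED = (3, 11)  # floor implied by requires-python = ">=3.11"

def meets_floor(version, required=REQUIRED):
    """Return True if a (major, minor, ...) tuple satisfies the version floor."""
    return tuple(version[:2]) >= required

print(meets_floor((3, 9)))   # False: 3.9 installs are now rejected
print(meets_floor((3, 11)))  # True
print(meets_floor((3, 12)))  # True
```

In practice this means CI (now pinned to 3.11 in `ci.yml`), the conda environments (`python=3.11`), and the packaging metadata all move in lockstep, so no supported entry point remains on 3.9.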

0 commit comments

Comments
 (0)