@GondekNP

This fork attempts to run the DeepBiosphere model top to bottom, but within a Dev Container, for the sake of reproducibility and ease of development!

As of now, these are the only changes:

  • Rather than running everything in the local env, I added most of the README getting-started steps (system dependencies, Python dependencies, directory structure, GBIF auth) to dev.Dockerfile. Assuming you have Docker installed locally, along with VSCode and the Dev Containers extension, you can run Dev Containers: Reopen in Container to build the development image, start it, and attach VSCode to the running container. The .devcontainer folder also holds two other files: (1) devcontainer.json, which tells VSCode how to connect to the built container and provides some customization options, and (2) environment.yml, a conda environment file that the Docker build uses to pre-install the proper version of Python along with its packages (inplace-abn, shapely, pygeos, etc.).

  • I've added VSCode launch configs to .vscode to step through and test Download_GBIF_Data.py and Build_Data.py (as well as Inference.py, but I haven't tried that one yet; related to "Out-of-the-box inference without training?" #5). Unfortunately, the args to the CLI calls in launch.json are hardcoded, since I couldn't find a way to use variables in launch.json itself, but the structure is easy to modify!

  • To cut down on the size of my GBIF call, I added a flag to the Build_Data CLI that optionally passes a WKT geometry rather than an area string. This lets me use a very small demo area within the Mojave for dev purposes. Alongside that, I renamed a couple of variables for internal consistency, with no changes to functionality.

  • Lastly, added a local .env within the repo itself to hold my local paths to relevant inputs / outputs. It looks like this:

GBIF_USER=npg
[email protected]

PATH_OCCS='/workspaces/devcontainer/data/occs/'
PATH_SHPFILES='/workspaces/devcontainer/data/shpfiles/'
PATH_MODELS='/workspaces/devcontainer/data/models/'
PATH_IMAGES='/workspaces/devcontainer/data/images/'
PATH_RASTERS='/workspaces/devcontainer/data/rasters/'
PATH_BASELINES='/workspaces/devcontainer/data/baselines/'
PATH_RESULTS='/workspaces/devcontainer/data/results/'
PATH_MISC='/workspaces/devcontainer/data/misc/'
PATH_DOCS='/workspaces/devcontainer/data/docs/'
PATH_SCRATCH='/workspaces/devcontainer/data/scratch/'
PATH_RUNS='/workspaces/devcontainer/data/runs/'
PATH_BLOB_ROOT='https://naipblobs.blob.core.windows.net/'

Utils.py in my fork reads this file, so that my local development setup doesn't cause merge conflicts down the line!
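For anyone curious how a .env like this can be consumed, here is a minimal stdlib-only sketch. It is not the actual code in Utils.py: the function names `parse_env_text` and `load_paths` are hypothetical, and the choice to let real environment variables override the file is an assumption on my part.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines, comments, and lines without '='."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Be forgiving: drop a stray trailing comma and surrounding quotes.
        value = value.strip().rstrip(",").strip().strip("'\"")
        values[key.strip()] = value
    return values


def load_paths(env_file: str = ".env") -> dict:
    """Return the PATH_* settings from env_file (hypothetical helper).

    Assumption: a variable already set in the process environment
    takes precedence over the value in the file.
    """
    path = Path(env_file)
    values = parse_env_text(path.read_text()) if path.exists() else {}
    for key in values:
        if key in os.environ:
            values[key] = os.environ[key]
    return {k: v for k, v in values.items() if k.startswith("PATH_")}
```

Something like `occs_dir = load_paths()["PATH_OCCS"]` would then replace a hardcoded path. The python-dotenv package covers the same ground more robustly, if adding a dependency is acceptable.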

@GondekNP

Also, if it's helpful, this branch (or another) could handle #2 as well :)
