Description
Academic research depends on a software ecosystem of ever-increasing complexity. Moreover, each researcher's software environment is unique: each makes use of different tools, different libraries, and different versions. These details are rarely documented in full, even for the researchers' own use. This poses a substantial barrier to reproducibility.
Docker provides a 'shipping container' that makes it easy to share your software environment with others. Unlike existing solutions, Docker isn't monolithic: you can use just the parts you like. This has made it very successful among professional software developers, who, like researchers, have their own favorite tools and ways of doing things and don't want to change them, but still need an easy way for others to run their software.
This tutorial would introduce Docker by illustrating four key concepts desirable in any approach to reproducible software environments:
- A flexible approach: We don't want to make any assumptions about a user's preferred OS, text editor, etc. (Docker runtime)
- An extensible approach: A user should be able to extend & repackage the environment with any of their favorite tools, with a minimal learning curve. (Docker containers)
- A community approach: Common extensions of tools & combinations should be developed & maintained as a community base environment. This saves time and permits optimization without restricting flexibility of individual users. (Docker Hub)
- A DevOps approach: Installation is driven by scripts instead of manuals. These scripts are human-readable, machine-readable, extensible, portable, & easily versioned. (Dockerfiles)
This would be a hands-on demo of running a 'Dockerized' environment, extending it, and committing & sharing those changes. (We would probably do this using RStudio, though I could also demonstrate it with IPython notebooks or other computational environments.)
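As a rough illustration of the scripted, extensible approach described above, a Dockerfile for an RStudio-based environment might look something like the sketch below. The base image (`rocker/rstudio`), the system library, and the R package are assumptions chosen for illustration only; they are not taken from the tutorial materials.

```dockerfile
# Illustrative sketch only: the base image and the packages below are
# assumptions for the sake of example, not part of the tutorial itself.

# Start from a community-maintained base environment that provides R and RStudio.
FROM rocker/rstudio

# Extend the environment with a system library needed by the R package below.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libxml2-dev \
    && rm -rf /var/lib/apt/lists/*

# Add a favorite R package on top of the base environment.
RUN Rscript -e "install.packages('XML', repos = 'https://cloud.r-project.org')"
```

With a file like this, `docker build` would produce an image, `docker run` would start a container from it, and `docker commit` and `docker push` would capture and share interactive changes. The exact arguments depend on the image and registry account used.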