The Open PHACTS Discovery Platform can be installed as a series of Docker containers.
- Overview
- Requirements
- Docker installation
- Retrieving Open PHACTS Docker Images
- Building data containers
- Configuring Open PHACTS platform
- Running the Open PHACTS platform
- Stopping the Open PHACTS platform
- Removing the Open PHACTS platform
- Upgrading the Open PHACTS platform
A Docker container is a kind of sandboxed Linux environment, which typically runs a single server instance, e.g. MySQL. Each container has its own virtual filesystem, which is realized from Docker images downloaded from the central Docker Hub Registry.
The Open PHACTS Docker images provide the different services that form the Open PHACTS platform. This page describes how these Docker containers can be installed and started using Docker Compose.
The Open PHACTS containers will download and use the latest Open PHACTS data release, and provide the Virtuoso SPARQL endpoint, the Open PHACTS REST API and the Explorer web interface.
External services: The following components of the Open PHACTS platform are not yet included in this release.
- Chemical Resolution Service APIs (e.g. SMILEStoCSID and Similarity search)
- Text to Concept search calls
You can modify docker-compose.yml to enable usage of the public APIs for these; see the External services section below.
Roughly minimal hardware requirements:
- ~ 150 GB of disk space
- ~ 10 GB of RAM
- ~ 4 CPU cores
Recommended hardware:
- ~ 250 GB of SSD disk
- ~ 128 GB of RAM
- ~ 8 CPU cores
Prerequisites:
- Recent x64 Linux distribution (e.g. Ubuntu 14.04 LTS, Centos 7)
- Docker 1.7.1 or later
- Docker Compose 1.5.2 or later
- Fast Internet connection (during build of data containers)
Note that you must make this disk space available to Docker.
These Docker images have been tested on:
- Centos 6.7 (with kernel 3.18.21-17.el6 - yum install centos-release-xen ; yum update)
- Ubuntu 14.04 LTS
These images have not been tested with Docker virtualization on non-Linux platforms (OS X, Windows) or behind a firewall.
See the Docker installation guide for details for your Linux distribution. Here's the short-hand installation for Ubuntu 14.04:
sudo -i
apt-get -y dist-upgrade
wget -qO- https://get.docker.com/ | sh
To test the installation, try:
sudo docker run hello-world
You will additionally need to install Docker Compose. The exact version used below might be out of date; see the install guide for details.
sudo -i
curl -L https://github.com/docker/compose/releases/download/1.6.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
To test the installation, try:
sudo docker-compose --version
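The prerequisites list minimum versions (Docker 1.7.1 and Docker Compose 1.5.2). As a minimal sketch, the check could be scripted as below. The helpers `extract_version` and `version_ge` are hypothetical names introduced here, and the sketch assumes the tools print a plain dotted numeric version such as 1.6.0 somewhere in their `--version` output.

```shell
#!/bin/sh
# extract_version: read text on stdin and print the first dotted
# three-part version number found (hypothetical helper).
extract_version() {
    sed -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# version_ge ACTUAL MINIMUM: succeed when ACTUAL is at least MINIMUM.
# Assumes purely numeric, dot-separated components (hypothetical helper).
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        # A missing component counts as 0, so 1.7 is treated as 1.7.0
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the component that was just compared
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Example (requires Docker and Docker Compose to be installed):
#   version_ge "$(sudo docker --version | extract_version)" 1.7.1 \
#       || echo "Docker is older than 1.7.1"
#   version_ge "$(sudo docker-compose --version | extract_version)" 1.5.2 \
#       || echo "Docker Compose is older than 1.5.2"
```

The comparison is done component by component as integers, so that 1.10.0 is correctly treated as newer than 1.7.1.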
Hint: If you add your username to the docker group, as suggested by the Docker install, and log out and in again, you can run the remaining docker and docker-compose commands without using sudo. Note that this would effectively be giving that user privileged root access to the host machine without password verification.
You will need about 150 GB of disk space for the Open PHACTS Docker containers and data. Check on the docker host:
sudo df -h /var/lib/docker/
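To turn the manual check above into a pass/fail test, a minimal sketch could look like this. The helper names `available_gb` and `has_space` are hypothetical, and the 150 GB threshold is the rough figure quoted above rather than an exact requirement.

```shell
#!/bin/sh
# available_gb DIR: print the free space on DIR's filesystem in whole GB.
available_gb() {
    # df -P prints stable POSIX columns; column 4 is available 1K blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_space AVAILABLE_GB REQUIRED_GB: succeed if enough space is free.
has_space() {
    [ "$1" -ge "$2" ]
}

# Example:
#   has_space "$(available_gb /var/lib/docker)" 150 \
#       || echo "Less than 150 GB free for Docker"
```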
If you do not have enough space on the right partition, you might want to edit the volumes sections in docker-compose.override.yml to use alternative folders for the most disk hungry containers. Note that you still need about 15 GB of disk space in /var/lib/docker for the downloaded Docker images.
Another simpler option is to do the equivalent of:
sudo service docker stop
sudo mv /var/lib/docker /bigdisk/
sudo ln -s /bigdisk/docker /var/lib/
sudo service docker start
If you are using a virtual machine to run Docker (e.g. on Windows and OS X) ensure you have allocated enough disk space (and memory) to the virtual machine's file system.
It is not recommended to use an externally mounted volume (e.g. USB, NFS or network share) for the Docker disk space.
Download the ops-docker repository from the master branch:
curl -L https://github.com/openphacts/ops-docker/archive/master.tar.gz | tar xzv
cd ops-docker-master
You can also use the above to upgrade the ops-docker download, but this would overwrite any changes you have made to docker-compose.yml. Therefore you should put your edits in docker-compose.override.yml instead.
Now make sure you are in the equivalent of the ops-docker-master folder and run:
sudo docker-compose pull
This will download the latest version of these Docker images according to docker-compose.yml:
- openphacts/explorer2
- openphacts/ops-linkeddataapi
- openphacts/identitymappingservice
- mysql
- memcached
- stain/virtuoso
The Open PHACTS Docker containers use separate Data Volume Containers to contain the Open PHACTS datasets.
On installation you will need to run these local data containers once, together with their data staging counterparts, virtuosostaging and mysqlstaging, which will download the Open PHACTS 1.5 data.
Make sure you have sufficient disk space available for Docker.
The below will download about 20 GB and might take some time to download and stage (~2h depending on network and disk speed).
sudo docker-compose up --no-recreate -d mysqlstaging virtuosostaging
To follow the progress, use:
sudo docker-compose ps
sudo docker-compose logs mysqlstaging
sudo docker-compose logs virtuosostaging
Note that docker-compose logs may not terminate even if its container does; use Ctrl-C to cancel log listing.
Expected output from mysqlstaging:
mysqlstaging_1 | Preparing to stage ims
mysqlstaging_1 | Waiting for mySQL
mysqlstaging_1 | mySQL staging
mysqlstaging_1 | -rw-r--r-- 1 root root 1.2G Jul 22 15:43 /tmp/staging.sql
mysqlstaging_1 | out: 8737.224ms at 937.6kB/s ( 937.6kB/s avg) 8.0MB
mysqlstaging_1 | out: 1009.888ms at 7.9MB/s ( 1.6MB/s avg) 16.0MB
..
mysqlstaging_1 | out: 677.916ms at 11.8MB/s ( 8.6MB/s avg) 1.1GB
mysqlstaging_1 | out: 761.496ms at 10.5MB/s ( 8.6MB/s avg) 1.1GB
(long wait)
mysqlstaging_1 | mySQL staging finished
docker_mysqlstaging_1 exited with code 0
Expected output from virtuosostaging:
virtuosostaging_1 | Downloading checksums from http://data.openphacts.org/dev/1.5/virtuoso/
virtuosostaging_1 | 2015-07-22 15:43:29 URL:http://data.openphacts.org/dev/1.5/virtuoso/ [1634/1634] -> "index.html" [1]
virtuosostaging_1 | 2015-07-22 15:43:29 URL:http://data.openphacts.org/dev/1.5/virtuoso/?C=N;O=D [1634/1634] -> "index.html?C=N;O=D" [1]
..
virtuosostaging_1 | Downloaded: 29 files, 1.5M in 0.1s (10.2 MB/s)
virtuosostaging_1 | Downloading Virtuoso backup set to /download
virtuosostaging_1 | Initializing download: http://data.openphacts.org/dev/1.5/virtuoso/ghard-dump-20150415.tar
virtuosostaging_1 | File size: 21715220480 bytes
virtuosostaging_1 | Opening output file ghard-dump-20150415.tar
virtuosostaging_1 | Starting download
(long wait)
virtuosostaging_1 | ghard-dump-20150415/bak_325.bp
virtuosostaging_1 | Data download complete
virtuosostaging_1 | Loading bak_ -- 677 files
(..)
virtuosostaging_1 | 08:46:24 OpenLink Virtuoso Universal Server
virtuosostaging_1 | 08:46:24 Version 07.20.3212-pthreads for Linux as of Jun 3 2015
virtuosostaging_1 | 08:46:24 uses parts of OpenSSL, PCRE, Html Tidy
virtuosostaging_1 | 08:46:24 Begin to restore with file prefix bak_
virtuosostaging_1 | 08:46:24 --> Backup file # 1 [0x3F02-0x74-0x8A]
virtuosostaging_1 | 08:46:25 --> Backup file # 2 [0x3F02-0x74-0x8A]
(..)
virtuosostaging_1 | 09:13:35 --> Backup file # 675 [0x3F02-0x74-0x8A]
virtuosostaging_1 | 09:13:36 --> Backup file # 676 [0x3F02-0x74-0x8A]
virtuosostaging_1 | 09:13:37 End of restoring from backup, 6751701 pages
virtuosostaging_1 | 09:13:37 Server exiting
virtuosostaging_1 | Loading completed
docker_virtuosostaging_1 exited with code 0
You may want to inspect the download progress:
stain@heater:~/ops-platform-setup/docker$ docker exec docker_virtuosostaging_1 du -hs /download
5.5G /download
Staging is finished when both mysqlstaging and virtuosostaging have exited. Note that the mysql container will remain up. Check with:
sudo docker-compose ps
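If you would rather script this check than read the logs by eye, one option is to look for the completion lines shown in the expected output above. This is only a sketch: `staging_finished` is a hypothetical helper, the marker strings are copied from the sample logs and may differ in other releases, and the container name in the usage comment depends on your folder name (take the real one from the docker-compose ps listing).

```shell
#!/bin/sh
# staging_finished SERVICE: read log text on stdin and succeed if the
# completion marker for SERVICE (taken from the sample logs) appears.
staging_finished() {
    case $1 in
        mysqlstaging)    marker='mySQL staging finished' ;;
        virtuosostaging) marker='Loading completed' ;;
        *) echo "unknown staging service: $1" >&2; return 2 ;;
    esac
    grep -q -- "$marker"
}

# Example, using plain "docker logs" because it does not keep following
# the log; docker_mysqlstaging_1 is an assumed container name:
#   sudo docker logs docker_mysqlstaging_1 2>&1 \
#       | staging_finished mysqlstaging && echo "mySQL staging done"
```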
Edit the docker-compose.yml file for your host-specific settings. This is a Docker Compose configuration file.
You can modify volumes to use an explicit folder for the data containers, e.g. to use a faster/bigger disk partition. See comments in-line in docker-compose.yml.
You may want to change the exposed ports from 300* to different ports, or avoid exposing them at all. The only requirement here is that the exposed port for api must correspond to the port in API_URL, and that the ports are not already in use on the host server.
Unless you are going to access the platform on localhost exclusively, you must change the API_URL variable for the explorer2 container. This URL must use the fully qualified hostname as it will be accessed in the browser. The port should remain as 3002 unless you have changed the exposed port for api.
Important: Do not include a trailing / in the API_URL.
For example:
environment:
- API_URL=http://server13.example.com:3002
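Before restarting the containers, the rules above (no trailing slash, port matching the exposed api port) can be checked with a small script. This is a minimal sketch; `valid_api_url` and `api_url_port` are hypothetical helpers and only do simple string matching, so they will not handle unusual URLs such as IPv6 literals.

```shell
#!/bin/sh
# valid_api_url URL: succeed if URL starts with http:// or https://
# and does not end in a slash.
valid_api_url() {
    case $1 in
        */) return 1 ;;
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# api_url_port URL: print the explicit port in URL, or nothing if the
# URL does not name one.
api_url_port() {
    rest=${1#*://}      # strip the scheme
    rest=${rest%%/*}    # keep only host[:port]
    case $rest in
        *:*) echo "${rest##*:}" ;;
    esac
}

# Example:
#   API_URL=http://server13.example.com:3002
#   valid_api_url "$API_URL" && [ "$(api_url_port "$API_URL")" = 3002 ] \
#       || echo "API_URL does not match the exposed api port"
```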
TODO: Make a wrapping webserver that provides a common port 80 for explorer, sparql and api.
The APIs for the Chemical Resolution Service and ConceptWiki are not currently available as Docker images. The default configuration for the Open PHACTS Docker platform is to not access these APIs.
The API calls below rely on these services and would therefore not normally be functional in this Docker installation:
/structure?inchi={inchi}
/structure?inchi_key={inchi_key}
/structure?smiles={smiles}
/structure/similarity?searchOptions.Molecule={searchOptions.Molecule}
/structure/substructure?searchOptions.Molecule={searchOptions.Molecule}
/structure/exact?searchOptions.Molecule={searchOptions.Molecule}
/search/freetext?q={q}
/search/byTag?q={q}&uuid={uuid}
/getConceptDescription?uuid={uuid}
To enable usage of the public APIs as a fallback for these calls, modify docker-compose.yml to uncomment these lines (keep the indentation):
environment:
- CRS=https://ops.rsc.org/api/v1/
- CONCEPTWIKI=http://www.conceptwiki.org/web-ws/concept
Usage of the public Open PHACTS API is covered by the Terms of Use and Privacy Policy.
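If a client script needs to know up front whether a call will work against a local installation without the fallback enabled, the list of affected calls can be encoded as a lookup. This is a sketch only: `needs_external_service` is a hypothetical helper, and it assumes that matching on the path alone (ignoring the query string) is enough to identify the calls listed above.

```shell
#!/bin/sh
# needs_external_service CALL: succeed when the path of CALL is one of
# the API calls listed above as relying on the external services.
needs_external_service() {
    path=${1%%\?*}    # drop the query string, if any
    case $path in
        /structure) return 0 ;;
        /structure/similarity|/structure/substructure|/structure/exact) return 0 ;;
        /search/freetext|/search/byTag) return 0 ;;
        /getConceptDescription) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   needs_external_service "/search/freetext?q=aspirin" \
#       && echo "Requires the CRS or ConceptWiki fallback"
```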
Assuming the previous loading has completed, you can now start the rest of the Open PHACTS platform:
sudo docker-compose up --no-recreate -d
You can follow the progress by looking at the logs (press Ctrl-C to stop watching):
sudo docker-compose logs
The Open PHACTS platform should be started when you see the equivalent of these from each container:
api_1 | [Tue Jun 16 16:49:14.309976 2015] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.10 (Debian) PHP/5.6.10 configured -- resuming normal operations
mysql_1 | 2015-06-16 16:48:47 1 [Note] mysqld: ready for connections.
explorer2_1 | [2015-06-16 16:49:35] INFO WEBrick::HTTPServer#start: pid=1 port=3000
ims_1 | 16-Jun-2015 16:49:06.641 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 5568 ms
The virtuoso container usually takes the longest to start up.
Once started, this should expose the following services (replace localhost with your server's hostname):
- http://localhost:3001/ - Open PHACTS Explorer Web UI
- http://localhost:3002/ - Open PHACTS REST API
- http://localhost:3003/sparql - Virtuoso SPARQL
- http://localhost:3004/QueryExpander/ - Open PHACTS IdentityMappingService (IMS)
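A quick way to confirm that all four services answer is to request each URL in turn. The sketch below assumes the default 3001-3004 ports from the list above and that curl is installed; `service_urls` and `check_services` are hypothetical helper names. Note that it only tests whether the server answers at all, not whether the response is a success status.

```shell
#!/bin/sh
# service_urls HOST: print the four default service URLs, one per line.
service_urls() {
    printf 'http://%s:3001/\n' "$1"
    printf 'http://%s:3002/\n' "$1"
    printf 'http://%s:3003/sparql\n' "$1"
    printf 'http://%s:3004/QueryExpander/\n' "$1"
}

# check_services HOST: report which URLs answer an HTTP request
# within 10 seconds.
check_services() {
    service_urls "$1" | while read -r url; do
        if curl -s -o /dev/null --max-time 10 "$url"; then
            echo "reachable:   $url"
        else
            echo "unreachable: $url"
        fi
    done
}

# Example:
#   check_services localhost
```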
Note: using the text search in Explorer will use the remote Text-to-Concept service from conceptwiki.org.
To check the status of the Open PHACTS platform, use:
sudo docker-compose ps
To stop the platform, use:
sudo docker-compose stop
To remove the platform, stop it and then remove its containers together with their data volumes:
sudo docker-compose stop
sudo docker-compose rm -v
To recover the additional disk space used by the Docker images, provided you don't have any other unused Docker images you want to keep:
sudo docker images -q | xargs sudo docker rmi
Sometimes you might also need to remove all old containers - which would free up the images for the above:
sudo docker ps -aq | xargs sudo docker rm -v
Unless a new data release needs to be loaded, you do not need to repeat the staging. To upgrade the software within the Docker images (e.g. newer MySQL or OPS Platform API), do:
sudo docker-compose pull
Then rebuild the containers to use the newer images:
sudo docker-compose up -d
If you need to restart staging from scratch, first remove the data containers and their volumes:
sudo docker-compose rm -v mysqldata virtuosodata
Then follow the procedure "Building data containers" above.