Dockerize mara #7
base: master
…lation. Adjust docs
I had a few additions and problems :-( :
I ended up with this for the dev Docker container (i.e. the source is mounted into the container via a volume):

`Dockerfile`:

```dockerfile
# don't care about slim, we have lots of reasons to poke into the container anyway...
FROM python:3.7-stretch

RUN ["mkdir", "-p", "/mara"]
WORKDIR /mara
VOLUME /mara

RUN groupadd -r mara && useradd --no-log-init -r -g mara mara

# Install latest stable postgresql-client from the official repository
RUN \
    apt-get update && apt-get install -y --no-install-recommends gnupg dirmngr \
    # https://github.com/inversepath/usbarmory-debian-base_image/issues/9#issuecomment-466594168
    && mkdir ~/.gnupg && echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf \
    && apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys B97B0AFCAA1A47F044F244A07FCC7D46ACCC4CF8 \
    && echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" > /etc/apt/sources.list.d/pgdg.list \
    # ugly fix to install postgresql-client without errors in slim-stretch image
    && mkdir -p /usr/share/man/man1 /usr/share/man/man7 \
    && apt-get update && apt-get install -y --no-install-recommends \
        git \
        dialog \
        coreutils \
        graphviz \
        python3-dev \
        python3-venv \
        rsync \
        nano \
        telnet \
        postgresql-client

# The entrypoint installs all packages on first start...
COPY ./docker/dev/entrypoint.sh /
RUN ["chmod", "+x", "/entrypoint.sh"]

EXPOSE 5000

ENV MARA_ENVIRONMENT docker-dev
ENV FLASK_APP "/mara/app/app.py"
ENV FLASK_DEBUG 1
# pre-activate the environment, so you can directly do things like run pipelines with docker exec
ENV PATH /mara/.venv-docker/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

USER mara

ENTRYPOINT ["/entrypoint.sh"]
CMD ["flask-no-reload"]
```

`entrypoint.sh`:

```bash
#!/usr/bin/env bash
set -x
set -e

# Create the virtual env on first start; it is only used for docker.
# This prevents problems when a user works from the same source repo
# both via docker and locally
venv_dir=".venv-docker"
if [ ! -d "${venv_dir}" ]; then
    mkdir -p "${venv_dir}"
    (cd "${venv_dir}" && python3 -m venv --copies .)
    # add the project directory to path, so we might use that to edit source packages
    echo $(pwd) > "$(echo ${venv_dir}/lib/*/site-packages)/mara-path.pth"
    # install minimum set of required packages
    # wheel needs to be early to be able to build wheels
    # --ignore-installed: https://github.com/moby/moby/issues/12327
    # "EnvironmentError: [Errno 39] Directory not empty: '/mara/.venv-docker/lib/python3.7/site-packages/~ip/_internal'"
    "${venv_dir}/bin/python3" -m pip install --ignore-installed --upgrade pip wheel requests setuptools pipdeptree
    # Work around problems with un-vendored urllib3/requests in pip on ubuntu/debian:
    # this forces .venv/bin/pip to use the vendored versions of urllib3 from the installed requests version
    # see https://stackoverflow.com/a/46970344/1380673
    # (the braces and the glob must stay outside the quotes, otherwise bash expands neither)
    rm -vf "${venv_dir}"/share/python-wheels/{requests,chardet,urllib3}-*.whl
fi

source "${venv_dir}/bin/activate"

if [ "$1" = "flask-reload" ]; then
    exec flask run --with-threads --host 0.0.0.0 --reload --eager-loading
elif [ "$1" = "flask-no-reload" ]; then
    exec flask run --with-threads --host 0.0.0.0 --no-reload --eager-loading
elif [ "$1" = "migrate" ]; then
    flask app.cli.ensure-etl-db
    flask mara_db.migrate
elif [ "$1" = "update-packages" ] ; then
    for package_dir in $(mkdir -p packages; cd packages; find . -maxdepth 1 -mindepth 1 -type d) ; do
        # I've no clue, but this has to run twice to work: the first time it will fail, but the second time it succeeds
        # (wrapping the first call in `echo $(...)` keeps `set -e` from aborting on that failure)
        echo $(.scripts/mara-app/ensure-pushed.sh packages/${package_dir} > /dev/null 2>&1) > /dev/null 2>&1
        .scripts/mara-app/ensure-pushed.sh packages/${package_dir}
    done
    "${venv_dir}/bin/python3" -m pip install --ignore-installed --requirement=requirements.txt.freeze --src=./packages --upgrade --exists-action=w
    PYTHONWARNINGS="ignore" "${venv_dir}/bin/python3" -m pip install --requirement=requirements.txt --src=./packages --upgrade --exists-action=w
    # copy newer script versions
    rsync --archive --recursive --itemize-changes --delete packages/mara-app/.scripts/ .scripts/mara-app/
    "${venv_dir}/bin/pipdeptree" --warn=fail
    # write freeze file
    # pkg-resources is automatically added on ubuntu, but breaks the install.
    # https://stackoverflow.com/a/40167445/1380673
    "${venv_dir}/bin/python3" -m pip freeze | grep -v "pkg-resources" > requirements.txt.freeze
    flask app.cli.ensure-etl-db
    flask mara_db.migrate
elif [ "$1" = "install-packages" ] ; then
    for package_dir in $(mkdir -p packages; cd packages; find . -maxdepth 1 -mindepth 1 -type d) ; do
        # I've no clue, but this has to run twice to work: the first time it will fail, but the second time it succeeds
        # (wrapping the first call in `echo $(...)` keeps `set -e` from aborting on that failure)
        echo $(.scripts/mara-app/ensure-pushed.sh packages/${package_dir} > /dev/null 2>&1) > /dev/null 2>&1
        .scripts/mara-app/ensure-pushed.sh packages/${package_dir}
    done
    "${venv_dir}/bin/python3" -m pip install --ignore-installed --requirement=requirements.txt.freeze --src=./packages --upgrade --exists-action=w
    rsync --archive --recursive --itemize-changes --delete packages/mara-app/.scripts/ .scripts/mara-app/
    flask app.cli.ensure-etl-db
    flask mara_db.migrate
else
    exec "$@"
fi
```

I've also built something for our production environment (i.e. the code is copied into the Docker image):

`Dockerfile`:

```dockerfile
# use a slim image to get a smaller size
FROM python:3.7-slim-stretch

RUN ["mkdir", "-p", "/mara"]
WORKDIR /mara

RUN groupadd -r mara && useradd --no-log-init -r -g mara mara

COPY requirements.txt.freeze /mara/

RUN \
    apt-get update && apt-get install -y --no-install-recommends gnupg dirmngr \
    # https://github.com/inversepath/usbarmory-debian-base_image/issues/9#issuecomment-466594168
    && mkdir ~/.gnupg && echo "disable-ipv6" >> ~/.gnupg/dirmngr.conf \
    && apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys B97B0AFCAA1A47F044F244A07FCC7D46ACCC4CF8 \
    && echo "deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main" > /etc/apt/sources.list.d/pgdg.list \
    # ugly fix to install postgresql-client without errors in slim-stretch image
    && mkdir -p /usr/share/man/man1 /usr/share/man/man7 \
    && apt-get update && apt-get install -y --no-install-recommends \
        git \
        curl \
        dialog \
        coreutils \
        graphviz \
        python3-dev \
        python3-venv \
        rsync \
        postgresql-client \
        gcc \
    && pip install --no-cache-dir -r requirements.txt.freeze \
    && apt-get purge -y --auto-remove git gnupg dirmngr gcc \
    && rm -rf /var/lib/apt/lists/*

COPY ./docker/prod/entrypoint.sh /
RUN ["chmod", "+x", "/entrypoint.sh"]

COPY ./app/ /mara/app/
# copy the prod local_setup.py last, so that a local app/local_setup.py in the build context can't overwrite it
COPY ./docker/prod/local_setup.py /mara/app/local_setup.py

EXPOSE 5000

ENV MARA_ENVIRONMENT docker-prod
ENV FLASK_APP="/mara/app/app.py"

USER mara

ENTRYPOINT ["/entrypoint.sh"]
CMD ["flask-no-reload"]
```

`entrypoint.sh`:

```bash
#!/usr/bin/env bash
set -x
set -e

if [ "$1" = "flask-reload" ]; then
    exec flask run --with-threads --host 0.0.0.0 --reload --eager-loading
elif [ "$1" = "flask-no-reload" ]; then
    exec flask run --with-threads --host 0.0.0.0 --no-reload --eager-loading
else
    exec "$@"
fi
```

`local_setup.py`:

```python
# ...
# imports used below
import os

import data_integration.config
import mara_db.config
import mara_db.dbs
from mara_app.monkey_patch import patch

# On local/in-docker, this is filled with some values
defaults = {}

_sentinel = object()


def e(name, default=_sentinel):
    """Look up a setting: upper-cased environment variable first, then `defaults`, then `default`."""
    ret = os.environ.get(name.upper(), defaults.get(name))
    if ret is None:
        if default is not _sentinel:
            return default
        raise KeyError(f"{name} not found in default or environment")
    return ret


# Created once at import time, so the config isn't looked up again every time the DBs are requested
__dwh_db = mara_db.dbs.PostgreSQLDB(user=e("dwh_user"),
                                    host=e("dwh_host"),
                                    port=int(e("dwh_port")),
                                    database=e("dwh_database"),
                                    password=e("dwh_password"))

__mara_db = mara_db.dbs.PostgreSQLDB(user=e("mara_user"),
                                     host=e("mara_host"),
                                     port=int(e("mara_port")),
                                     database=e("mara_database"),
                                     password=e("mara_password"))


@patch(mara_db.config.databases)
def databases():
    return {
        # the project requires two databases: 'mara' for the app itself, and 'dwh' for the etl
        'dwh': __dwh_db,
        'mara': __mara_db,
        # ...
    }


# Again cached...
__max_number_of_parallel_tasks = int(e("max_number_of_parallel_tasks", 11))
patch(data_integration.config.max_number_of_parallel_tasks)(lambda: __max_number_of_parallel_tasks)

# ...
```

I've put the `Dockerfile`/`entrypoint.sh` files into `./docker/{dev|prod}/`, and they're now used like this:

### Docker
For local development, it's assumed that you want to use a locally installed
PostgreSQL instead of one running in a Docker container (disk access on the
real disk is faster than in a virtualized server; this wouldn't matter on Linux).
To use a locally installed PostgreSQL, make PostgreSQL listen on the
network interface that the Docker container has access to:
```bash
λ docker-machine ip
192.168.99.100
λ ifconfig |grep 192.168.99
inet 192.168.99.1 netmask 0xffffff00 broadcast 192.168.99.255
# on windows use ipconfig and search manually
# -> This means postgresql has to listen at the 192.168.99.1 address
```
This needs two changes to the PostgreSQL config files:
* in `postgresql.conf`, add `listen_addresses = 'localhost,192.168.99.1'`, and
* in `pg_hba.conf`, add `host all all 192.168.99.100/32 trust`.
Afterwards, restart PostgreSQL. This depends on the Docker host (in VirtualBox)
already being up and running: on macOS, the PostgreSQL server starts too early for the
VirtualBox VM to be online, so I always have to restart PostgreSQL.
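The two config lines follow mechanically from the addresses reported by `docker-machine ip` and `ifconfig`. A minimal sketch, assuming the VirtualBox host-only network is a /24 (netmask `0xffffff00`) as in the example above; the helper name `postgres_docker_settings` is hypothetical and only produces the lines you would add by hand:

```python
import ipaddress


def postgres_docker_settings(host_ip: str, docker_machine_ip: str, prefix: int = 24):
    """Return the postgresql.conf and pg_hba.conf lines for a docker-machine setup.

    host_ip:           address of the host-only interface on your machine (e.g. 192.168.99.1)
    docker_machine_ip: output of `docker-machine ip` (e.g. 192.168.99.100)
    prefix:            assumed netmask length of the host-only network (0xffffff00 -> 24)
    """
    host = ipaddress.ip_address(host_ip)
    vm = ipaddress.ip_address(docker_machine_ip)
    network = ipaddress.ip_network(f"{host_ip}/{prefix}", strict=False)
    if vm not in network:
        raise ValueError(f"{vm} is not on the same host-only network as {host} ({network})")
    # PostgreSQL has to listen on the host-only interface in addition to localhost
    listen = f"listen_addresses = 'localhost,{host}'"
    # Like the setup above, trust only the docker-machine VM's address
    hba = f"host all all {vm}/32 trust"
    return listen, hba


if __name__ == "__main__":
    for line in postgres_docker_settings("192.168.99.1", "192.168.99.100"):
        print(line)
```

With the example addresses this yields the two lines from the list above; a mismatched pair of IPs fails early instead of producing a `pg_hba.conf` entry that never matches.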
To build the images:
```bash
# development version, where the source code is on a VOLUME and you run pipelines in the browser
docker build -t mara-app:dev -f ./docker/dev/Dockerfile .
# production version, where the source code is copied into the container
docker build -t mara-app:prod -f ./docker/prod/Dockerfile .
```
Running a development container:
```bash
cp docker/.env.example development.env
# edit development.env -> see comment in the file
# If you use a local postgresql, you need to set DWH_HOST and MARA_HOST to the IP address
# from above (e.g. 192.168.99.1)
docker run -i -t --rm --name mara-app --mount type=bind,source="$(pwd)",target=/mara -p 5000:5000 --net=bridge --env-file=development.env mara-app:dev
# docker exec also has the python venv in its path:
docker exec -ti mara-app flask data_integration.ui.run --path utils
```
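The values from `development.env` (such as `DWH_HOST`) reach the app through the `e()` helper in `local_setup.py`: an upper-cased environment variable wins, then the `defaults` dict, then the per-call default, and otherwise a `KeyError` is raised. A self-contained sketch of that lookup order, with the environment passed in explicitly so it can be tried without Docker; `make_lookup` is a hypothetical name, not part of mara:

```python
import os
from typing import Any, Callable, Mapping, Optional

_sentinel = object()


def make_lookup(environ: Optional[Mapping[str, str]] = None,
                defaults: Optional[Mapping[str, Any]] = None) -> Callable[..., Any]:
    """Build an e()-style lookup bound to a specific environment and defaults dict."""
    env = os.environ if environ is None else environ
    fallback = {} if defaults is None else defaults

    def e(name: str, default: Any = _sentinel) -> Any:
        # 1. environment variable (upper-cased name), 2. defaults dict (original name)
        ret = env.get(name.upper(), fallback.get(name))
        if ret is None:
            # 3. explicit default for this call, 4. otherwise fail loudly
            if default is not _sentinel:
                return default
            raise KeyError(f"{name} not found in default or environment")
        return ret

    return e


if __name__ == "__main__":
    e = make_lookup(environ={"DWH_HOST": "192.168.99.1"}, defaults={"dwh_port": "5432"})
    print(e("dwh_host"))                                # from the environment
    print(int(e("dwh_port")))                           # from defaults
    print(int(e("max_number_of_parallel_tasks", 11)))   # per-call default
```

Environment values are always strings, which is why `local_setup.py` wraps ports and task counts in `int(...)`.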
Running the production container:
```bash
cp docker/.env.example production.env
# edit production.env -> see comment in the file
docker run -i -t --rm --name mara-app -p 5000:5000 --net=bridge --env-file=production.env mara-app:prod
# run the utils pipeline in the same container
docker exec -ti mara-app flask data_integration.ui.run --path utils
```
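Because `local_setup.py` raises a `KeyError` at import time for any missing database setting, it can help to check an env file before starting the container. A rough sketch, assuming the file uses plain `KEY=VALUE` lines, bare `KEY` pass-through lines and `#` comments as accepted by `docker run --env-file` (this only approximates Docker's parser); the required names are the upper-cased keys that `local_setup.py` reads via `e()` without a default, and both helper functions are hypothetical:

```python
import os
from typing import Dict, Iterable, List, Mapping, Optional

# Upper-cased names of the settings local_setup.py reads without a fallback default
REQUIRED = [
    f"{db}_{field}".upper()
    for db in ("dwh", "mara")
    for field in ("user", "host", "port", "database", "password")
]


def parse_env_lines(lines: Iterable[str], environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Approximate `docker run --env-file` parsing: KEY=VALUE, bare KEY, '#' comments, blank lines."""
    env = os.environ if environ is None else environ
    result: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            result[key] = value  # quotes are kept as part of the value
        elif line in env:
            # a bare NAME passes the value through from the calling environment
            result[line] = env[line]
    return result


def missing_settings(settings: Mapping[str, str], required: Iterable[str] = REQUIRED) -> List[str]:
    """Return the required names that are absent or empty (an empty port would break int())."""
    return [name for name in required if not settings.get(name)]


if __name__ == "__main__":
    example = [
        "# production.env (example content)",
        "DWH_HOST=192.168.99.1",
        "DWH_PORT=5432",
    ]
    print("missing:", ", ".join(missing_settings(parse_env_lines(example))))
```

To check a real file, pass the open file object (e.g. `open("production.env")`) to `parse_env_lines`, since iterating over it yields one line at a time.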
Sizes:
Still meh :-(
… into dockerize-mara
@martin-loetzsch @jankatins who can resolve the conflict?
@gathineou is anyone still working on this?
I copied the Dockerfiles from this branch and built a Docker environment myself, which is quite similar to this one. In case there is interest, I could share it in a separate PoC repository.
@leo-schick would be more than happy to try!
@soobrosa I have now published my simplified current version at https://github.com/mara/docker. The repository is currently private, but you should be able to see it as a member of the mara organisation. My Docker images are based on this repo with some additional changes on my side. The suggestions from @jankatins are not considered (yet). PRs are welcome 🤟 P.S. I did not make any changes to the
Hi @jankatins 👋🏽 excuse my late reply, I was off for a while and overlooked this.
PR for discussing and brainstorming on the docker-related mara implementations