
Attempt to catch cases of incorrect datadir ownership #74

Open: wants to merge 2 commits into master

orangejulius
Member

If the Elasticsearch datadir is not owned by the user specified in $DOCKER_USER, Elasticsearch will fail to start.

Since #55 the helper scripts attempt to remedy this situation if the pelias script is run as root (which is not recommended but is often done).

However, if the pelias script is not run as root and the permissions are incorrect, this situation cannot be automatically fixed.

This code attempts to detect that case and recommend the proper command to run (with sudo) to set the correct directory permissions.

Connects #31
Connects #73

cmd/elastic.sh (outdated)
function elastic_start(){
  mkdir -p "$DATA_DIR/elasticsearch"
  # attempt to set proper permissions if running as root
  chown "$DOCKER_USER" "$DATA_DIR/elasticsearch" 2>/dev/null || true

  # record the owner of the Elasticsearch directory
  elasticsearch_owner_uid=$(stat --format '%u' "$DATA_DIR/elasticsearch")
Member

Still not working for me...

stat --format '%u' /data/pelias/docker/projects/portland-metro/elasticsearch
stat: illegal option -- -
usage: stat [-FlLnqrsx] [-f format] [-t timefmt] [file ...]

I think you might need to use the g* version:

gstat --format '%u' /data/pelias/docker/projects/portland-metro/elasticsearch
501

There is already a process for handling GNU versions of utils under OSX in https://github.com/pelias/docker/blob/master/pelias#L10

Member Author

Good catch yet again. It turns out GNU coreutils stat and the BSD stat that ships with macOS do not have compatible flags here.

My .bashrc adds the homebrew coreutils directory to my $PATH. Definitely convenient, but we can't expect everyone to do that.

I've added an environment variable for the stat command with a similar pattern. It should really work this time.
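The pattern described here (choose the stat binary once, store it in a variable, use it everywhere) might look roughly like the sketch below. The variable name CMD_STAT and the helper owner_uid are hypothetical and not necessarily what the PR uses; it assumes Homebrew coreutils installs GNU stat as gstat on macOS, as shown earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pick a GNU-compatible `stat` once and reuse it.
# Assumption: on macOS, Homebrew coreutils provides GNU stat as `gstat`.
if [ "$(uname -s)" = "Darwin" ] && command -v gstat >/dev/null 2>&1; then
  CMD_STAT=gstat
else
  CMD_STAT=stat
fi
export CMD_STAT

# Print the numeric UID that owns the given path.
# Uses the GNU `--format` flag, which the stock BSD/macOS stat rejects.
owner_uid() {
  "$CMD_STAT" --format '%u' "$1"
}
```

On a Mac without coreutils installed this still falls back to BSD stat and fails, so a real script would need to detect that case as well.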

Member

Hmm... I still don't think this is working as expected.

I've never had a problem with permissions on my laptop, but now I get a failure when trying to start Elasticsearch. Wouldn't that also be the case for all other Mac users?

I don't have anything special set up in my environment. I decided to avoid aliases etc. in order to have a 'stock standard' Mac dev environment that is as close as possible to what our users would be running.

➜  portland-metro git:(check-data-dir-permissions) ✗ pelias elastic start
user 1000 cannot access elasticsearch directory at /data/pelias/docker/projects/portland-metro
please run 'sudo chown 1000 /data/pelias/docker/projects/portland-metro/elasticsearch'
➜  portland-metro git:(check-data-dir-permissions) ✗ ls -lah /data/pelias/docker/projects/portland-metro
total 0
drwxr-xr-x  14 peter  staff   448B Mar  4 16:33 .
drwxr-xr-x   3 peter  staff    96B Feb  8 08:14 ..
drwxr-xr-x   2 peter  staff    64B Feb  8 08:18 blacklist
drwxr-xr-x   3 peter  staff    96B Feb  8 08:18 csv
drwxr-xr-x   3 peter  staff    96B Mar  4 16:33 elasticsearch
drwxr-xr-x   3 peter  staff    96B Mar  4 16:32 es-bak
drwxr-xr-x  20 peter  staff   640B Feb  8 08:41 interpolation
drwxr-xr-x   4 peter  staff   128B Feb  8 08:18 openaddresses
drwxr-xr-x   3 peter  staff    96B Feb  8 08:18 openstreetmap
drwxr-xr-x   4 peter  staff   128B Feb  8 08:23 placeholder
drwxr-xr-x   3 peter  staff    96B Feb  8 08:22 polylines
drwxr-xr-x   4 peter  staff   128B Feb  8 08:18 tiger
drwxr-xr-x  16 peter  staff   512B Feb  8 08:20 transit
drwxr-xr-x   5 peter  staff   160B Feb  8 08:19 whosonfirst
➜  portland-metro git:(check-data-dir-permissions) ✗ ls -lah /data/pelias/docker/projects/portland-metro/elasticsearch
total 0
drwxr-xr-x   3 peter  staff    96B Mar  4 16:33 .
drwxr-xr-x  14 peter  staff   448B Mar  4 16:33 ..
drwxr-xr-x   3 peter  staff    96B Mar  4 16:33 nodes
➜  portland-metro git:(check-data-dir-permissions) ✗ whoami
peter
➜  portland-metro git:(check-data-dir-permissions) ✗ groups
staff operator everyone localaccounts _appserverusr admin _appserveradm _lpadmin com.apple.sharepoint.group.1 _appstore _lpoperator _developer _analyticsusers com.apple.access_ftp com.apple.access_screensharing com.apple.access_ssh
➜  portland-metro git:(check-data-dir-permissions) ✗ gstat --format '%u' /data/pelias/docker/projects/portland-metro/elasticsearch
501

Member

One solution is to change the ID in https://github.com/pelias/docker/blob/master/projects/portland-metro/.env#L3 as per the readme.
However, this is an additional step for Mac users that may deter them from adopting Pelias; I'd really love it to 'just work' the first time.

Member

It seems the chown command is failing with a permissions error, but the error messages are being silenced:

➜  portland-metro git:(check-data-dir-permissions) ✗ chown 1000 /data/pelias/docker/projects/portland-metro/elasticsearch
chown: /data/pelias/docker/projects/portland-metro/elasticsearch: Operation not permitted
➜  portland-metro git:(check-data-dir-permissions) ✗ echo $?
1
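Because 2>/dev/null || true discards both the error message and the exit status, a failure like the one above goes unnoticed. One way to surface the non-root case, sketched here with a hypothetical helper try_chown, is to check chown's exit status and print the sudo command the user would need to run, which is what this PR sets out to do:

```shell
#!/usr/bin/env bash
# Hypothetical helper: try to give owner $1 ownership of directory $2.
# On failure (e.g. "Operation not permitted" when not running as root),
# print the command the user should run instead of silently continuing.
try_chown() {
  local owner="$1" dir="$2"
  if chown "$owner" "$dir" 2>/dev/null; then
    return 0
  fi
  echo "could not change ownership of $dir to $owner" >&2
  echo "please run 'sudo chown $owner $dir'" >&2
  return 1
}
```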

@missinglink
Member

missinglink commented Mar 7, 2019

Heya,

So, backing up a bit: I'm trying to remember why we are doing it like this in the first place.

It seems like an elegant solution is to put this at the top of the pelias executable?

# Detect user and group ids for current user
export DOCKER_USER="$(id -u):$(id -g)"

So the $DOCKER_USER would simply default to the IDs of the currently executing user.

In order to support sudo, it could be improved as such:

export DOCKER_USER="$(id -u "${SUDO_USER:-$USER}"):$(id -g "${SUDO_USER:-$USER}")"

This would mean that users couldn't easily override the value with a custom UID/GID. There are two options for that:

  • Detect if DOCKER_USER is already set, if so, use that - we would have to delete existing references to it in .env files
  • Add a new variable called DOCKER_USER_OVERRIDE.

The nice thing about the script above is, it's super simple, portable and leaves local file permissions set to the executing user by default.

The thing I don't like about the existing solution is that it assumes the UID is 1000 and then tries to force the filesystem to 1000, which makes it hard to delete and edit files on a local filesystem when running as a non-1000 user.

One final consideration is what will happen when someone uses docker-compose directly, bypassing the pelias executable. In this case it will 'just work' with overrides and I believe will pass an empty string otherwise.

edit: yep, it also emits a warning, which is nice: WARNING: The DOCKER_USER variable is not set. Defaulting to a blank string.

@orangejulius
Member Author

I think you're on the right track with reworking the permissions. The important criteria we want to meet, roughly in priority order, are:

  • Docker setup works out of the box across platforms
  • Containers do not run as root
  • Docker setup supports running pelias script as root, even if it is not advised
  • Users of the docker setup can override the UID/GID the containers run as if they desire

So I think we can achieve this by setting DOCKER_USER with the sudo-compatible script you described if the DOCKER_USER variable is unset. Then we can remove the definition of DOCKER_USER from all the project .env files, and they will all default to running as the user who ran the pelias script.
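A minimal sketch of "set DOCKER_USER only if it is unset" combined with the sudo-aware lookup. The helper name default_docker_user is hypothetical; unlike the one-liner earlier in the thread, it calls id with no argument when not running under sudo, so it does not depend on $USER being set.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: default DOCKER_USER to the invoking user's UID:GID.
default_docker_user() {
  if [ -n "${SUDO_USER:-}" ]; then
    # invoked via sudo: use the original user's IDs, not root's
    echo "$(id -u "$SUDO_USER"):$(id -g "$SUDO_USER")"
  else
    echo "$(id -u):$(id -g)"
  fi
}

# Keep any value the user already set (shell export or .env); otherwise
# fall back to the default. `:=` assigns only when unset or empty.
: "${DOCKER_USER:=$(default_docker_user)}"
export DOCKER_USER
```

Because an explicit DOCKER_USER always wins, users who need a custom UID/GID can still override it, which covers the last criterion in the list above.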

@missinglink
Member

missinglink commented Mar 7, 2019

👍 sounds perfect. The only thing I worry about is existing projects with an .env file containing a DOCKER_USER line.

This would particularly affect OSX users who copied an existing project into a new directory: they would probably start seeing an error. Still, this might be worth doing anyway, since it would be correct for future users.

@jeremy-rutman
Contributor

jeremy-rutman commented Mar 17, 2019

If data_dir is owned by the current user with group users (and DOCKER_USER is not set), then pelias elastic start completes and docker-compose ps shows the container as 'up' and not constantly restarting. However, pelias elastic wait returns

elasticsearch did not come up, check config

This is with Docker 17.05.0-ce and docker-compose 1.23.2 on Debian Jessie 8.8.

pelias compose logs elasticsearch shows:

elasticsearch_1 | [2019-03-17T10:09:36,131][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-state index template
elasticsearch_1 | [2019-03-17T10:09:36,179][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-meta index template
elasticsearch_1 | [2019-03-17T10:09:36,232][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-notifications index template
elasticsearch_1 | [2019-03-17T10:09:36,297][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-anomalies- index template
elasticsearch_1 | [2019-03-17T10:09:36,512][INFO ][o.e.l.LicenseService ] [zGq41Uw] license [ee8c0dc9-35e1-4f42-89a6-83022c114cd0] mode [trial] - valid
elasticsearch_1 | [2019-03-17T10:10:51,362][INFO ][o.e.n.Node ] [] initializing ...
elasticsearch_1 | [2019-03-17T10:10:51,506][INFO ][o.e.e.NodeEnvironment ] [zGq41Uw] using [1] data paths, mounts [[/usr/share/elasticsearch/data (storage-box5:/data/pelias/pelias_data/elasticsearch)]], net usable_space [156.7tb], net total_space [199.9tb], spins? [possibly], types [nfs4]
elasticsearch_1 | [2019-03-17T10:10:51,506][INFO ][o.e.e.NodeEnvironment ] [zGq41Uw] heap size [494.9mb], compressed ordinary object pointers [true]
elasticsearch_1 | [2019-03-17T10:10:51,522][INFO ][o.e.n.Node ] node name [zGq41Uw] derived from node ID [zGq41UwDQCG7SvBDmnRAwA]; set [node.name] to override
elasticsearch_1 | [2019-03-17T10:10:51,522][INFO ][o.e.n.Node ] version[5.6.12], pid[1], build[cfe3d9f/2018-09-10T20:12:43.732Z], OS[Linux/3.16.0-4-amd64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_181/25.181-b13]
elasticsearch_1 | [2019-03-17T10:10:51,522][INFO ][o.e.n.Node ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.cgroups.hierarchy.override=/, -Xms512m, -Xmx512m, -Des.path.home=/usr/share/elasticsearch]
elasticsearch_1 | [2019-03-17T10:10:53,777][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [aggs-matrix-stats]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [ingest-common]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-expression]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-groovy]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-mustache]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-painless]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [parent-join]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [percolator]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [reindex]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [transport-netty3]
elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [transport-netty4]
elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [analysis-icu]
elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [ingest-geoip]
elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [ingest-user-agent]
elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [repository-s3]
elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [x-pack]
elasticsearch_1 | [2019-03-17T10:10:55,702][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/123] [Main.cc@128] controller (64 bit): Version 5.6.12 (Build 95c8fcf06f45dc) Copyright (c) 2018 Elasticsearch BV
elasticsearch_1 | [2019-03-17T10:10:55,746][INFO ][o.e.d.DiscoveryModule ] [zGq41Uw] using discovery type [single-node]
elasticsearch_1 | [2019-03-17T10:10:56,426][INFO ][o.e.n.Node ] initialized
elasticsearch_1 | [2019-03-17T10:10:56,427][INFO ][o.e.n.Node ] [zGq41Uw] starting ...
elasticsearch_1 | [2019-03-17T10:10:56,569][INFO ][o.e.t.TransportService ] [zGq41Uw] publish_address {x.y.z.a:9300}, bound_addresses {0.0.0.0:9300}
elasticsearch_1 | [2019-03-17T10:10:56,596][INFO ][o.e.c.s.ClusterService ] [zGq41Uw] new_master {zGq41Uw}{zGq41UwDQCG7SvBDmnRAwA}{CRHPLW5-R9Wmi04uFc8TGQ}{172.18.0.2}{172.18.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}, reason: single-node-start-initial-join
elasticsearch_1 | [2019-03-17T10:10:56,663][INFO ][o.e.h.n.Netty4HttpServerTransport] [zGq41Uw] publish_address {x.y.z.a:9200}, bound_addresses {0.0.0.0:9200}
elasticsearch_1 | [2019-03-17T10:10:56,663][INFO ][o.e.n.Node ] [zGq41Uw] started
elasticsearch_1 | [2019-03-17T10:10:56,771][INFO ][o.e.l.LicenseService ] [zGq41Uw] license [ee8c0dc9-35e1-4f42-89a6-83022c114cd0] mode [trial] - valid
elasticsearch_1 | [2019-03-17T10:10:56,772][INFO ][o.e.g.GatewayService ] [zGq41Uw] recovered [0] indices into cluster_state

@jeremy-rutman
Contributor

Changing to user 'deploy' (UID 1000), running chown on data_dir to deploy (group users), and adding 'deploy' to the docker group (sudo gpasswd -a deploy docker) seems to have taken care of everything. pelias elastic wait now gives the somewhat ambiguous 'waiting for es service to come up' and then exits. pelias elastic status returns HTTP status 200, which I imagine is good.

@orangejulius
Member Author

orangejulius commented Sep 15, 2020

Okay, I've revived this PR with some minor changes.

While the earlier comment about autodetecting $DOCKER_USER is a great idea, I believe we should solve that separately.

With more time and clarity, here is the purpose of this PR: to detect (and fix) the case where a user has created the $DATA_DIR with root ownership (or really any ownership that doesn't line up with $DOCKER_USER), even though we try to advise against this in our documentation. As seen in #214 and others, this will still occasionally happen :)

I've changed the logic so that the script will now try to detect and fix permission issues, running a chown with sudo if necessary.
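The detect-then-fix flow described here can be sketched as follows. This is a hypothetical illustration (ensure_owner is not a name from the PR, and the PR's actual code may differ); it compares only the UID part of DOCKER_USER and uses GNU stat syntax.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "detect and fix" ownership for one directory.
# Assumes GNU stat (--format); stock macOS stat needs different flags.
ensure_owner() {
  local dir="$1" want_uid owner_uid
  want_uid="${DOCKER_USER%%:*}"              # UID part of "UID" or "UID:GID"
  owner_uid=$(stat --format '%u' "$dir") || return 1

  # Nothing to do if the directory is already owned by DOCKER_USER's UID.
  [ "$owner_uid" = "$want_uid" ] && return 0

  echo "$dir is owned by UID $owner_uid, expected $want_uid; fixing" >&2
  # Try without privileges first (succeeds when already root), then via sudo.
  chown "$DOCKER_USER" "$dir" 2>/dev/null || sudo chown "$DOCKER_USER" "$dir"
}
```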

@missinglink can you review this and let me know what you think?

Member

@missinglink missinglink left a comment

My preference would be to simplify the permissions as per my old comment, and I see you agree, so ignoring that for a second...

I'm happy to merge this, with two considerations:

  • we need to check that it works on Mac too (i.e. that all the commands are equivalent)
  • this checks the user ownership of the directory but doesn't consider group ownership. I understand that's more complicated to check, but might some users expect it?

@missinglink
Member

I just tested this on my Mac where I had an existing project with DOCKER_USER=501 in my .env file and it worked fine.
When I changed this to DOCKER_USER=1000 (the default) I got the following error:

pelias elastic start
user 1000 cannot access elasticsearch directory at /data/pelias/docker/projects/portland-metro
please run 'sudo chown 1000 /data/pelias/docker/projects/portland-metro/elasticsearch'

@orangejulius
Member Author

orangejulius commented Sep 15, 2020

Right, and just to clarify no matter how much we simplify the process of configuring/determining $DOCKER_USER, we will still need a PR like this that checks that all the permissions line up because:

  • users may create directories on their own with incorrect permissions
  • Docker itself will create directories owned by root when a bind mount is configured but the directory doesn't exist

I should be able to test it on my Mac but would love a double check :)

As for group ownership, I don't believe that matters for Pelias and our Docker setup, but some people may care about group ownership. As it stands this PR will run chown $DOCKER_USER $DATA_DIR which would automatically set the group owner if $DOCKER_USER contains one (the format of $DOCKER_USER can be just $UID or $UID:$GID and Docker will happily accept it).

We could either leave things like this, or filter out any group in $DOCKER_USER and only set the user permissions. I think this would potentially be less intrusive, but again it's hard to judge what the most commonly expected behavior would be.
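The two options differ only in how DOCKER_USER is passed to chown. A minimal sketch, assuming DOCKER_USER has the form "UID" or "UID:GID"; both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Option 1: pass DOCKER_USER through unchanged; when it contains ":GID",
# chown also changes the group owner.
chown_user_and_group() {
  chown "$DOCKER_USER" "$1"
}

# Option 2: strip any ":GID" suffix with ${VAR%%:*} so only the owning
# user changes and the existing group is left alone.
chown_user_only() {
  chown "${DOCKER_USER%%:*}" "$1"
}
```

Option 2 is the less intrusive one: a value like 1000:1000 becomes plain 1000, and a value without a colon is left unchanged.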

@missinglink
Member

missinglink commented Sep 15, 2020

If I switch back to master and use DOCKER_USER=1000 then elasticsearch seems to start up fine 🤷
So it seems weird that it would error in this case?

gid
uid=501(peter) gid=20(staff) groups=20(staff),5(operator),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh),400(com.apple.access_remote_ae)
cat .env

COMPOSE_PROJECT_NAME=pelias
DATA_DIR=/data/pelias/docker/projects/portland-metro
DOCKER_USER=1000
ls -lah /data/pelias/docker/projects/portland-metro/elasticsearch
total 0
drwxr-xr-x   3 peter  staff    96B Jul 30 15:54 .
drwxr-xr-x  14 peter  staff   448B Jul 29 16:53 ..
drwxrwxr-x   3 peter  staff    96B Jul 30 15:54 nodes

@orangejulius
Member Author

Try deleting the elasticsearch directory completely and then re-creating it with root ownership. You'll find that pelias elastic start runs but Elasticsearch never comes up, with the same errors as shown in #214 (comment).

Alternatively, try running one of the importers, and I bet they will fail when they try to actually write to Elasticsearch.

@missinglink
Member

It's late for me here, so I'm maybe not explaining well.

I'm 👍 for adding a check for the case where the local directory is not readable by the DOCKER_USER but I don't think this is a reliable method of determining that.

For instance, I can create the DATA_DIR as root and chmod it 777, and this code would still error and tell me to run sudo chown 1000 /data/pelias/docker/projects/portland-metro/elasticsearch.

And my local user is 501 🤷 There are still a lot of edge cases that can go wrong.
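The underlying point is that ownership is only a proxy: what Elasticsearch needs is write and search permission on the directory. A rough host-side approximation, sketched with a hypothetical uid_can_write helper, applies the owner bits when the UID owns the directory and the "other" bits otherwise, so a root-owned 777 directory passes. It ignores group bits, ACLs and mount options, and uses GNU stat syntax, so it is a heuristic rather than a guarantee.

```shell
#!/usr/bin/env bash
# Heuristic: can numeric UID $2 create files in directory $1?
# Group bits, ACLs and mount options are ignored (GNU stat syntax).
uid_can_write() {
  local dir="$1" uid="$2" owner mode perms
  owner=$(stat --format '%u' "$dir") || return 2
  mode=$(stat --format '%a' "$dir") || return 2
  perms=$((8#$mode))                 # octal mode string -> number

  # root bypasses permission bits on a normal local filesystem
  [ "$uid" -eq 0 ] && return 0

  # the owner is judged only by the owner bits: write (2) + search (1)
  if [ "$owner" -eq "$uid" ]; then
    (( (perms & 8#300) == 8#300 ))
    return
  fi
  # everyone else (ignoring group membership) uses the "other" bits
  (( (perms & 8#003) == 8#003 )) && return 0
  return 1
}
```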

@missinglink
Member

Maybe it's time to revisit this. I was thinking we could run test -w on the directories inside the container, to check that the DOCKER_USER has write permissions, with something like this:

pelias compose run elasticsearch test -w /usr/share/elasticsearch/data

echo $?
0
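That in-container check could be wrapped so startup fails early with advice. This is a hypothetical sketch: check_es_data_writable is not an existing pelias command, and it assumes pelias compose forwards its arguments to docker-compose (so the standard run flags --rm and --no-deps apply).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: ask the elasticsearch container itself
# (running as DOCKER_USER) whether its data directory is writable.
check_es_data_writable() {
  local path=/usr/share/elasticsearch/data
  # --rm removes the one-off container; --no-deps skips other services
  if pelias compose run --rm --no-deps elasticsearch test -w "$path"; then
    return 0
  fi
  echo "elasticsearch cannot write to $path inside the container" >&2
  echo "check that \$DATA_DIR/elasticsearch is writable by DOCKER_USER=${DOCKER_USER:-<unset>}" >&2
  return 1
}
```

Because test -w runs as the container's real user, it accounts for group bits and world-writable directories, which comparing owners on the host cannot.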

@orangejulius
Member Author

I'm actually hoping this won't be needed in practice any more. I suppose if people ran mkdir ./data (or however they make the data directory) with sudo or as root, they could get into a situation where the processes in the Docker containers don't have permission to write, but I hope that's rare.

Maybe it's wishful thinking, and in that case we can do something like you suggest.
