since approx 2018-06-05, in-docker-container ansible-container build fails with "ansible.errors.AnsibleError: the role '<rolename>' was not found in <rolespath>" on different roles depending on environment #942
Description
ISSUE TYPE
- Bug Report
container.yml
This is a reasonably small example I created to demonstrate the problem. (Yes it fails.)
version: '2'
settings:
  project_name: buildbox
  conductor:
    base: 'centos:7'
services:
  base:
    from: centos:7
    roles:
      - BuildBox/Base
      - BuildBox/Configuration1
      - BuildBox/Configuration2
      - BuildBox/Configuration3
      - BuildBox/Configuration4
    working_dir: /tmp
    ports:
      - '22'
    command:
      - /usr/sbin/sshd
      - -D
Individual roles have a tasks/main.yml of the form
---
- command: echo BASE
Replace BASE with ONE, TWO, THREE, or FOUR to match each Configuration role.
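For anyone trying to reproduce this, the role layout described above can be generated with a small script. This is only a sketch: the `ROLE_WORDS` mapping and the `create_roles` helper are names made up for illustration, and the target directory is whatever you later pass to --roles-path.

```python
import os

# Hypothetical mapping of role name -> word echoed by its single task,
# following the BASE/ONE/TWO/THREE/FOUR scheme described in the issue.
ROLE_WORDS = {
    "BuildBox/Base": "BASE",
    "BuildBox/Configuration1": "ONE",
    "BuildBox/Configuration2": "TWO",
    "BuildBox/Configuration3": "THREE",
    "BuildBox/Configuration4": "FOUR",
}


def create_roles(roles_path):
    """Write <roles_path>/<role>/tasks/main.yml for each demo role.

    Returns the list of main.yml paths that were written.
    """
    written = []
    for role, word in sorted(ROLE_WORDS.items()):
        tasks_dir = os.path.join(roles_path, role.replace("/", os.sep), "tasks")
        if not os.path.isdir(tasks_dir):
            os.makedirs(tasks_dir)
        main_yml = os.path.join(tasks_dir, "main.yml")
        with open(main_yml, "w") as handle:
            handle.write("---\n- command: echo %s\n" % word)
        written.append(main_yml)
    return written
```

Calling `create_roles("/some/roles/dir")` should leave five role directories under `BuildBox/`, each with a one-task `tasks/main.yml`.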
OS / ENVIRONMENT
The environment for a virtualenv ansible-container install direct on ubuntu xenial:
Ansible Container, version 0.9.2
Linux, dhsueh-ubuntu, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] <virtualenv directory path>/bin/python2
Believed-identical environment configured as a Dockerfile-built docker container "FROM ubuntu:xenial":
Ansible Container, version 0.9.2
Linux, b92df59f4255, 4.13.0-43-generic, #48~16.04.1-Ubuntu SMP Thu May 17 12:56:46 UTC 2018, x86_64
2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] /usr/bin/python
(I have tried a "FROM centos:7" version as well - no difference.)
My environments are set up pinned to 0.9.2 with various workarounds applied as I encountered the need for them (ubuntu paths below):
pip --disable-pip-version-check install pip==9.0.3
pip --disable-pip-version-check install setuptools==39.2.0
pip --disable-pip-version-check install docker==2.7.0
pip --disable-pip-version-check install ansible-container[docker]==0.9.2
sed -i "s/filters={'name': self.secrets_volume_name}//g" /usr/local/lib/python2.7/dist-packages/container/docker/secrets.py
sed -i "s/return os.path.join(os.sep, 'run', 'secrets')/return os.path.join(os.sep, 'docker', 'secrets')/g" /usr/local/lib/python2.7/dist-packages/container/docker/engine.py
- The pip docker==2.7.0 pin is a workaround that I can no longer find a reference for (?!?!)
- The sed "filters" edit works around the ansible-container bug described in moby/moby#34121
- The sed "return" edit is a workaround for #762
SUMMARY
Heads up: The observed behavior is strikingly similar to #673 but does not involve any cloud-enabled roles; all requested roles are confirmed to exist on the filesystem in the single path specified in the --roles-path option.
I have many services, each with many different roles listed. Before 2018-06-05 everything was working fine on a particular docker host. On 2018-06-05 I added an extra role at the end of the list for my services (e.g. "BuildBox/Configuration4"), which resulted in different failures depending on the environment.
In a direct-on-iron ansible-container virtualenv environment created after the problem date, an "ansible-container build" call completes fine.
Depending on the docker host I run an ansible-container docker image on, I get an error like:
2018-06-07T18:00:35.723801 Processing defaults section... [container.config] caller_file=/_ansible/container/config.py caller_func=_process_defaults caller_line=325
2018-06-07T18:00:35.726157 Processing section... [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=volumes
2018-06-07T18:00:35.728781 Processing section... [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=registries
2018-06-07T18:00:35.731282 Processing section... [container.config] caller_file=/_ansible/container/config.py caller_func=_process_top_level_sections caller_line=334 section=secrets
2018-06-07T18:00:35.733772 Processing service... [container.config] caller_file=/_ansible/container/config.py caller_func=_process_services caller_line=340 service=u'base' service_data={u'command': [u'/usr/sbin/sshd', u'-D'], u'working_dir': u'/tmp', u'from': u'centos:7', u'ports': [u'22'], u'roles': [u'BuildBox/Base', u'BuildBox/Configuration1', u'BuildBox/Configuration2', u'BuildBox/Configuration3', u'BuildBox/Configuration4']}
Traceback (most recent call last):
File "/usr/bin/conductor", line 11, in <module>
load_entry_point('ansible-container', 'console_scripts', 'conductor')()
File "/_ansible/container/__init__.py", line 19, in __wrapped__
return fn(*args, **kwargs)
File "/_ansible/container/cli.py", line 389, in conductor_commandline
conductor_config = AnsibleContainerConductorConfig(list_to_ordereddict(containers_config))
File "/_ansible/container/__init__.py", line 19, in __wrapped__
return fn(*args, **kwargs)
File "/_ansible/container/config.py", line 297, in __init__
self._process_services()
File "/_ansible/container/config.py", line 357, in _process_services
role_metadata = get_metadata_from_role(role_name)
File "/_ansible/container/__init__.py", line 19, in __wrapped__
return fn(*args, **kwargs)
File "/_ansible/container/utils/__init__.py", line 275, in get_metadata_from_role
return get_content_from_role(role_name, os.path.join('meta', 'container.yml'))
File "/_ansible/container/__init__.py", line 19, in __wrapped__
return fn(*args, **kwargs)
File "/_ansible/container/utils/__init__.py", line 264, in get_content_from_role
role_path = resolve_role_to_path(role_name)
File "/_ansible/container/__init__.py", line 19, in __wrapped__
return fn(*args, **kwargs)
File "/_ansible/container/utils/__init__.py", line 210, in resolve_role_to_path
loader=loader)
File "/usr/lib/python2.7/site-packages/ansible/playbook/role/include.py", line 59, in load
return ri.load_data(data, variable_manager=variable_manager, loader=loader)
File "/usr/lib/python2.7/site-packages/ansible/playbook/base.py", line 244, in load_data
ds = self.preprocess_data(ds)
File "/usr/lib/python2.7/site-packages/ansible/playbook/role/definition.py", line 94, in preprocess_data
(role_name, role_path) = self._load_role_path(role_name)
File "/usr/lib/python2.7/site-packages/ansible/playbook/role/definition.py", line 187, in _load_role_path
raise AnsibleError("the role '%s' was not found in %s" % (role_name, ":".join(role_search_paths)), obj=self._ds)
ansible.errors.AnsibleError: the role '<NOTFOUNDROLE>' was not found in ./roles:<AC_ROLES_PATH>:/src/roles:/etc/ansible/roles:.
The <AC_ROLES_PATH> is the path provided in the ansible-container --roles-path option.
The missing <NOTFOUNDROLE> role is, at times:
- when using docker container running on host for the first time post 2018-06-05:
  - the first role in the container.yml listing ("BuildBox/Base")
  - removing that role simply results in failing to find the new first role
- when using docker container running on host working successfully previous to 2018-06-05:
  - the last role in the container.yml listing ("BuildBox/Configuration4")
  - if I remove the last role, making the list match what was working previous to 2018-06-05, the build completes fine
In all cases I can confirm all roles are present on the local / in-container filesystem before the ansible-container call.
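The presence check can be scripted along these lines. This is a rough sketch that assumes a role counts as "present" when a directory of that name exists under one of the search paths from the error message. It is a simplified approximation, not Ansible's actual `_load_role_path` logic, and `find_role` / `missing_roles` are hypothetical helper names.

```python
import os


def find_role(role_name, search_paths):
    """Return the first existing directory <path>/<role_name>, else None."""
    for base in search_paths:
        candidate = os.path.join(base, role_name.replace("/", os.sep))
        if os.path.isdir(candidate):
            return candidate
    return None


def missing_roles(role_names, search_paths):
    """Return the roles that cannot be found under any search path."""
    return [name for name in role_names if find_role(name, search_paths) is None]
```

Run it inside the same container, just before the ansible-container call, with the five role names from container.yml and the search paths from the error (`./roles`, the --roles-path value, `/src/roles`, `/etc/ansible/roles`, `.`). An empty list from `missing_roles` means every role directory is visible to that process.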
The fact that on the working-before-2018-06-05 docker host, I can delete the recently-added last role and build successfully suggests that some caching is happening and maybe some intermediary tool changed (c.f. #673) but I am unable to determine what and where.
Failures are not affected by the presence or absence of --debug and/or --use-local-python.
STEPS TO REPRODUCE
Create an on-iron virtualenv and set up environment as shown above
Create a Dockerfile with ansible-container environment as shown above
Set up the container.yml and various roles as described above
Run:
ansible-container build --services base --roles-path <wherever you put the roles>
EXPECTED RESULTS
working build, direct on-iron
ACTUAL RESULTS
The debug output above, when ansible-container is run in a docker container on a host; which role is reported missing varies depending on the host.