Failed to build beluga vs nav2 benchmark image for release target #114

Open
marcoshuck opened this issue Oct 26, 2024 · 10 comments · May be fixed by #115
Labels
bug Something isn't working

Comments

@marcoshuck
Member

Bug description

I'm unable to build the benchmark image. The error isn't very informative, and I'm doing basic DevOps work rather than trying to understand the underlying issue, but I thought it was worth reporting.

Platform (please complete the following information):

  • OS: OpenSUSE Tumbleweed
  • Docker version: Docker version 26.1.5-ce, build 411e817ddf71
  • lambkin version: main branch

How to reproduce

List steps to reproduce the issue:

  1. ./tools/setup.sh
  2. ./tools/earthly ./src/benchmarks/beluga_vs_nav2+release

Expected behavior
The build succeeds and the image appears in the Docker image list.

Actual behavior
Error log:

./s/b/beluga_vs_nav2+build | distro=jammy rosdistro=
./s/b/beluga_vs_nav2+build | --> COPY . src/beluga_vs_nav2
./s/b/beluga_vs_nav2+build | distro=jammy rosdistro=
./s/b/beluga_vs_nav2+build | --> RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/e/t/latex+embed-ubuntu-release |    This may take some time... done.
./s/b/beluga_vs_nav2+build | Logged into development environment for external/ros2 on Ubuntu: jammy
./s/e/t/latex+embed-ubuntu-release | Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.8+dfsg-1ubuntu0.3) ...

./s/b/beluga_vs_nav2+build | WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

./s/b/beluga_vs_nav2+build | Reading package lists...

./s/b/beluga_vs_nav2+build | E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)

./s/e/t/latex+embed-ubuntu-release | WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

./s/b/beluga_vs_nav2+build | ERROR src/benchmarks/beluga_vs_nav2/Earthfile:56:4
./s/b/beluga_vs_nav2+build |       The command
./s/b/beluga_vs_nav2+build |           RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/b/beluga_vs_nav2+build |       did not complete successfully. Exit code 100

================================== ❌ FAILURE ===================================

./s/b/beluga_vs_nav2+build *failed* | Repeating the failure error...
./s/b/beluga_vs_nav2+build *failed* | distro=jammy rosdistro=
./s/b/beluga_vs_nav2+build *failed* | --> RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/b/beluga_vs_nav2+build *failed* | Logged into development environment for external/ros2 on Ubuntu: jammy

./s/b/beluga_vs_nav2+build *failed* | WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

./s/b/beluga_vs_nav2+build *failed* | Reading package lists...
./s/b/beluga_vs_nav2+build *failed* | E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
./s/b/beluga_vs_nav2+build *failed* | ERROR src/benchmarks/beluga_vs_nav2/Earthfile:56:4
./s/b/beluga_vs_nav2+build *failed* |       The command
./s/b/beluga_vs_nav2+build *failed* |           RUN . /etc/profile && apt update && rosdep update && rosdep install -y -i --from-paths src -t build -t buildtool -t test --skip-keys 'lambkin-shepherd lambkin-clerk' && apt clean && rm -rf /var/lib/apt/lists/*
./s/b/beluga_vs_nav2+build *failed* |       did not complete successfully. Exit code 100
./s/e/p/timemory+ubuntu-build | WARN src/external/profiling/timemory/Earthfile:23:4: The command
./s/e/p/timemory+ubuntu-build |           RUN SPACK_SKIP_MODULES= . /opt/spack/share/spack/setup-env.sh && spack install -y --fail-fast timemory@develop+tools && spack gc -y
./s/e/p/timemory+ubuntu-build | failed: context canceled
./s/e/t/latex+embed-ubuntu-release | WARN src/external/typesetting/latex/Earthfile:22:4: The command
./s/e/t/latex+embed-ubuntu-release |           RUN apt update && apt install -y latexmk tex-gyre texlive-latex-extra texlive-latex-recommended texlive-fonts-recommended && apt clean && rm -rf /var/lib/apt/lists/*
./s/e/t/latex+embed-ubuntu-release | failed: context canceled

Help: To debug your build, you can use the --interactive (-i) flag to drop into a shell of the failing RUN step: "./tools/earthly -i ./src/benchmarks/beluga_vs_nav2+release"

Additional context

None

@marcoshuck marcoshuck added the bug Something isn't working label Oct 26, 2024
@marcoshuck
Member Author

The culprit seems to be:

    RUN . /etc/profile && apt update && rosdep update && \
        rosdep install -y -i --from-paths src -t build -t buildtool -t test \
            --skip-keys 'lambkin-shepherd lambkin-clerk' && \
        apt clean && rm -rf /var/lib/apt/lists/*
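
For anyone debugging this from the interactive shell (`-i`), the `Permission denied` on `/var/lib/apt/lists/partial` in the log usually means `apt update` is running as a user that cannot write to apt's lists directory. A minimal sketch of a check, assuming a POSIX shell inside the container; `check_apt_lists` is a hypothetical helper name, not something lambkin provides:

```shell
#!/bin/sh
# Hypothetical helper (not part of lambkin): report whether the current user
# can write to apt's lists directory, where apt creates "partial".
check_apt_lists() {
    lists_dir="${1:-/var/lib/apt/lists}"
    if [ ! -d "$lists_dir" ]; then
        echo "missing: $lists_dir"
        return 2
    fi
    if [ -w "$lists_dir" ]; then
        echo "writable: $lists_dir (uid $(id -u))"
        return 0
    fi
    echo "not writable: $lists_dir (uid $(id -u))"
    return 1
}
```

Calling `check_apt_lists` with no argument checks `/var/lib/apt/lists`. A "not writable" result for a non-root uid would be consistent with the error above, though it does not by itself say which fix is appropriate.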

@marcoshuck
Member Author

It seems that running `rosdep fix-permissions` with sudo fixed the issue.

marcoshuck added a commit that referenced this issue Oct 26, 2024
Running rosdep update and rosdep install returns a permission denied error as described in #114. This prevented the release target from being executed.

Adding a `rosdep fix-permissions` instruction to be run in sudo mode fixes the issue.
@marcoshuck marcoshuck linked a pull request Oct 26, 2024 that will close this issue
6 tasks
marcoshuck added a commit that referenced this issue Oct 26, 2024
Running rosdep update and rosdep install returns a permission denied error as described in #114. This prevented the release target from being executed.

Adding a `rosdep fix-permissions` instruction to be run in sudo mode fixes the issue.

Signed-off-by: Marcos Huck <[email protected]>
@marcoshuck
Member Author

Just to report that after the fix in #115, I was able to build the image.

@glpuga
Collaborator

glpuga commented Oct 28, 2024

I can replicate the failure, but I wonder why the fix in #115 is not necessary for #93 to build just fine.

@marcoshuck
Member Author

I did a quick diff check and it seems to be the same except for the src folder. When was the last time you ran the build target?

@glpuga
Collaborator

glpuga commented Oct 28, 2024

> I did a quick diff check and it seems to be the same except for the src folder. When was the last time you ran the build target?

Hmm... a full build from scratch, probably about a month back.

@marcoshuck
Member Author

It might be worth another try, maybe? 🧐

@glpuga
Collaborator

glpuga commented Oct 29, 2024

The difference is that I never used

./tools/earthly ./src/benchmarks/beluga_vs_nav2+release

The README instructions for both beluga_vs_nav2 and beluga_vs_nav2_multi_dataset are to use

./tools/earthly ./src/benchmarks/beluga_vs_nav2+local-devel

instead. I can't tell if the release target would be able to generate the report.

Using release in beluga_vs_nav2_multi_dataset has the same issue you found in beluga_vs_nav2.
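
To check the `release` target of both benchmarks in one go, something like the following could be used. This is only a sketch: the `beluga_vs_nav2_multi_dataset` path is assumed to mirror the `beluga_vs_nav2` one, `build_release_targets` is a hypothetical helper name, and `DRY_RUN=1` (the default here) only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: build the +release target of each benchmark passed as an argument.
# EARTHLY and the benchmark directory layout are assumptions based on the
# commands quoted in this thread.
EARTHLY="${EARTHLY:-./tools/earthly}"
DRY_RUN="${DRY_RUN:-1}"

build_release_targets() {
    status=0
    for bench in "$@"; do
        target="./src/benchmarks/${bench}+release"
        if [ "$DRY_RUN" = "1" ]; then
            # Dry run: show what would be executed.
            echo "$EARTHLY $target"
        elif ! "$EARTHLY" "$target"; then
            # Keep going so every failing target gets reported.
            echo "FAILED: $target" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `build_release_targets beluga_vs_nav2 beluga_vs_nav2_multi_dataset` with `DRY_RUN=0` from the repository root would attempt both builds and return non-zero if either fails.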

@marcoshuck
Member Author

> I can't tell if the release target would be able to generate the report.

This is certainly important. I think we should be able to generate reports by running `docker run` with a release-target image; otherwise, automation will be hard to implement. What do we need for this to happen?

@glpuga
Collaborator

glpuga commented Oct 30, 2024

> This is certainly important. I think we should be able to generate reports by running `docker run` with a release-target image; otherwise, automation will be hard to implement. What do we need for this to happen?

The main issue is that the full process takes about four days (happy path), assuming nothing fails along the way.

We can do a shorter run on a limited dataset, cross our fingers, and assume that any problems are not related to the image (which is kind of a lie, since at least the fix to #94 is applied on the docker image, so we'll have to make sure that fix is in all images).
