Overlaybd did not block when the network was unavailable for a long time #214

Open
shuaichang opened this issue Jul 17, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@shuaichang

What happened in your environment?

We found a potential overlaybd bug: it returned incorrect data while the network was down. This can lead to application failures; in our case, Java failed to load classes.

What did you expect to happen?

When the network is down, class loading should block completely until the network recovers. However, we currently see "Exception: java.lang.NoClassDefFoundError" and "error reading zip file" after retrying for more than 3 minutes.

We suspect overlaybd returns an unexpected result when it should instead block until the network recovers, based on the following experiments:

  1. We ran systemctl stop overlaybd-tcmu; after that, the jar command hung indefinitely until overlaybd-tcmu recovered.
  2. With an ordinary jar stored on a device-mapper block device, suspending I/O on the DM device made the jar command hang indefinitely until the I/O suspension was lifted.

How can we reproduce it?

  • Step 1: build, convert, and push a repro image using the following Dockerfile:
FROM ubuntu:18.04
RUN apt-get update \
    # TODO: upgrade to JAVA 11 in the next sprint
    && apt-get install -y openjdk-8-jdk git vim \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN git clone https://github.com/macagua/example.java.helloworld.git && cd example.java.helloworld && javac HelloWorld/Main.java && jar cfme Main.jar Manifest.txt HelloWorld.Main HelloWorld/Main.class
RUN echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> ~/.bashrc
  • Step 2: rpull the image and start a bash shell in the container:
/opt/overlaybd/snapshotter/ctr -n k8s.io rpull -u $USERNAME:$PASSWORD $IMAGE_REF

ctr -n k8s.io run --snapshotter=overlaybd --rm -t $IMAGE_REF test-jar bash

# Inside the shell, run the jar command to load the binary
  • Step 3: shut down the network (we did this by turning off the VM's security group).
  • Step 4: inside the bash shell, run
jar vft ./example.java.helloworld/Main.jar


# After several minutes, the command fails with "error reading zip file"
root@ip-10-0-0-134:/# jar vft ./example.java.helloworld/Main.jar 
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar: error reading zip file
/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar: error reading zip file
Exception in thread "main" 
Exception: java.lang.NoClassDefFoundError thrown from the UncaughtExceptionHandler in thread "main"
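To tell the two outcomes apart without watching the terminal, a small bash helper can start the command in the background, wait for a deadline, and report whether it is still running (blocked, the expected behavior) or has already exited (what we saw on 0.6.10). This is a hypothetical sketch: check_blocks and the 600-second deadline are our own choices, not part of overlaybd. It deliberately probes with kill -0 instead of wrapping the command in timeout, because a process stuck in uninterruptible block I/O may not die on a signal, and timeout would then wait as long as the read does.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of overlaybd): start a command in the
# background, sleep for a deadline, then report whether it is still running
# ("blocked") or has already exited ("returned rc=N").
# The command is never killed, so a read stuck on the block device can still
# complete once the network recovers.
check_blocks() {
  local deadline="$1"
  shift
  "$@" >/dev/null 2>&1 &
  local pid=$!
  sleep "$deadline"
  if kill -0 "$pid" 2>/dev/null; then
    # Still running at the deadline: the read is blocking.
    echo "blocked"
  else
    # Already finished: collect the exit status bash recorded for it.
    wait "$pid"
    echo "returned rc=$?"
  fi
}

# Example (inside the container, with the network cut off):
#   check_blocks 600 jar vft ./example.java.helloworld/Main.jar
```

Since the command on 0.6.10 returned with errors only after a few minutes, the deadline should be comfortably longer than that for "blocked" to be meaningful.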

What is the version of your Accelerated Container Image?

  • overlaybd 0.6.10

What is your OS environment?

Ubuntu

Are you willing to submit PRs to fix it?

  • Yes, I am willing to fix it.
@shuaichang shuaichang added the bug Something isn't working label Jul 17, 2023
@shuaichang
Author

Some more information: as @liulanzheng suggested offline, applying the following diff and rebuilding overlaybd fixed the issue:

(diff attached as a screenshot)

@shuaichang
Author

Verified that 0.6.12 fixes the issue. Please feel free to close it, and thank you very much, @liulanzheng, for the fix!
