-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial support for EROFS #9308
Conversation
9ff903d
to
0de4fdb
Compare
cb3d503
to
e5863b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really appreciate you working on upstreaming EROFS work. Thanks!
Sorry for the slow-ish review. It may come in batches, as I digest this large change.
Thanks a lot for taking time to review this PR. Your review comments are very helpful and constructive. I do appreciate it! I’m very glad that you like this work. PS. This is not a oneshot upstream work, we will send more follow up PRs and would like to keep improving it and help maintain it. |
9c1a0b0
to
2807b40
Compare
2807b40
to
c5ce6ca
Compare
Personally I'm also excited about EROFS image use cases on gvisor since we will release more new features in erofs-utils to generate EROFS images directly from OCI layers without unpacking tgz to host first. Hopefully helpful to gvisor too. If it could be upstream, I think Anygroup guys will help maintain/improve them. |
c5ce6ca
to
aceac59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some high level design comments. I think we should address that before I proceed with the remaining review. I think it will really help with performance and compatibility. Let me know what you think.
You can also see how tmpfs is implemented: https://cs.opensource.google/gvisor/gvisor/+/master:pkg/sentry/fsimpl/tmpfs/. It also has the dentry
and inode
structs in the filesystem impl. It is a good example to look at.
Make sense. There are some problems on the design choice for inode/dentry, and that should be fixed first. The two independent caching implementations in PS. This EROFS initial implementation was a quick implementation after I posted the comments in the RFC issue. I should take (and also need) a bit more time to polish the design and code (that's why you may have already noticed that I kept pushing refactors to this PR recently..). I was a bit too eager to send it out for review this time. I do appreciate your review comments. They are very helpful and constructive! Let me take a bit more time to prepare the next version. Thanks again! |
No issues, happy to review and get this checked in. We appreciate the contributions from Ant Group. Filesystem implementations are large changes, it is common (and necessary) to iterate on the implementation and get it right. These can contain some hot paths and taking the time to get this right pays off in the long run. |
aceac59
to
2fe4018
Compare
@ayushr2 I have pushed a new version to this PR. Please have a look when you have time. Thanks! Below is a brief summary of this new version:
|
2fe4018
to
f2e4363
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! Looking good! Just some minor comments. Hopefully we can merge this soon.
2 high level questions:
- How do we plan to do e2e testing for this? Maybe we can do this in a followup PR, but this codebase has no test coverage as of now. Maybe add some container tests (container/container_test.go) where we create a EROFS image by hand and run an application using that. Or even some bash tests would work.
- I see that you have added a new
mount
subcommand inrunsc debug
command. Have you considered using annotations instead to tell runsc where the EROFS image lives for a given mount? Maybe that will be a cleaner interface. cc @fvoznika
Thanks for the review!
I will add the support for using EROFS as rootfs in the next PR. I plan to add an e2e test which will use a EROFS image as the rootfs. I will see if it's possible to add some container tests or something similar first in this PR.
The new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add the support for using EROFS as rootfs in the next PR. I plan to add an e2e test which will use a EROFS image as the rootfs.
Sounds good. I am curious what the integration will look like. We want to get the host FD of the image to the fsimpl. There are a few ways of doing that:
- Attach the image file to the container as a bind mount. Bind mounts are handled a gofer mounts. Then when the application makes a
mount(2)
syscall withfstype=erofs
andsource = /container/erofs/image/path
, we can query the gofer mount at/container/erofs/image/path
to provide a host FD for this file and use that to mount erofs. - You can open the image file externally and use
--pass-fd
flag inrunsc run
orrunsc exec
to map that FD to an application FD. (See Add --pass-fd flags to runsc run and exec #8634 to learn more about --pass-fd.) Then use the application FD to make themount(2)
syscall while passing theifd
mount option. Then in erofs, use the FD table to get the backing host FD and use that to setup erofs mount.
(1) is a more general solution. With that, will you still need the debug mount
command?
Technically, both of (1) and (2) would work. But I think we may not want to allow applications to mount EROFS images via In our internal use case, we mount the images via |
f2e4363
to
0ca3032
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the applications inside container are untrusted, and if they can mount EROFS images via mount(2), it may increase the attack surface, e.g. potentially applications can try modifying the EROFS images in use or constructing and mounting malicious EROFS images.
That makes sense. Also, nice test.
LGTM! Thanks for your contribution. Just left a few nits, after which I will work to merge this. Note that it may take some time as our presubmits are broken right now, will push when its unblocked. Also our internal linters may flag some additional issues, which I will fix on your behalf and merge.
Cool! Thanks again for taking time to review this. I do appreciate it! |
8823d93
to
c1b468e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing!
This is cool, but it's not very useful on its own until further integration; specifically, the ability to mount an EROFS image as root filesystem. I assume this is the subject of future PRs, as per the initial comment of this PR. Looking forward to it.
Exactly. This PR focuses on the initial support for EROFS. And the integration and optimizations will be done in followup PRs. In our internal use case, when we don't need to bind mount host directories into containers, runsc won't need to launch gofer processes. It is helpful for speeding up the startup time and potentially can be helpful for reducing the overall seccomp list size (as many filesystem related host syscalls won't be needed during runtime anymore, runsc will access the image files via the mappings mapped by |
c1b468e
to
0b98338
Compare
I think the build is failing:
Could you pull from master and rebase and fix? I think we recently submitted 9d5198a which added a new method to FilesystemImpl. |
Sure, will do. |
0b98338
to
f4627c0
Compare
This patch adds initial support for EROFS [1]. Below is a brief summary of the supported features. Both inode formats are supported: - compact format (32 bytes); - extended format (64 bytes); Below data layouts are supported: - flat file data without data inline (no extent); - flat file data with tail packing data inline (no extent); Below file types are supported: - directory; - regular file; - symlink; Special files (e.g. fifo) can be listed, but cannot be accessed. With this patch, sentry will be able to mount the EROFS image created with below command and access the files on it. mkfs.erofs -E noinline_data <IMAGE FILE> <SOURCE DIRECTORY> [1] https://docs.kernel.org/filesystems/erofs.html Updates google#8956 Signed-off-by: Tiwei Bie <[email protected]>
A new option, --mount fstype:source:destination, is added to the debug command which allows us to mount filesystems for debug purposes. Currently only EROFS is supported. Signed-off-by: Tiwei Bie <[email protected]>
f4627c0
to
7864ce2
Compare
7864ce2
to
c7fa1df
Compare
This patch adds an EROFS test which checks that the checksums we get from the target directory in container are identical with the ones got from the source directory that we used to create the EROFS images. erofs-utils is also added to the default image and will be used to build the EROFS images during the test. Signed-off-by: Tiwei Bie <[email protected]>
c7fa1df
to
2f01e3f
Compare
\o/ |
This PR adds initial support for EROFS [1]. Below is a brief summary of the supported features.
Both inode formats are supported:
Below data layouts are supported:
Below file types are supported:
Special files (e.g. fifo) can be listed, but cannot be accessed.
With this PR, sentry will be able to mount the EROFS image created with below command and access the files on it.
Other features, e.g. checkpoint/restore, mounting EROFS as rootfs will be added in the follow up RPs.
Updates #8956
[1] https://docs.kernel.org/filesystems/erofs.html