Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for EROFS #9308

Merged
merged 3 commits into from
Oct 6, 2023

Conversation

btw616
Copy link
Contributor

@btw616 btw616 commented Aug 23, 2023

This PR adds initial support for EROFS [1]. Below is a brief summary of the supported features.

Both inode formats are supported:

  • compact format (32 bytes);
  • extended format (64 bytes);

Below data layouts are supported:

  • flat file data without data inline (no extent);
  • flat file data with tail packing data inline (no extent);

Below file types are supported:

  • directory;
  • regular file;
  • symlink;

Special files (e.g. fifo) can be listed, but cannot be accessed.

With this PR, sentry will be able to mount the EROFS image created with below command and access the files on it.

mkfs.erofs -E noinline_data <IMAGE FILE> <SOURCE DIRECTORY>

Other features, e.g. checkpoint/restore, mounting EROFS as rootfs will be added in the follow up RPs.

Updates #8956

[1] https://docs.kernel.org/filesystems/erofs.html

@ayushr2 ayushr2 self-assigned this Aug 23, 2023
pkg/erofs/erofs.go Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
@btw616 btw616 force-pushed the erofs-initial-support branch 3 times, most recently from cb3d503 to e5863b2 Compare September 8, 2023 12:45
@btw616 btw616 requested a review from ayushr2 September 12, 2023 01:47
Copy link
Collaborator

@ayushr2 ayushr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really appreciate you working on upstreaming EROFS work. Thanks!

Sorry for the slow-ish review. It may come in batches, as I digest this large change.

pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/regular_file.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/regular_file.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/regular_file.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/regular_file.go Outdated Show resolved Hide resolved
@btw616
Copy link
Contributor Author

btw616 commented Sep 12, 2023

We really appreciate you working on upstreaming EROFS work. Thanks!

Sorry for the slow-ish review. It may come in batches, as I digest this large change.

Thanks a lot for taking time to review this PR. Your review comments are very helpful and constructive. I do appreciate it!

I’m very glad that you like this work. PS. This is not a oneshot upstream work, we will send more follow up PRs and would like to keep improving it and help maintain it.

@btw616 btw616 force-pushed the erofs-initial-support branch 3 times, most recently from 9c1a0b0 to 2807b40 Compare September 12, 2023 15:58
@hsiangkao
Copy link

We really appreciate you working on upstreaming EROFS work. Thanks!
Sorry for the slow-ish review. It may come in batches, as I digest this large change.

Thanks a lot for taking time to review this PR. Your review comments are very helpful and constructive. I do appreciate it!

I’m very glad that you like this work. PS. This is not a oneshot upstream work, we will send more follow up PRs and would like to keep improving it and help maintain it.

Personally I'm also excited about EROFS image use cases on gvisor since we will release more new features in erofs-utils to generate EROFS images directly from OCI layers without unpacking tgz to host first. Hopefully helpful to gvisor too.

If it could be upstream, I think Anygroup guys will help maintain/improve them.

Copy link
Collaborator

@ayushr2 ayushr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some high level design comments. I think we should address that before I proceed with the remaining review. I think it will really help with performance and compatibility. Let me know what you think.

You can also see how tmpfs is implemented: https://cs.opensource.google/gvisor/gvisor/+/master:pkg/sentry/fsimpl/tmpfs/. It also has the dentry and inode structs in the filesystem impl. It is a good example to look at.

pkg/sentry/fsimpl/erofs/directory.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/directory.go Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/erofs.go Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/erofs.go Outdated Show resolved Hide resolved
@btw616
Copy link
Contributor Author

btw616 commented Sep 15, 2023

Left some high level design comments. I think we should address that before I proceed with the remaining review. I think it will really help with performance and compatibility. Let me know what you think.

Make sense. There are some problems on the design choice for inode/dentry, and that should be fixed first. The two independent caching implementations in pkg/erofs and pkg/sentry/fsimpl/erofs also looks strange and bothers me as well. I think it's a really good point that the inode in pkg/erofs should be a simple struct that can be passed by value. I will take that approach and rewrite this whole part. Thanks!

PS. This EROFS initial implementation was a quick implementation after I posted the comments in the RFC issue. I should take (and also need) a bit more time to polish the design and code (that's why you may have already noticed that I kept pushing refactors to this PR recently..). I was a bit too eager to send it out for review this time. I do appreciate your review comments. They are very helpful and constructive! Let me take a bit more time to prepare the next version. Thanks again!

@ayushr2
Copy link
Collaborator

ayushr2 commented Sep 15, 2023

This EROFS initial implementation was a quick implementation after I posted the comments in the RFC issue.

No issues, happy to review and get this checked in. We appreciate the contributions from Ant Group. Filesystem implementations are large changes, it is common (and necessary) to iterate on the implementation and get it right. These can contain some hot paths and taking the time to get this right pays off in the long run.

@btw616
Copy link
Contributor Author

btw616 commented Sep 21, 2023

@ayushr2 I have pushed a new version to this PR. Please have a look when you have time. Thanks!

Below is a brief summary of this new version:

  • inode is introduced in pkg/sentry/fsimpl/erofs. Now dentry is mainly only used to construct the dentry tree and implement the caching policy.

  • pkg/erofs will not cache any objects internally anymore, and it relies on host kernel to cache blocks/pages of the image transparently;

  • Inode in pkg/erofs is a simple struct that can be passed by value now. Ideally we should avoid escaping objects to heap when constructing the objects from image, but it's not true currently and may require some unsafe code. This is left for future optimizations.

  • Supports for reading regular files with inline data has been added. But mapping regular files into guest userspace still requires the regular file has no inline data (i.e. file data of regular files in the image should be page size aligned).

  • Misc fixes and cleanups.

Copy link
Collaborator

@ayushr2 ayushr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! Looking good! Just some minor comments. Hopefully we can merge this soon.

2 high level questions:

  1. How do we plan to do e2e testing for this? Maybe we can do this in a followup PR, but this codebase has no test coverage as of now. Maybe add some container tests (container/container_test.go) where we create a EROFS image by hand and run an application using that. Or even some bash tests would work.
  2. I see that you have added a new mount subcommand in runsc debug command. Have you considered using annotations instead to tell runsc where the EROFS image lives for a given mount? Maybe that will be a cleaner interface. cc @fvoznika

pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Show resolved Hide resolved
pkg/erofs/erofs.go Show resolved Hide resolved
pkg/erofs/erofs.go Show resolved Hide resolved
pkg/erofs/erofs.go Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/erofs/erofs.go Outdated Show resolved Hide resolved
pkg/sentry/fsimpl/erofs/erofs.go Outdated Show resolved Hide resolved
@btw616
Copy link
Contributor Author

btw616 commented Sep 25, 2023

Thanks for the updates! Looking good! Just some minor comments. Hopefully we can merge this soon.

Thanks for the review!

  1. How do we plan to do e2e testing for this? Maybe we can do this in a followup PR, but this codebase has no test coverage as of now. Maybe add some container tests (container/container_test.go) where we create a EROFS image by hand and run an application using that. Or even some bash tests would work.

I will add the support for using EROFS as rootfs in the next PR. I plan to add an e2e test which will use a EROFS image as the rootfs. I will see if it's possible to add some container tests or something similar first in this PR.

  1. I see that you have added a new mount subcommand in runsc debug command. Have you considered using annotations instead to tell runsc where the EROFS image lives for a given mount? Maybe that will be a cleaner interface. cc @fvoznika

The new mount subcommand in runsc debug command is just to allow us to mount a EROFS image into sandbox manually for debug purpose (which is very convenient IMO). Using annotations sounds good. I will take a deeper look. Thanks for the suggestions!

Copy link
Collaborator

@ayushr2 ayushr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add the support for using EROFS as rootfs in the next PR. I plan to add an e2e test which will use a EROFS image as the rootfs.

Sounds good. I am curious what the integration will look like. We want to get the host FD of the image to the fsimpl. There are a few ways of doing that:

  1. Attach the image file to the container as a bind mount. Bind mounts are handled a gofer mounts. Then when the application makes a mount(2) syscall with fstype=erofs and source = /container/erofs/image/path, we can query the gofer mount at /container/erofs/image/path to provide a host FD for this file and use that to mount erofs.
  2. You can open the image file externally and use --pass-fd flag in runsc run or runsc exec to map that FD to an application FD. (See Add --pass-fd flags to runsc run and exec #8634 to learn more about --pass-fd.) Then use the application FD to make the mount(2) syscall while passing the ifd mount option. Then in erofs, use the FD table to get the backing host FD and use that to setup erofs mount.

(1) is a more general solution. With that, will you still need the debug mount command?

pkg/sentry/fsimpl/erofs/directory.go Show resolved Hide resolved
@btw616
Copy link
Contributor Author

btw616 commented Sep 26, 2023

I will add the support for using EROFS as rootfs in the next PR. I plan to add an e2e test which will use a EROFS image as the rootfs.

Sounds good. I am curious what the integration will look like. We want to get the host FD of the image to the fsimpl. There are a few ways of doing that:

  1. Attach the image file to the container as a bind mount. Bind mounts are handled a gofer mounts. Then when the application makes a mount(2) syscall with fstype=erofs and source = /container/erofs/image/path, we can query the gofer mount at /container/erofs/image/path to provide a host FD for this file and use that to mount erofs.
  2. You can open the image file externally and use --pass-fd flag in runsc run or runsc exec to map that FD to an application FD. (See Add --pass-fd flags to runsc run and exec #8634 to learn more about --pass-fd.) Then use the application FD to make the mount(2) syscall while passing the ifd mount option. Then in erofs, use the FD table to get the backing host FD and use that to setup erofs mount.

(1) is a more general solution. With that, will you still need the debug mount command?

Technically, both of (1) and (2) would work. But I think we may not want to allow applications to mount EROFS images via mount(2) in container unless we really need that. Because the applications inside container are untrusted, and if they can mount EROFS images via mount(2), it may increase the attack surface, e.g. potentially applications can try modifying the EROFS images in use or constructing and mounting malicious EROFS images. So I think it may be better to enforce that EROFS mount operations can only be initiated from the host side for now.

In our internal use case, we mount the images via config.json and RPCs directly. That implementation may not strictly follow the OCI standard as we want to build something highly optimized for FaaS. But I will try to create an OCI compliant implementation for the PR which does the integration work.

Copy link
Collaborator

@ayushr2 ayushr2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the applications inside container are untrusted, and if they can mount EROFS images via mount(2), it may increase the attack surface, e.g. potentially applications can try modifying the EROFS images in use or constructing and mounting malicious EROFS images.

That makes sense. Also, nice test.

LGTM! Thanks for your contribution. Just left a few nits, after which I will work to merge this. Note that it may take some time as our presubmits are broken right now, will push when its unblocked. Also our internal linters may flag some additional issues, which I will fix on your behalf and merge.

runsc/boot/controller.go Outdated Show resolved Hide resolved
runsc/boot/controller.go Outdated Show resolved Hide resolved
runsc/sandbox/sandbox.go Outdated Show resolved Hide resolved
runsc/container/container_test.go Outdated Show resolved Hide resolved
runsc/container/container_test.go Outdated Show resolved Hide resolved
runsc/container/container_test.go Outdated Show resolved Hide resolved
runsc/container/container_test.go Outdated Show resolved Hide resolved
@btw616
Copy link
Contributor Author

btw616 commented Oct 3, 2023

LGTM! Thanks for your contribution. Just left a few nits, after which I will work to merge this. Note that it may take some time as our presubmits are broken right now, will push when its unblocked. Also our internal linters may flag some additional issues, which I will fix on your behalf and merge.

Cool! Thanks again for taking time to review this. I do appreciate it!

Copy link
Contributor

@EtiennePerot EtiennePerot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing!

This is cool, but it's not very useful on its own until further integration; specifically, the ability to mount an EROFS image as root filesystem. I assume this is the subject of future PRs, as per the initial comment of this PR. Looking forward to it.

runsc/sandbox/sandbox.go Show resolved Hide resolved
@btw616
Copy link
Contributor Author

btw616 commented Oct 4, 2023

This is cool, but it's not very useful on its own until further integration; specifically, the ability to mount an EROFS image as root filesystem. I assume this is the subject of future PRs, as per the initial comment of this PR. Looking forward to it.

Exactly. This PR focuses on the initial support for EROFS. And the integration and optimizations will be done in followup PRs. In our internal use case, when we don't need to bind mount host directories into containers, runsc won't need to launch gofer processes. It is helpful for speeding up the startup time and potentially can be helpful for reducing the overall seccomp list size (as many filesystem related host syscalls won't be needed during runtime anymore, runsc will access the image files via the mappings mapped by mmap(2) directly). I wish gVisor could support this eventually as well.

@ayushr2
Copy link
Collaborator

ayushr2 commented Oct 4, 2023

I think the build is failing:

Error: pkg/sentry/fsimpl/erofs/erofs.go:111:33: cannot use fs (variable of type *filesystem) as vfs.FilesystemImpl value in argument to fs.vfsfs.Init: *filesystem does not implement vfs.FilesystemImpl (missing method IsDescendant)
Error: pkg/sentry/fsimpl/erofs/erofs.go:468:9: impossible type assertion: fd.vfsfd.Mount().Filesystem().Impl().(*filesystem)
	*filesystem does not implement vfs.FilesystemImpl (missing method IsDescendant)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:19:23: d2.parent.Load undefined (type *dentry has no field or method Load)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:35:16: d.parent.Load undefined (type *dentry has no field or method Load)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:42:24: d.parent.Load undefined (type *dentry has no field or method Load)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:57:22: d.parent.Load undefined (type *dentry has no field or method Load)

Could you pull from master and rebase and fix? I think we recently submitted 9d5198a which added a new method to FilesystemImpl.

@btw616
Copy link
Contributor Author

btw616 commented Oct 4, 2023

I think the build is failing:

Error: pkg/sentry/fsimpl/erofs/erofs.go:111:33: cannot use fs (variable of type *filesystem) as vfs.FilesystemImpl value in argument to fs.vfsfs.Init: *filesystem does not implement vfs.FilesystemImpl (missing method IsDescendant)
Error: pkg/sentry/fsimpl/erofs/erofs.go:468:9: impossible type assertion: fd.vfsfd.Mount().Filesystem().Impl().(*filesystem)
	*filesystem does not implement vfs.FilesystemImpl (missing method IsDescendant)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:19:23: d2.parent.Load undefined (type *dentry has no field or method Load)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:35:16: d.parent.Load undefined (type *dentry has no field or method Load)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:42:24: d.parent.Load undefined (type *dentry has no field or method Load)
Error: bazel-out/k8-fastbuild/bin/pkg/sentry/fsimpl/erofs/fstree.go:57:22: d.parent.Load undefined (type *dentry has no field or method Load)

Could you pull from master and rebase and fix? I think we recently submitted 9d5198a which added a new method to FilesystemImpl.

Sure, will do.

This patch adds initial support for EROFS [1]. Below is a brief
summary of the supported features.

Both inode formats are supported:

- compact format (32 bytes);
- extended format (64 bytes);

Below data layouts are supported:

- flat file data without data inline (no extent);
- flat file data with tail packing data inline (no extent);

Below file types are supported:

- directory;
- regular file;
- symlink;

Special files (e.g. fifo) can be listed, but cannot be accessed.

With this patch, sentry will be able to mount the EROFS image
created with below command and access the files on it.

 mkfs.erofs -E noinline_data <IMAGE FILE> <SOURCE DIRECTORY>

[1] https://docs.kernel.org/filesystems/erofs.html

Updates google#8956

Signed-off-by: Tiwei Bie <[email protected]>
A new option, --mount fstype:source:destination, is added to the
debug command which allows us to mount filesystems for debug purposes.
Currently only EROFS is supported.

Signed-off-by: Tiwei Bie <[email protected]>
runsc/container/container_test.go Outdated Show resolved Hide resolved
runsc/container/container_test.go Outdated Show resolved Hide resolved
images/default/Dockerfile Show resolved Hide resolved
This patch adds an EROFS test which checks that the checksums we get
from the target directory in container are identical with the ones
got from the source directory that we used to create the EROFS images.

erofs-utils is also added to the default image and will be used to
build the EROFS images during the test.

Signed-off-by: Tiwei Bie <[email protected]>
@copybara-service copybara-service bot merged commit f2f99b5 into google:master Oct 6, 2023
5 checks passed
@EtiennePerot
Copy link
Contributor

\o/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants