Implement in-memory worker file system #4029
base: main
Although this is marked as a draft, it is ready for an initial round of reviews.
I'll stop reviewing at this point, as there seems to be too much up in the air at this stage. Also, considering that we've already implemented everything in C++ and left very little to the imagination, I would love to see this exposed as an API type so it is easy to use from different languages (Node, Python, Rust, etc.).
@danlapid and @anonrig, I've updated this based on the review and handled additional TODOs. PTAL. Specifically, the following notable changes were made:
@@ -38,11 +39,14 @@ class WorkerdApi final: public Worker::Api {
kj::Own<JsgIsolateObserver> observer,
api::MemoryCacheProvider& memoryCacheProvider,
const PythonConfig& pythonConfig,
kj::Maybe<kj::Own<jsg::modules::ModuleRegistry>> newModuleRegistry);
kj::Maybe<kj::Own<jsg::modules::ModuleRegistry>> newModuleRegistry,
kj::Maybe<kj::Own<VirtualFileSystem>> vfs);
Is this really a Maybe? Seems like we'll have it always so it should probably just be unwrapped.
PS: if this is just for tests then I think we can just add a fixture that's a no-op VFS instead.
Our code should signify what is possible in prod rather than what is useful in tests (in my opinion).
Currently it's a Maybe because nothing on the internal repo side sets this. Once the internal project is updated to provide the VFS instance, this can be changed to remove the kj::Maybe.
Is this expected to be part of this PR before pulling out of draft?
If so, mind adding it as a TODO tag to the PR description?
@@ -572,6 +580,11 @@ class Worker::Api {
virtual void setModuleFallbackCallback(kj::Function<ModuleFallbackCallback>&& callback) const {
// By default does nothing.
}

// Return the virtual file system for this worker, if any.
virtual kj::Maybe<const VirtualFileSystem&> getVirtualFileSystem() const {
If the former question regarding the VFS being a maybe is accepted this can be changed to not return a maybe either and just be a pure virtual function.
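The review suggestion (drop the `kj::Maybe`, make the getter pure virtual, and give tests a no-op VFS fixture) can be sketched in plain C++. `VirtualFileSystem`, `NoopVirtualFileSystem`, `Api` and `TestApi` below are simplified, hypothetical stand-ins rather than the workerd classes, and `exists()` is an illustrative method only.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the worker's VirtualFileSystem.
class VirtualFileSystem {
 public:
  virtual ~VirtualFileSystem() = default;
  // Illustrative query: does a node exist at this absolute path?
  virtual bool exists(const std::string& path) const = 0;
};

// Test fixture: a VFS with no entries, so production code does not need a
// "maybe there is no VFS" state just to accommodate tests.
class NoopVirtualFileSystem final : public VirtualFileSystem {
 public:
  bool exists(const std::string&) const override { return false; }
};

// The API shape suggested in review: pure virtual and returns a reference.
class Api {
 public:
  virtual ~Api() = default;
  virtual const VirtualFileSystem& getVirtualFileSystem() const = 0;
};

// A test implementation owns the no-op fixture and hands out a reference.
class TestApi final : public Api {
 public:
  const VirtualFileSystem& getVirtualFileSystem() const override { return vfs_; }

 private:
  NoopVirtualFileSystem vfs_;
};
```

With this shape every caller can use the returned reference directly; the "no VFS" case only exists in the test fixture.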
// that a jsg::Lock reference is passed in as proof that current() is called
// from within a valid isolate lock so that the Worker::Api::current()
// call below will work as expected.
return Worker::Api::current().getVirtualFileSystem();
Bah, this won't work in Python 😢 `Api::current` uses `Worker::Isolate::Impl::Lock`, which is not instantiated when we do Python startups.
That's part of why I suggested we might have to go with the abstract base class that is passed into jsg.
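One way to read the "abstract base class that is passed into jsg" idea is dependency injection: the lock carries a reference to an abstract provider, so code that only holds the lock can reach the file system without going through `Worker::Api::current()`. This is a minimal sketch under that assumption; `VfsProvider`, `Lock`, `StaticVfsProvider` and `rootsFor` are hypothetical names, and `Lock` here is not `jsg::Lock`.

```cpp
#include <cassert>
#include <string>

// Hypothetical abstract interface that jsg could know about without
// depending on workerd; workerd (or a Python startup path) implements it.
class VfsProvider {
 public:
  virtual ~VfsProvider() = default;
  // Illustrative query: the root directories this provider exposes.
  virtual std::string listRoots() const = 0;
};

// Simplified stand-in for a lock object: the provider is injected at
// construction instead of being looked up through a global "current" API.
class Lock {
 public:
  explicit Lock(const VfsProvider& provider) : provider_(provider) {}
  const VfsProvider& getVfsProvider() const { return provider_; }

 private:
  const VfsProvider& provider_;
};

// A provider that does not need any worker isolate lock to exist.
class StaticVfsProvider final : public VfsProvider {
 public:
  std::string listRoots() const override { return "/bundle /dev /tmp"; }
};

// Code that only receives the lock can still reach the file system.
std::string rootsFor(const Lock& js) { return js.getVfsProvider().listRoots(); }
```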
// Used to build a new read-only directory. All files and directories added
// may or may not be writable. The directory will not be initially included
// in a directory. To add it to a directory, use the add method.
class Builder {
Is the only reason we actually need a Builder that Directory doesn't support adding files with path separators?
E.g. `tryOpen("dir_a/dir_b/file", mode=Create)` is not supported and will not recursively create directories as required.
Builder is only used when constructing a read only directory. It's not used at all with directories that are already built.
Specifically, `tryOpen` does create subdirectories as needed if those are not read-only. If the directory is read-only, then `tryOpen` will fail. The `Builder` here is currently only used when constructing the read-only structure from a configuration bundle. Once the structure is built, `Builder` is no longer used.
Can't we simply add a releaseAsReadOnly to a WritableDirectory then?
Sure but I'm not convinced there's much benefit in doing so as it really wouldn't reduce the complexity at all.
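To make the Builder discussion concrete, here is a toy model of the two ways a directory gets populated: a `Builder` that assembles a read-only tree once, and a `tryOpen` that creates missing parent directories on demand but fails as soon as it would have to modify a read-only directory. `Node`, `Builder`, `splitPath` and `tryOpen` are hypothetical simplifications, not the workerd types.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy directory tree (hypothetical). A node is either a file with contents
// or a directory with named entries.
struct Node {
  bool isDirectory = false;
  bool writable = false;
  std::string contents;                                  // files only
  std::map<std::string, std::shared_ptr<Node>> entries;  // directories only
};
using NodePtr = std::shared_ptr<Node>;

// Splits "a/b/c" into {"a", "b", "c"}, ignoring empty components.
std::vector<std::string> splitPath(const std::string& path) {
  std::vector<std::string> parts;
  std::istringstream in(path);
  std::string part;
  while (std::getline(in, part, '/')) {
    if (!part.empty()) parts.push_back(part);
  }
  return parts;
}

// Opens `path` relative to `dir`. With create == true, missing parent
// directories and the final file are created, but only inside writable
// directories; hitting a read-only directory makes the call fail.
NodePtr tryOpen(const NodePtr& dir, const std::string& path, bool create) {
  NodePtr current = dir;
  std::vector<std::string> parts = splitPath(path);
  for (size_t i = 0; i < parts.size(); ++i) {
    if (!current->isDirectory) return nullptr;
    auto it = current->entries.find(parts[i]);
    if (it != current->entries.end()) {
      current = it->second;
      continue;
    }
    if (!create || !current->writable) return nullptr;
    auto child = std::make_shared<Node>();
    child->isDirectory = (i + 1 < parts.size());  // last component is a file
    child->writable = true;
    current->entries.emplace(parts[i], child);
    current = child;
  }
  return current;
}

// Builder: the only way to populate a read-only directory. After build(),
// the tree cannot be changed through tryOpen.
class Builder {
 public:
  Builder& addFile(const std::string& name, std::string contents) {
    auto file = std::make_shared<Node>();
    file->contents = std::move(contents);
    root_->entries[name] = std::move(file);
    return *this;
  }
  NodePtr build() { return std::move(root_); }

 private:
  NodePtr root_ = makeReadOnlyDirectory();
  static NodePtr makeReadOnlyDirectory() {
    auto dir = std::make_shared<Node>();
    dir->isDirectory = true;  // writable stays false
    return dir;
  }
};
```

In this model the `Builder` is needed only because nothing else is allowed to write into a read-only tree; a `releaseAsReadOnly`-style approach would instead flip `writable` off after populating through `tryOpen`.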
911f574 to 8e12283
Ok... I updated the PR with a few more touch-ups and, largely just to demonstrate that it works as expected, I threw in an initial rough implementation of the Web FileSystem API (it would need more tests, namely web platform tests): https://fs.spec.whatwg.org/ More importantly, this includes the
e3136da to e9e64e0
So this is a fun one. In preparation for the implementation of `node:fs` (and eventually also the Web file system API), we need to implement a virtual file system for the worker. There's quite a bit to unpack here, so let's go through it.

First, let's talk about the file system itself. Every worker instance will have its own root directory (`/`). In this root directory we will have at least two special directories: the "bundle" root and the "temp" root.

* The bundle root is where all of the modules that are included in the worker bundle will be accessible. Everything in this directory will be strictly read-only. The contents are populated by the worker configuration bundle. By default, the bundle root will be `/bundle`, but this can be overridden.
* The temp root is where we will allow temporary files to be created. Everything in this directory is read-write but will be transient. By default, the temp root will be `/tmp`, but this can also be overridden.

Let's imagine the following simple workerd configuration:

```
const helloWorld :Workerd.Worker = (
  modules = [
    (name = "worker", esModule = embed "worker.js"),
    (name = "foo", text = "Hello World!"),
  ],
  compatibilityDate = "2023-02-28",
);
```

Given this configuration, the worker fs will initially have the following structure:

```
/
├── bundle
│   ├── worker
│   └── foo
└── tmp
```

We can then use the `jsg::Lock&` to access the file system at the C++ level. We reuse the existing `kj::Filesystem` API to provide the interface.

```cpp
jsg::Lock& js = ...

// Resolve the root of the file system
KJ_IF_SOME(node, js.resolveVfsNode("file:///"_url)) {
  auto& dir = kj::downcast<const kj::ReadableDirectory>(node);
  // List the contents of the directory
  KJ_DBG(dir.listNames());
}

KJ_IF_SOME(node, js.resolveVfsNode("file:///bundle/worker"_url)) {
  auto& file = kj::downcast<const kj::ReadableFile>(node);
  KJ_DBG(file.readAllText());
}

KJ_IF_SOME(node, js.resolveVfsNode("file:///tmp"_url)) {
  auto& dir = kj::downcast<const kj::Directory>(node);
  auto tmpFile = dir.createTemporary();
  // tmpFile is an anonymous temporary file
}
```

The temporary file directory is a bit special in that its contents are fully transient, and their lifetime depends on whether or not there is an active IoContext. We use a special RAII `TempDirStoreScope` scope to manage the contents of the temporary directory. For example,

```cpp
KJ_IF_SOME(node, js.resolveVfsNode("file:///tmp"_url)) {
  auto& dir = kj::downcast<const kj::Directory>(node);
  // Path::parse() splits "a/b/c/foo.txt" into its components.
  auto path = kj::Path::parse("a/b/c/foo.txt");
  {
    TempDirStoreScope temp_dir_scope;
    // CREATE_PARENT also creates the intermediate a/b/c directories.
    auto tmpFile = dir.openFile(path, kj::WriteMode::CREATE | kj::WriteMode::CREATE_PARENT);
    // kj::File::write() returns void, so check the resulting size instead.
    tmpFile->write(0, "Hello World!"_kjb);
    KJ_ASSERT(tmpFile->stat().size == 12);
    KJ_DBG(dir.exists(path));  // true!
  }
  // The temp dir scope is destructed and the file is deleted
  KJ_DBG(dir.exists(path));  // false!
}
```

However, if there is an active IoContext, the temporary file will instead be created within that IoContext's TempDirStoreScope and will be deleted when the IoContext is destructed. This allows us to have a single virtual file system whose temporary directories are either deleted immediately as soon as the execution scope is exited, or are specific to the IoContext and are deleted when the IoContext is destructed. This mechanism allows us to have multiple IoContexts active at the same time while still having a single virtual file system whose contents correctly reflect the current IoContext.

Temporary files can be created, copied, etc. only within the temporary directory. All other directories are read-only.
When there is no active IoContext, all temporary files will have a timestamp set to the Unix epoch. When there is an active IoContext, the temporary files will acquire the current timestamp from the IoContext, ensuring that the file system's view of time is consistent within a request.

The design here is intended to be extensible. We can add new root directories in the future with different semantics and implementations. For example, to support Python workers we can introduce a new root directory that is backed, for instance, by a tar/zip file containing the Python standard library.

What isn't implemented yet? Great question! The following TODOs remain:

* Implementing memory accounting for the temporary file system. Currently the implementation uses the default in-memory file factory provided by kj. This is not ideal, as it doesn't provide any mechanism for memory accounting that integrates with the Isolate heap limits. As a next step, before this work is complete, we will need to implement a custom in-memory file factory that does integrate with the memory accounting system. The intention is that the total size of the temporary files will be limited to the Isolate heap limit.
* Implementing "file descriptors". Currently the file system will return `kj::none` for all file descriptors. As a next step, the file system will implement a custom file descriptor counter that will ensure that all files are assigned a unique, monotonically increasing file descriptor. When the counter reaches a maximum threshold of opened files, the worker will be condemned. We could also just track the number of active `kj::Own<const kj::File>` objects and use that internally for accounting, but the `node:fs` implementation does a lot with integer fds, so we will need them at some point.

The implementation currently does not support symbolic links in any way. I do not anticipate implementing this in the future.
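For the memory-accounting TODO, a minimal sketch of the idea: every temporary file charges its growth against one shared budget, and a write that would exceed the budget fails without modifying the file. `TmpFsMemoryBudget` and `AccountedFile` are hypothetical; in the real design the limit would come from the Isolate heap limit rather than a constructor argument, and the files would come from a custom kj in-memory file factory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shared budget for all temp files; the limit is a plain
// constructor argument here instead of the isolate's heap limit.
class TmpFsMemoryBudget {
 public:
  explicit TmpFsMemoryBudget(size_t limitBytes) : limit_(limitBytes) {}
  // Reserves `delta` more bytes; returns false if that would exceed the limit.
  bool tryGrow(size_t delta) {
    if (delta > limit_ - used_) return false;  // used_ <= limit_ always holds
    used_ += delta;
    return true;
  }
  void shrink(size_t delta) { used_ -= delta; }
  size_t used() const { return used_; }

 private:
  size_t limit_;
  size_t used_ = 0;
};

// In-memory file whose growth is charged to the shared budget and whose
// storage is returned to the budget when the file is destroyed.
class AccountedFile {
 public:
  explicit AccountedFile(TmpFsMemoryBudget& budget) : budget_(budget) {}
  ~AccountedFile() { budget_.shrink(data_.size()); }
  AccountedFile(const AccountedFile&) = delete;
  AccountedFile& operator=(const AccountedFile&) = delete;

  // Writes `bytes` at `offset`, zero-filling any gap. Fails without changing
  // the file when growing it would exceed the budget.
  bool write(size_t offset, const std::string& bytes) {
    size_t end = offset + bytes.size();
    if (end > data_.size()) {
      if (!budget_.tryGrow(end - data_.size())) return false;
      data_.resize(end, '\0');
    }
    data_.replace(offset, bytes.size(), bytes);
    return true;
  }
  size_t size() const { return data_.size(); }

 private:
  TmpFsMemoryBudget& budget_;
  std::string data_;
};
```

Only growth is charged, so overwriting existing bytes never fails; the budget is released when a file goes away, which is when a temp-dir scope or IoContext ends in the design above.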
The implementation is split across several files/components:

* `worker-fs.h/c++` - These provide the main interfaces for the worker file system.
* `bundle-fs.h/c++` - These connect the workerd configuration to the worker file system. A similar interface will need to be provided for the internal repo, since the two use different configuration schemas.

The integration via the `jsg::Lock&` is provided as the most convenient way to access the file system where it is needed.
Reworks the implementation away from kj::Filesystem.
Rework the implementation to move out of jsg, add symlinks, and generally improve the API based on review feedback.
The `/dev/null`, `/dev/zero`, `/dev/full` and `/dev/random` paths are generally quite useful and often used. Let's provide them in our virtual file system.
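On Unix-like systems these four devices conventionally behave as follows: reads from `/dev/null` return end-of-file, reads from `/dev/zero` and `/dev/full` return zero bytes, reads from `/dev/random` return random bytes, and writes are discarded except on `/dev/full`, which fails with `ENOSPC`. The sketch below models just those semantics. `Device`, `deviceRead` and `deviceWrite` are hypothetical, `std::random_device` stands in for whatever randomness source the worker actually uses, and whether the worker's devices match these conventions exactly is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// The four device files (hypothetical enum for this sketch).
enum class Device { Null, Zero, Full, Random };

// Fills `buf` the way a read from the device would; returns bytes read.
size_t deviceRead(Device dev, std::vector<uint8_t>& buf) {
  switch (dev) {
    case Device::Null:
      return 0;  // always at end-of-file
    case Device::Zero:
    case Device::Full:
      std::fill(buf.begin(), buf.end(), 0);
      return buf.size();
    case Device::Random: {
      std::random_device rd;  // stand-in randomness source
      for (auto& b : buf) b = static_cast<uint8_t>(rd());
      return buf.size();
    }
  }
  return 0;
}

// Returns bytes "written" (the data is discarded), or -ENOSPC for /dev/full,
// which always reports that the device is full.
long deviceWrite(Device dev, const std::vector<uint8_t>& data) {
  if (dev == Device::Full) return -ENOSPC;
  return static_cast<long>(data.size());
}
```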
The generated output of

Full Type Diff

diff -r types/generated-snapshot/experimental/index.d.ts bazel-bin/types/definitions/experimental/index.d.ts
259a260,265
> FileSystemHandle: typeof FileSystemHandle;
> FileSystemFileHandle: typeof FileSystemFileHandle;
> FileSystemDirectoryHandle: typeof FileSystemDirectoryHandle;
> FileSystemWritableFileStream: typeof FileSystemWritableFileStream;
> FileSystemSyncAccessHandle: typeof FileSystemSyncAccessHandle;
> StorageManager: typeof StorageManager;
470a477
> readonly storage: StorageManager;
1379a1387,1528
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle)
> */
> declare abstract class FileSystemHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle/kind) */
> get kind(): string;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle/name) */
> get name(): string;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle/isSameEntry) */
> isSameEntry(other: FileSystemHandle): Promise<boolean>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle)
> */
> declare abstract class FileSystemFileHandle extends FileSystemHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle/getFile) */
> getFile(): Promise<File>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle/createWritable) */
> createWritable(
> options?: FileSystemFileHandleFileSystemCreateWritableOptions,
> ): Promise<FileSystemWritableFileStream>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle/createSyncAccessHandle) */
> createSyncAccessHandle(): Promise<FileSystemSyncAccessHandle>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle)
> */
> declare abstract class FileSystemDirectoryHandle extends FileSystemHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/getFileHandle) */
> getFileHandle(
> name: string,
> options?: FileSystemDirectoryHandleFileSystemGetFileOptions,
> ): Promise<FileSystemFileHandle>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/getDirectoryHandle) */
> getDirectoryHandle(
> name: string,
> options?: FileSystemDirectoryHandleFileSystemGetDirectoryOptions,
> ): Promise<FileSystemDirectoryHandle>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/removeEntry) */
> removeEntry(
> name: string,
> options?: FileSystemDirectoryHandleFileSystemRemoveOptions,
> ): Promise<void>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/resolve) */
> resolve(possibleDescendant: FileSystemHandle): Promise<string[]>;
> entries(): IterableIterator<FileSystemDirectoryHandleEntryType>;
> keys(): IterableIterator<string>;
> values(): IterableIterator<FileSystemHandle>;
> forEach(
> callback: (
> param0: string,
> param1: FileSystemHandle,
> param2: FileSystemDirectoryHandle,
> ) => void,
> thisArg?: any,
> ): void;
> [Symbol.iterator](): IterableIterator<FileSystemDirectoryHandleEntryType>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream)
> */
> declare abstract class FileSystemWritableFileStream extends WritableStream {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream/write) */
> write(
> data:
> | Blob
> | (ArrayBuffer | ArrayBufferView)
> | string
> | FileSystemWritableFileStreamWriteParams,
> ): Promise<void>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream/seek) */
> seek(position: number): Promise<void>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream/truncate) */
> truncate(size: number): Promise<void>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle)
> */
> declare abstract class FileSystemSyncAccessHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/read) */
> read(
> buffer: ArrayBuffer | ArrayBufferView,
> options?: FileSystemSyncAccessHandleFileSystemReadWriteOptions,
> ): number;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/write) */
> write(
> buffer: ArrayBuffer | ArrayBufferView,
> options?: FileSystemSyncAccessHandleFileSystemReadWriteOptions,
> ): number;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/truncate) */
> truncate(newSize: number): void;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/getSize) */
> getSize(): number;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/flush) */
> flush(): void;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/close) */
> close(): void;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/StorageManager)
> */
> declare abstract class StorageManager {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/StorageManager/getDirectory) */
> getDirectory(): Promise<FileSystemDirectoryHandle>;
> }
> interface FileSystemFileHandleFileSystemCreateWritableOptions {
> keepExistingData: boolean;
> }
> interface FileSystemDirectoryHandleFileSystemGetFileOptions {
> create: boolean;
> }
> interface FileSystemDirectoryHandleFileSystemGetDirectoryOptions {
> create: boolean;
> }
> interface FileSystemDirectoryHandleFileSystemRemoveOptions {
> recursive: boolean;
> }
> interface FileSystemSyncAccessHandleFileSystemReadWriteOptions {
> at?: number;
> }
> interface FileSystemWritableFileStreamWriteParams {
> type: string;
> size?: number;
> position?: number;
> data?: Blob | (ArrayBuffer | ArrayBufferView) | string;
> }
> interface FileSystemDirectoryHandleEntryType {
> key: string;
> value: FileSystemHandle;
diff -r types/generated-snapshot/experimental/index.ts bazel-bin/types/definitions/experimental/index.ts
259a260,265
> FileSystemHandle: typeof FileSystemHandle;
> FileSystemFileHandle: typeof FileSystemFileHandle;
> FileSystemDirectoryHandle: typeof FileSystemDirectoryHandle;
> FileSystemWritableFileStream: typeof FileSystemWritableFileStream;
> FileSystemSyncAccessHandle: typeof FileSystemSyncAccessHandle;
> StorageManager: typeof StorageManager;
475a482
> readonly storage: StorageManager;
1384a1392,1533
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle)
> */
> export declare abstract class FileSystemHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle/kind) */
> get kind(): string;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle/name) */
> get name(): string;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemHandle/isSameEntry) */
> isSameEntry(other: FileSystemHandle): Promise<boolean>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle)
> */
> export declare abstract class FileSystemFileHandle extends FileSystemHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle/getFile) */
> getFile(): Promise<File>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle/createWritable) */
> createWritable(
> options?: FileSystemFileHandleFileSystemCreateWritableOptions,
> ): Promise<FileSystemWritableFileStream>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemFileHandle/createSyncAccessHandle) */
> createSyncAccessHandle(): Promise<FileSystemSyncAccessHandle>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle)
> */
> export declare abstract class FileSystemDirectoryHandle extends FileSystemHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/getFileHandle) */
> getFileHandle(
> name: string,
> options?: FileSystemDirectoryHandleFileSystemGetFileOptions,
> ): Promise<FileSystemFileHandle>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/getDirectoryHandle) */
> getDirectoryHandle(
> name: string,
> options?: FileSystemDirectoryHandleFileSystemGetDirectoryOptions,
> ): Promise<FileSystemDirectoryHandle>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/removeEntry) */
> removeEntry(
> name: string,
> options?: FileSystemDirectoryHandleFileSystemRemoveOptions,
> ): Promise<void>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemDirectoryHandle/resolve) */
> resolve(possibleDescendant: FileSystemHandle): Promise<string[]>;
> entries(): IterableIterator<FileSystemDirectoryHandleEntryType>;
> keys(): IterableIterator<string>;
> values(): IterableIterator<FileSystemHandle>;
> forEach(
> callback: (
> param0: string,
> param1: FileSystemHandle,
> param2: FileSystemDirectoryHandle,
> ) => void,
> thisArg?: any,
> ): void;
> [Symbol.iterator](): IterableIterator<FileSystemDirectoryHandleEntryType>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream)
> */
> export declare abstract class FileSystemWritableFileStream extends WritableStream {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream/write) */
> write(
> data:
> | Blob
> | (ArrayBuffer | ArrayBufferView)
> | string
> | FileSystemWritableFileStreamWriteParams,
> ): Promise<void>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream/seek) */
> seek(position: number): Promise<void>;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemWritableFileStream/truncate) */
> truncate(size: number): Promise<void>;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle)
> */
> export declare abstract class FileSystemSyncAccessHandle {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/read) */
> read(
> buffer: ArrayBuffer | ArrayBufferView,
> options?: FileSystemSyncAccessHandleFileSystemReadWriteOptions,
> ): number;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/write) */
> write(
> buffer: ArrayBuffer | ArrayBufferView,
> options?: FileSystemSyncAccessHandleFileSystemReadWriteOptions,
> ): number;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/truncate) */
> truncate(newSize: number): void;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/getSize) */
> getSize(): number;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/flush) */
> flush(): void;
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/FileSystemSyncAccessHandle/close) */
> close(): void;
> }
> /**
> * Available only in secure contexts.
> *
> * [MDN Reference](https://developer.mozilla.org/docs/Web/API/StorageManager)
> */
> export declare abstract class StorageManager {
> /* [MDN Reference](https://developer.mozilla.org/docs/Web/API/StorageManager/getDirectory) */
> getDirectory(): Promise<FileSystemDirectoryHandle>;
> }
> export interface FileSystemFileHandleFileSystemCreateWritableOptions {
> keepExistingData: boolean;
> }
> export interface FileSystemDirectoryHandleFileSystemGetFileOptions {
> create: boolean;
> }
> export interface FileSystemDirectoryHandleFileSystemGetDirectoryOptions {
> create: boolean;
> }
> export interface FileSystemDirectoryHandleFileSystemRemoveOptions {
> recursive: boolean;
> }
> export interface FileSystemSyncAccessHandleFileSystemReadWriteOptions {
> at?: number;
> }
> export interface FileSystemWritableFileStreamWriteParams {
> type: string;
> size?: number;
> position?: number;
> data?: Blob | (ArrayBuffer | ArrayBufferView) | string;
> }
> export interface FileSystemDirectoryHandleEntryType {
> key: string;
> value: FileSystemHandle;
So this is a fun one. In preparation for the implementation of node:fs (and eventually also the Web file system API), we need to implement a virtual file system for the worker. Consider this a WIP sketch for the time being. There are still a range of things to do and verify before this is ready.
There's quite a bit to unpack here so let's go through it a bit.
First, let's talk about the file system itself. Every worker instance will have its own root directory (`/`). In this root directory we will have at least three special directories: the "bundle" root, the "dev" root, and the "temp" root.

* The bundle root is where all of the modules that are included in the worker bundle will be accessible. Everything in this directory will be strictly read-only. The contents are populated by the worker configuration bundle. By default, the bundle root will be `/bundle`, but this can be overridden.
* The dev root provides the equivalent of `/dev/null`, `/dev/zero`, `/dev/full`, and `/dev/random`.
* The temp root is where we will allow temporary files to be created. Everything in this directory is read-write but will be transient. By default, the temp root will be `/tmp`, but this can also be overridden.

Let's imagine the following simple workerd configuration:
Given this configuration, the worker fs will initially have the following structure:
We can then access the file system at the C++ level.
The temporary file directory is a bit special in that its contents are fully transient, and their lifetime depends on whether or not there is an active IoContext. We use a special RAII `TmpDirStoreScope` scope to manage the contents of the temporary directory. For example,

If there is an active IoContext, the temporary file will be created within that IoContext's `TmpDirStoreScope` and will be deleted when the IoContext is destructed. This allows us to have a single virtual file system whose temporary directories are either deleted immediately as soon as the execution scope is exited, or are specific to the IoContext and are deleted when the IoContext is destructed. This mechanism allows us to have multiple IoContexts active at the same time while still having a single virtual file system whose contents correctly reflect the current IoContext.
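The store-selection rule described here (an active IoContext wins, otherwise the innermost RAII scope owns the files) can be modeled with a thread-local stack. This is a toy sketch assuming a single thread; `TmpStore`, `FakeIoContext`, `currentTmpStore` and `tmpExists` are hypothetical, and this `TmpDirStoreScope` only borrows the name of the real class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy temp-dir contents: path -> file contents.
using TmpStore = std::map<std::string, std::string>;

// Hypothetical stand-in for an IoContext; its store lives as long as it does.
struct FakeIoContext {
  TmpStore store;
};

thread_local FakeIoContext* activeIoContext = nullptr;
thread_local std::vector<TmpStore*> scopeStack;

// RAII scope: owns a store and makes it current while the scope is alive;
// everything written to it disappears when the scope is destructed.
class TmpDirStoreScope {
 public:
  TmpDirStoreScope() { scopeStack.push_back(&store_); }
  ~TmpDirStoreScope() { scopeStack.pop_back(); }
  TmpDirStoreScope(const TmpDirStoreScope&) = delete;
  TmpDirStoreScope& operator=(const TmpDirStoreScope&) = delete;

 private:
  TmpStore store_;
};

// An active IoContext takes precedence; otherwise the innermost scope is
// used. Returns nullptr when neither exists.
TmpStore* currentTmpStore() {
  if (activeIoContext != nullptr) return &activeIoContext->store;
  if (!scopeStack.empty()) return scopeStack.back();
  return nullptr;
}

bool tmpExists(const std::string& path) {
  TmpStore* store = currentTmpStore();
  return store != nullptr && store->count(path) > 0;
}
```

Keeping the "which store is current" decision in one function is what lets a single file system object serve several IoContexts, since the lookup is re-done on every access.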
Temporary files can be created, copied, etc. only within the temporary directory. All other directories are read-only.
When there is no active IoContext, all temporary files will have a timestamp set to the Unix epoch. When there is an active IoContext, the temporary files will acquire the current timestamp from the IoContext ensuring that the file system's view of time is consistent within a request.
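The timestamp rule is small enough to state directly in code. In this sketch, `RequestClock` is a hypothetical stand-in for the time an IoContext reports, and a default-constructed `std::chrono::system_clock::time_point` represents the Unix epoch (which C++20 guarantees for `system_clock`).

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Timestamp = std::chrono::system_clock::time_point;

// Hypothetical stand-in for the current time an IoContext reports.
struct RequestClock {
  Timestamp now;
};

// Modification time for a temp file: the Unix epoch when there is no active
// request, otherwise the request's current time, so file timestamps agree
// with the clock the request itself observes.
Timestamp tmpFileTimestamp(const std::optional<RequestClock>& ioContext) {
  if (!ioContext.has_value()) {
    return Timestamp{};  // default-constructed time_point is the clock epoch
  }
  return ioContext->now;
}
```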
The design here is intended to be extensible. We can add new root directories in the future with different semantics and implementations. For example, to support python workers we can introduce a new root directory that is backed, for instance, by a tar/zip file containing the python standard library, etc.
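One way to get this kind of extensibility is a mount table that maps each root prefix to an implementation of a common interface; a zip- or tar-backed root for the Python standard library would then be one more implementation. The sketch assumes that design; `RootDirectory`, `StaticRoot` and this `VirtualFileSystem` are hypothetical and much simpler than the real interfaces.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical interface every root directory implements.
class RootDirectory {
 public:
  virtual ~RootDirectory() = default;
  virtual bool writable() const = 0;
  // Returns the contents of the file at relPath, if it exists.
  virtual std::optional<std::string> read(const std::string& relPath) const = 0;
};

// Read-only root backed by a fixed map, similar in spirit to the bundle root.
class StaticRoot final : public RootDirectory {
 public:
  explicit StaticRoot(std::map<std::string, std::string> files)
      : files_(std::move(files)) {}
  bool writable() const override { return false; }
  std::optional<std::string> read(const std::string& relPath) const override {
    auto it = files_.find(relPath);
    if (it == files_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, std::string> files_;
};

// Mount table mapping prefixes such as "/bundle" or "/tmp" to roots.
class VirtualFileSystem {
 public:
  void mount(const std::string& prefix, std::shared_ptr<RootDirectory> root) {
    roots_[prefix] = std::move(root);
  }
  // Resolves "/bundle/worker" to the "/bundle" root plus "worker". Returns a
  // null root when no mount point matches on a whole path component.
  std::pair<std::shared_ptr<RootDirectory>, std::string> resolve(
      const std::string& path) const {
    for (const auto& [prefix, root] : roots_) {
      if (path == prefix) return {root, ""};
      bool underPrefix = path.size() > prefix.size() &&
                         path.compare(0, prefix.size(), prefix) == 0 &&
                         path[prefix.size()] == '/';
      if (underPrefix) return {root, path.substr(prefix.size() + 1)};
    }
    return {nullptr, ""};
  }

 private:
  std::map<std::string, std::shared_ptr<RootDirectory>> roots_;
};
```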
What isn't implemented yet? Great question! The following TODOs remain:

`kj::none` for all file descriptors. As a next step, the file system will implement a custom file descriptor counter that will ensure that all files are assigned a unique, monotonically increasing file descriptor. When the counter reaches a maximum threshold of opened files, the worker will be condemned. We could also just track the number of active `kj::Own<const kj::File>` objects and use that internally for accounting, but the `node:fs` implementation does a lot with integer fds, so we will need them at some point.

The implementation is split across several files/components:

* `worker-fs.h/c++` - These provide the main interfaces for the worker file system.
* `bundle-fs.h/c++` - These connect the workerd configuration to the worker file system. A similar interface will need to be provided for the internal repo, since the two use different configuration schemas.
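The file-descriptor TODO describes a counter that only ever increases and condemns the worker once a threshold is reached. A toy version follows; `FdTable` is hypothetical, starting at 3 (after the standard streams) is an assumption, and a `condemned()` flag stands in for actually condemning the worker.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical fd table. Descriptors come from a counter that only ever
// increases, so a closed fd is never handed out again.
class FdTable {
 public:
  explicit FdTable(int maxFd, int firstFd = 3) : maxFd_(maxFd), nextFd_(firstFd) {}

  // Assigns the next fd to `path`. Once the counter reaches maxFd the table
  // refuses and marks itself condemned (standing in for condemning the worker).
  std::optional<int> open(std::string path) {
    if (nextFd_ >= maxFd_) {
      condemned_ = true;
      return std::nullopt;
    }
    int fd = nextFd_++;
    open_.emplace(fd, std::move(path));
    return fd;
  }

  // Closes an fd; returns false if it was not open.
  bool close(int fd) { return open_.erase(fd) > 0; }
  bool condemned() const { return condemned_; }

 private:
  int maxFd_;
  int nextFd_;
  bool condemned_ = false;
  std::map<int, std::string> open_;  // fd -> path of the open file
};
```

Because numbers are never reused, this threshold bounds the total number of opens over the worker's lifetime rather than the number open at once; tracking live `kj::Own<const kj::File>` objects, the alternative mentioned in the TODO, would bound the latter.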