Skip to content

Commit a933d01

Browse files
committed
Implement in-memory worker file system
So this is a fun one. In preparation for the implementation of node:fs (and eventually also the Web file system API), we need to implement a virtual file system for the worker. There's quite a bit to unpack here so let's go through it a bit. First, let's talk about the file system itself. Every worker instance will have its own root directory (`/`). In this root directory we will have at least two special directories, the "bundle" root and the "temp" root. * The bundle root is where all of the modules that are included in the worker bundle will be accessible. Everything in this directory will be strictly read-only. The contents are populated by the worker configuration bundle. By default, the bundle root will be `/bundle` but this can be overridden. * The temp root is where we will allow temporary files to be created. Everything in this directory is read-write but will be transient. By default, the temp root will be `/tmp` but this can also be overridden. Let's imagine the following simple workerd configuration: ``` const helloWorld :Workerd.Worker = ( modules = [ (name = "worker", esModule = embed "worker.js"), (name = "foo", text = "Hello World!"), ], compatibilityDate = "2023-02-28", ); ``` Given this configuration, the worker fs will initially have the following structure: ``` / ├── bundle │ ├── worker │ └── foo └── tmp ``` We can then use the `jsg::Lock&` to access the file system at the C++ level. We reuse the existing `kj::Filesystem` API to provide the interface. ```cpp jsg::Lock& js = ... // Resolve the root of the file system KJ_IF_SOME(node, js.resolveVfsNode("file:///"_url)) { auto& dir = kj::downcast<const kj::ReadableDirectory>( node); // List the contents of the directory KJ_DBG(dir.listNames()); } KJ_IF_SOME(node, js.resolveVfsNode( "file:///bundle/worker"_url)) { auto& file = kj::downcast<const kj::ReadableFile>(node); KJ_DBG(file.readAllText()); } KJ_IF_SOME(node, js.resolveVfsNode("file:///tmp"_url)) { auto& dir = kj::downcast<const kj::Directory>(node); auto tmpFile = dir.createTemporary(); // tmpFile is an anonymous temporary file } ``` The temporary file directory is a bit special in that the contents are fully transient based on whether there is or is not an active IoContext. We use a special RAII `TempDirStoreScope` scope to manage the contents of the temporary directory. For example, ```cpp KJ_IF_SOME(node, js.resolveVfsNode("file:///tmp"_url)) { auto& dir = kj::downcast<const kj::Directory>(node); kj::Path path("a/b/c/foo.txt") { TempDirStoreScope temp_dir_scope; auto tmpFile = dir.openFile(path, kj::FileMode::CREATE); KJ_ASSERT(tmpFile.write(0, "Hello World!"_kjb) == 12); KJ_DBG(dir.exists(path)); // true! } // The temp dir scope is destructed and the file // is deleted KJ_DBG(dir.exists(path)); // false! } ``` However, if there is an active IoContext, the temporary file will instead be created within that IoContext's TempDirStoreScope, and will be deleted when the IoContext is destructed. This allows us to have a single virtual file system whose temporary directories are either deleted immediately as soon as the execution scope is exited, or are specific to the IoContext and are deleted when the IoContext is destructed. This mechanism allows us to have multiple IoContexts active at the same time while still having a single virtual file system whose contents correctly reflect the current IoContext. Temporary files can be created, copied, etc only within the temporary directory. All other directories are read-only. When there is no active IoContext, all temporary files will have a timestamp set to the Unix epoch. When there is an active IoContext, the temporary files will acquire the current timestamp from the IoContext ensuring that the file system's view of time is consistent within a request. The design here is intended to be extensible. We can add new root directories in the future with different semantics and implementations. For example, to support python workers we can introduce a new root directory that is backed, for instance, by a tar/zip file containing the python standard library, etc. What isn't implemented yet? Great question! The following todos are remaining: * Implementing memory accounting for the temporary file system. Currently the implementation is using the default in-memory file factory provided by kj. This is not ideal as it doesn't provide any mechanisms for memory accounting that integrates with the Isolate heap limits. As a next step, before this work is complete, we will need to implement a custom in-memory file factory that does integrate with the memory accounting system. The intention is that the total size of the temporary files will be limited to the Isolate heap limit. * Implmenting "file descriptors". Currently the file system will return `kj::none` for all file descriptors. As a next step, the file system will implement a custom file descriptor counter that will ensure that all files are assigned a unique monotonically increasing file descriptor. When the counter reaches a maximum threshold of opened files, the worker will be condemned. We can also just track the number of active `kj::Own<const kj::File>` objects and use that internally for accounting but the `node:fs` impl does a lot with integer fds so we will need them at some point for that. The implementation currently does not support symbolic links in any way. I do not anticipate implementing this in the future. The implementation is split across several files/components: `worker-fs.h/c++` - These provide the main interfaces for the worker file system. `bundle-fs.h/c++` - These provide the interfaces for interfacing the workerd configuration with the worker file system. A similar interface will need to be provided for the internal repo since the two use different configuration schemas. The integration via the `jsg::Lock&` is provided as the most convenient way to access the file system where it is needed.
1 parent a44255f commit a933d01

23 files changed

+1872
-156
lines changed

src/workerd/api/tests/new-module-registry-test.js

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ import { foo as foo2, default as def2 } from 'bar';
1313
import { createRequire } from 'module';
1414

1515
// Verify that import.meta.url is correct here.
16-
strictEqual(import.meta.url, 'file:///worker');
16+
strictEqual(import.meta.url, 'file:///bundle/worker');
1717

1818
// Verify that import.meta.main is true here.
1919
ok(import.meta.main);
@@ -44,7 +44,8 @@ console.log(import.meta);
4444
// Verify that import.meta.resolve provides correct results here.
4545
// The input should be interpreted as a URL and normalized according
4646
// to the rules in the WHATWG URL specification.
47-
strictEqual(import.meta.resolve('./.././test/.././../foo'), 'file:///foo');
47+
strictEqual(import.meta.resolve('./.././test/.././.%2e/foo'), 'file:///foo');
48+
strictEqual(import.meta.resolve('foo'), 'file:///bundle/foo');
4849

4950
// There are four tests at this top level... one for the import of the node:assert
5051
// module without the node: prefix specifier, two for the imports of the foo and
@@ -58,7 +59,7 @@ strictEqual(def2, 2);
5859
strictEqual(fs, 'abc');
5960

6061
// Equivalent to the above, but using the file: URL scheme.
61-
import { foo as foo3, default as def3 } from 'file:///foo';
62+
import { foo as foo3, default as def3 } from 'file:///bundle/foo';
6263
strictEqual(foo, foo3);
6364
strictEqual(def, def3);
6465

@@ -78,7 +79,7 @@ import { default as cjs2 } from 'cjs2';
7879
strictEqual(cjs2.foo, 1);
7980
strictEqual(cjs2.bar, 2);
8081
strictEqual(cjs2.filename, 'cjs1');
81-
strictEqual(cjs2.dirname, '/');
82+
strictEqual(cjs2.dirname, '/bundle');
8283
strictEqual(cjs2.assert, assert);
8384

8485
// CommonJS modules can define named exports.
@@ -91,7 +92,7 @@ const customRequireCjs = myRequire('cjs1');
9192
strictEqual(customRequireCjs.foo, cjs1foo);
9293
strictEqual(customRequireCjs.bar, cjs1bar);
9394

94-
await rejects(import('file:///cjs3'), {
95+
await rejects(import('file:///bundle/cjs3'), {
9596
message: 'boom',
9697
});
9798

@@ -109,7 +110,7 @@ await rejects(import('invalid-json'), {
109110
});
110111

111112
await rejects(import('module-not-found'), {
112-
message: /Module not found: file:\/\/\/module-not-found/,
113+
message: /Module not found: file:\/\/\/bundle\/module-not-found/,
113114
});
114115

115116
// Verify that a module is unable to perform IO operations at the top level, even if
@@ -160,10 +161,10 @@ export const queryAndFragment = {
160161
notStrictEqual(c, d);
161162

162163
// The import.meta.url for each should match the specifier used to import the instance.
163-
strictEqual(a.bar, 'file:///foo?query');
164-
strictEqual(b.bar, 'file:///foo#fragment');
165-
strictEqual(c.bar, 'file:///foo?query#fragment');
166-
strictEqual(d.bar, 'file:///foo');
164+
strictEqual(a.bar, 'file:///bundle/foo?query');
165+
strictEqual(b.bar, 'file:///bundle/foo#fragment');
166+
strictEqual(c.bar, 'file:///bundle/foo?query#fragment');
167+
strictEqual(d.bar, 'file:///bundle/foo');
167168
},
168169
};
169170

src/workerd/io/BUILD.bazel

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,7 @@ wd_cc_library(
5252
"io-util.c++",
5353
"tracer.c++",
5454
"worker.c++",
55+
"worker-fs.c++",
5556
] + ["//src/workerd/api:srcs"],
5657
hdrs = [
5758
"compatibility-date.h",
@@ -64,6 +65,7 @@ wd_cc_library(
6465
"promise-wrapper.h",
6566
"tracer.h",
6667
"worker.h",
68+
"worker-fs.h",
6769
] + ["//src/workerd/api:hdrs"],
6870
# global CPU-specific options are inconvenient to specify with bazel – just set the options we
6971
# need for CRC32C in this target for now.
@@ -368,3 +370,10 @@ kj_test(
368370
":frankenvalue",
369371
],
370372
)
373+
374+
kj_test(
375+
src = "worker-fs-test.c++",
376+
deps = [
377+
":io",
378+
],
379+
)

src/workerd/io/io-context.c++

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -131,6 +131,8 @@ IoContext::IoContext(ThreadContext& thread,
131131
kj::Maybe<Worker::Actor&> actorParam,
132132
kj::Own<LimitEnforcer> limitEnforcerParam)
133133
: thread(thread),
134+
clock(*this),
135+
tempDirStoreScope(TempDirStoreScope::create(clock)),
134136
worker(kj::mv(workerParam)),
135137
actor(actorParam),
136138
limitEnforcer(kj::mv(limitEnforcerParam)),

src/workerd/io/io-context.h

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
#include <workerd/io/io-thread-context.h>
1515
#include <workerd/io/io-timers.h>
1616
#include <workerd/io/trace.h>
17+
#include <workerd/io/worker-fs.h>
1718
#include <workerd/jsg/async-context.h>
1819
#include <workerd/jsg/jsg.h>
1920
#include <workerd/util/exception.h>
@@ -224,6 +225,20 @@ class IoContext final: public kj::Refcounted, private kj::TaskSet::ErrorHandler
224225
public:
225226
class TimeoutManagerImpl;
226227

228+
class IoContextClock: public kj::Clock {
229+
public:
230+
IoContextClock(IoContext& context): context(context) {}
231+
kj::Date now() const override {
232+
if (context.isCurrent()) {
233+
return context.now();
234+
}
235+
return kj::UNIX_EPOCH;
236+
}
237+
238+
private:
239+
IoContext& context;
240+
};
241+
227242
// Construct a new IoContext. Before using it, you must also create an IncomingRequest.
228243
IoContext(ThreadContext& thread,
229244
kj::Own<const Worker> worker,
@@ -623,6 +638,15 @@ class IoContext final: public kj::Refcounted, private kj::TaskSet::ErrorHandler
623638
// Access the event loop's current time point. This will remain constant between ticks.
624639
kj::Date now();
625640

641+
// Returns a kj::Clock that returns the same value as now().
642+
const kj::Clock& getClock() const {
643+
return clock;
644+
}
645+
646+
const TempDirStoreScope& getTempDirStoreScope() const {
647+
return *tempDirStoreScope;
648+
}
649+
626650
// Returns a promise that resolves once `now() >= when`.
627651
kj::Promise<void> atTime(kj::Date when) {
628652
return getIoChannelFactory().getTimer().atTime(when);
@@ -839,6 +863,9 @@ class IoContext final: public kj::Refcounted, private kj::TaskSet::ErrorHandler
839863

840864
kj::Own<WeakRef> selfRef = kj::refcounted<WeakRef>(kj::Badge<IoContext>(), *this);
841865

866+
IoContextClock clock;
867+
kj::Own<TempDirStoreScope> tempDirStoreScope;
868+
842869
kj::Own<const Worker> worker;
843870
kj::Maybe<Worker::Actor&> actor;
844871
kj::Own<LimitEnforcer> limitEnforcer;

src/workerd/io/worker-fs-test.c++

Lines changed: 121 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,121 @@
1+
#include "worker-fs.h"
2+
3+
#include <kj/debug.h>
4+
#include <kj/test.h>
5+
6+
namespace workerd {
7+
namespace {
8+
9+
kj::Own<FsMap> createTestFsMap() {
10+
FsMap::Builder builder;
11+
builder.setPathForKnownRoot(FsMap::KnownRoot::BUNDLE, "/mything/bundle");
12+
builder.setPathForKnownRoot(FsMap::KnownRoot::TEMP, "/mything/temp");
13+
return builder.finish();
14+
}
15+
16+
KJ_TEST("FsMap") {
17+
auto fsMap = createTestFsMap();
18+
19+
// Check that the paths are correct.
20+
KJ_EXPECT(
21+
fsMap->getPathForKnownRoot(FsMap::KnownRoot::BUNDLE) == kj::Path({"mything", "bundle"}));
22+
23+
KJ_EXPECT(fsMap->getPathForKnownRoot(FsMap::KnownRoot::TEMP) == kj::Path({"mything", "temp"}));
24+
25+
KJ_EXPECT(
26+
fsMap->getUrlForKnownRoot(FsMap::KnownRoot::BUNDLE).equal("file:///mything/bundle"_url));
27+
28+
KJ_EXPECT(fsMap->getUrlForKnownRoot(FsMap::KnownRoot::TEMP).equal("file:///mything/temp"_url));
29+
30+
// Check that the known root is correct.
31+
auto knownRoot = fsMap->tryGetKnownRootForUrl("file:///mything/bundle"_url);
32+
KJ_EXPECT(knownRoot == FsMap::KnownRoot::BUNDLE);
33+
34+
knownRoot = fsMap->tryGetKnownRootForUrl("file:///mything/temp"_url);
35+
KJ_EXPECT(knownRoot == FsMap::KnownRoot::TEMP);
36+
37+
knownRoot = fsMap->tryGetKnownRootForPath(kj::Path({"mything", "bundle"}));
38+
KJ_EXPECT(knownRoot == FsMap::KnownRoot::BUNDLE);
39+
40+
knownRoot = fsMap->tryGetKnownRootForPath(kj::Path({"mything", "temp"}));
41+
KJ_EXPECT(knownRoot == FsMap::KnownRoot::TEMP);
42+
43+
KJ_EXPECT(fsMap->tryGetKnownRootForPath(kj::Path({"does", "not", "exist"})) == kj::none);
44+
KJ_EXPECT(fsMap->tryGetKnownRootForUrl("http://not/supported"_url) == kj::none);
45+
}
46+
47+
KJ_TEST("TempDirStoreScope") {
48+
// We can create multiple temp storages on the heap...
49+
auto tempStoreOnHeap = TempDirStoreScope::create(UnixEpochClock::instance);
50+
auto tempStoreOnHeap2 = TempDirStoreScope::create(UnixEpochClock::instance);
51+
52+
KJ_EXPECT(!TempDirStoreScope::hasCurrent());
53+
54+
{
55+
// But we can only have one on the stack at a time per thread.
56+
TempDirStoreScope tempDirStoreScope;
57+
KJ_EXPECT(TempDirStoreScope::hasCurrent());
58+
KJ_ASSERT(&TempDirStoreScope::current() == &tempDirStoreScope);
59+
KJ_ASSERT(&TempDirStoreScope::current() != tempStoreOnHeap.get());
60+
KJ_ASSERT(&TempDirStoreScope::current() != tempStoreOnHeap2.get());
61+
62+
auto temp = tempDirStoreScope.getDirectory().createTemporary();
63+
temp->write(0, kj::arr<kj::byte>(0, 1, 2, 3, 4));
64+
KJ_EXPECT(temp->stat().lastModified == kj::UNIX_EPOCH);
65+
KJ_EXPECT(temp->stat().size == 5);
66+
}
67+
}
68+
69+
KJ_TEST("WorkerFileSystem") {
70+
auto fsMap = createTestFsMap();
71+
72+
// In a real worker, the bundle delegate is created from the
73+
// worker configuration. Here we just create a dummy one using
74+
// the temp directory delegate.
75+
TempDirStoreScope tempDirStoreScope;
76+
auto bundle = getTempDirectoryDelegate();
77+
KJ_ASSERT_NONNULL(bundle->tryOpenFile(kj::Path("foo"), kj::WriteMode::CREATE));
78+
79+
auto workerFs = getWorkerFileSystem(*fsMap, kj::mv(bundle));
80+
81+
auto& root = workerFs->getRoot();
82+
auto metadata = root.stat();
83+
KJ_EXPECT(metadata.type == kj::FsNode::Type::DIRECTORY);
84+
KJ_EXPECT(metadata.size == 0);
85+
86+
auto names = root.listNames();
87+
KJ_EXPECT(names.size() == 1);
88+
KJ_EXPECT(names[0] == "mything");
89+
90+
auto entries = root.listEntries();
91+
KJ_EXPECT(entries.size() == 1);
92+
KJ_EXPECT(entries[0].type == kj::FsNode::Type::DIRECTORY);
93+
KJ_EXPECT(entries[0].name == "mything");
94+
95+
KJ_EXPECT(root.exists(kj::Path({"mything"})));
96+
KJ_EXPECT(root.exists(kj::Path({"mything", "bundle"})));
97+
KJ_EXPECT(root.exists(kj::Path({"mything", "temp"})));
98+
KJ_EXPECT(root.exists(kj::Path({"mything", "bundle", "foo"})));
99+
100+
metadata = root.lstat(kj::Path({"mything"}));
101+
KJ_EXPECT(metadata.type == kj::FsNode::Type::DIRECTORY);
102+
KJ_EXPECT(metadata.size == 0);
103+
104+
metadata = root.lstat(kj::Path({"mything", "bundle"}));
105+
KJ_EXPECT(metadata.type == kj::FsNode::Type::DIRECTORY);
106+
KJ_EXPECT(metadata.size == 0);
107+
108+
metadata = root.lstat(kj::Path({"mything", "bundle", "foo"}));
109+
KJ_EXPECT(metadata.type == kj::FsNode::Type::FILE);
110+
KJ_EXPECT(metadata.size == 0);
111+
112+
metadata = root.lstat(kj::Path({"mything", "temp"}));
113+
KJ_EXPECT(metadata.type == kj::FsNode::Type::DIRECTORY);
114+
KJ_EXPECT(metadata.size == 0);
115+
116+
auto subdir = root.openSubdir(kj::Path({"mything", "bundle"}));
117+
KJ_EXPECT(subdir->stat().type == kj::FsNode::Type::DIRECTORY);
118+
}
119+
120+
} // namespace
121+
} // namespace workerd

0 commit comments

Comments
 (0)