-
Notifications
You must be signed in to change notification settings - Fork 799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow block_size/block_count to be optional, add erase_count, lfs_fs_stat, and lfs_fs_grow #753
base: test-and-bench-runners
Are you sure you want to change the base?
Conversation
aa23e01
to
96cb73e
Compare
I expressed some concern about additional RAM usage in #349 - it could (at least at first glance) be a significant amount in the context of a small to medium processor that doesn't need the RAM-based config block. |
Don't mind me just copying the CI output here. I don't have a strong opinion yet on if this goes in or not: Tests passed ✓, Code: 16334 B (+4.0%), Stack: 1344 B (+3.1%), Structs: 812 B (+5.2%)
|
78ebd80
to
d677a96
Compare
d677a96
to
2d2dd8b
Compare
881a4c6
to
5167c2d
Compare
076f871
to
17c9665
Compare
17c9665
to
1f37eb5
Compare
This adds two new configuration options: erase_size and erase_count, allowing block_size and block_count to be loaded from the superblock during mount. For backwards compability these default to block_size and block_count if zero. --- Unfortunately this is a bit easier said than done. littlefs keeps its superblock in the metadata pair located at blocks {0,1}, which is also where the root directory lives (keep in mind small littlefs images may only have 2 blocks in total). If we mutate blocks {0,1}, we have to erase before programming, which means it's possible to have the only superblock in block 1. This presents a puzzle because how do you find block 1 if you don't know the size of a block? One solution presented here is to search for block 1 by trying different sizes until we find a superblock. This isn't great but there are some properties of this search that help: 1. If we do find a superblock, the search will never take longer than a mount with a known block_size. This is because we stop at block 1, searching at most O(block_size) bytes, and metadata fetch is already a O(block_size) operation. This means the concern is limited to how long it takes to fail when littlefs is not present on the disk. 2. We can assume the on-disk block_size is probably a factor of the total size of the disk. After a bit of digging into the math, this apparently reduces the runtime to the divisor function, d(n), which is sublinear. According to a blog post by Terence Tao this is bounded by the ridiculous O(e^O(log(n)/log(log(n)))): https://terrytao.wordpress.com/2008/09/23/the-divisor-bound This is apparently somewhere between O(sqrt(n)) and O(log(n)), but conveniently O(log(n)) on average and O(log(n)) for powers of 2. I've left it as O(d(n)) in the documentation, which might be a bit confusing, but I'm not sure how best to capture "mostly log(n)" correctly. 3. If we don't know the block_size, or don't know that block_size is aligned to the disk size, the best we can do is a O(n) search. In this case I've added a warning, so at least it's distinguishable from an infinite loop if debugging.
This hopefully helps make it more clear that we are using erase_size as the unit during the block_size search. However this does mean you need to set erase_count to zero to allow any littlefs size, which might be a bit confusing: block_size block_count erase_size erase_count known known known known => known bs, O(1) known unknown known known => known bs, O(1) unknown unknown known known => bs search, O(d(n)) unknown unknown known unknown => bs search, O(n)
Aside from asserts (which can be implemented in block devices), erase_count provides no useful info aside from what's known with block_count and risks mistakes during configuration. Each set of known/unknown block_size-related variables has a well-defined mount behavior: block_size block_count known known => known bs, O(1) known unknown => known bs, O(1) unknown known => bs search, O(d(n)) unknown unknown => bs search, O(n) If block_size is unknown, block_count uses the erase_size as a unit, which is the only reasonable option since block_size must be >= prog/read/cache_size.
This ended up needing a bit of API rework since littlefs no longer needs to know the actual erase_count. We can no longer rely on lfs_config to contain all the information necessary to configure the block devices. Changing the lfs_bd_config structs to be required is probably a good idea anyways as it moves us more towards separating the bds from littlefs, though we can't quite get rid of the lfs_config parameter because of the block-device API in lfs_config. Eventually it would be nice to get rid of it, but that would require API breakage.
This is necessary for accessing format-specific information stored in the superblock after mounting. As is common in other filesystems this also provides the filesystem usage. However collecting the filesystem usage is more expensive in littlefs than other filesystem (and less accurate because of CoW), so this may need to be cached after mount in the future.
- Need to drop cache during block_size search. - Removed "Corrupted" messages when root is not found to avoid too much debug output. This also gets rid of a long-term point of confusion arround "Corrupted" messages on a mount that is intended to fail.
The initial implementation for this was provided by kaetemi, originally as a mount flag. However, it has been modified here to be self-contained in an explicit runtime function that can be called after mount. The reasons for an explicit function: 1. lfs_mount stays a strictly readonly operation, and avoids pulling in all of the write machinery. 2. filesystem-wide operations such as lfs_fs_grow can be a bit risky, and irreversable. The action of growing the filesystem should be very intentional. --- One concern with this change is that this will be the first function that changes metadata in the superblock. This might break tools that expect the first valid superblock entry to contain the most recent metadata.
These locally show the expected O(d(n)) growth of the block_size search when block_count is known. A rough estimate of runtime for a 1 GiB device with a 90 MiB/s read bandwidth (using the mt25q for rough numbers): known bs/bc => ~1.44 us known block_size => ~1.44 us known block_count => ~125 us unknown bs/bc => ~376 ms
- Valgrind-detected memory leak in test_superblocks. - Valgrind-detected uninitialized lfs->block_size in lfs_init caused by an unintended sed replacement. - Said sed replacement also changed the implementation of the v1 portion of lfs_migrate, this is out of scope and caused migrate to fail, which fortunately CI noticed. - We also need to copy the block_size/block_count configs in v1 so that low-level bd operations still work. - le-conversion in test_evil now matters due to difference in LFS_ERR_CORRUPT/LFS_ERR_INVAL paths during lfs_mount. This is good because we're actually testing for loop detection not pointer out-of-bounds as intended.
Just an FYI for anyone watching this PR, at the moment I'm considering this PR defunct.
The only remaining feature is the detection of unknown block-sizes, but after taking some time away from the problem, I'm not sure detecting unknown block-sizes based on the assumption of the total disk size being divisible is very reasonable in practice. And as @e107steved notes above, it's possible to do the naive search for block-size outside of the filesystem, so I will be leaving this feature out for now. At the moment it looks like block-size will always be a required configuration option. |
This PR moves block_size/block_count into RAM, allowing more flexibility during
lfs_mount
.The main benefit is allowing block_size and block_count to be marked as "unknown", using the newly added erase_count to read block_size and block_count from the superblock during lfs_mount.
Unfortunately this isn't as simple as it sounds. In littlefs, as a part of enabling atomic updates, the superblock can live in either block 0 or block 1. But if we don't know the block_size, we don't really know where block 1 begins. This can put us in a catch-22 situation where we need the block_size in order to read the block_size from the superblock.
The solution proposed here is to search possible block_sizes until we find the superblock, optionally using the size of the disk to narrow down our search. Surprisingly, this adds no runtime if we find a littlefs image, but unfortunately has some complicated performance implications if the disk in unformatted:
If we know the block_size and block_count, we can easily mount the filesystem, erroring with LFS_ERR_INVAL if the block_size/block_count stored in the superblock are different. Worst case$O(1)$ .
If we know the block_size, we can easily mount the filesystem and read the block_count from the superblock. Worst case$O(1)$ .
If we don't know the block_size, we need to search for it. Two shortcuts: 1. we know the block_size must be a multiple of the erase_size, and 2. if we know the block_count, we can assume the block_size divides the block_count evenly. Worst case$O(d(n))$ .
If we don't know the block_count, or can't rely on the above assumption, the best we can do is search all multiples of the erase_size. This risks taking a long time on large, unformatted disks. Worst case$O(n)$ .
The above function,$d(n)$ , is the "divisor function", and it's analysis is a bit difficult. On average, and for all powers-of-two, it runs in $O(\log n)$ . But there are so called "highly composite numbers" where it can take at worst $O(\sqrt n)$ .
#349 has more discussion about this scheme and alternatives.
I put together a quick benchmark to measure how long
lfs_mount
takes to fail as the disk grows. We can see the sub-linear runtime if we know the block_count, and linear runtime if both block_size and block_count are unknown. If we know the block_size,lfs_mount
reads only ~136 bytes:Choosing a random device, mt25q, shows a max of ~90MiB/s reads in the datasheet. So for a very rough estimate:
1 GiB @ 90 MiB/s
With block_size/block_count now variable, this PR also includes a couple relevant new functions:
lfs_fs_stat
lfs_fs_stat
allows you to access the configuration read from littlefs's superblock during mount, as well as other state such as the number of blocks in use (susceptible to over-counting due to CoW).This is supposed to mirror statvfs common in other filesystems.
lfs_fs_grow
First implemented by @kaetemi,
lfs_fs_grow
allows you to change the size of the filesystem on disk to take advantage of additional space if it becomes available:If you want to automatically resize littlefs to the full disk size every mount, this can be done by using these two calls as your "mount" operation:
Relevant discussions:
lfs_fs_grow
LFS_M_GROW
Note this PR is currently under discussion in #349.