Allow SPs to move partitions between deadlines (FIP-0070) #735
Replies: 14 comments 29 replies
-
PR #740 is submitted for this.
-
Implementation-wise, it seems that the movable distance (in deadline units) is quite limited: for a specific deadline to be movable, it also has to satisfy the `deadline_available_for_compaction` condition. This function basically says that a deadline can't be moved during three periods:
So of the 48 deadlines in a day, 32 are excluded by this condition. A workaround is that before moving partitions from a source deadline, a WindowPoSt verification is forcefully done against the source deadline, thus avoiding that requirement. So there are two options:
The problem is: do we want both of these options, or do we prefer only one of them?
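To make the "32 of 48 deadlines excluded" figure concrete, here is a rough back-of-envelope model in Rust. It is NOT the actual `deadline_available_for_compaction` code from builtin-actors; it only assumes mainnet parameters (30-second epochs, 60-epoch challenge windows, a 1800-epoch dispute window) and the three exclusion periods described above (currently open, about to open, still open to dispute):

```rust
// Illustrative sketch (not the actual actors code) of why roughly 32 of the
// 48 deadlines are unavailable for moving at any given moment.
const WPOST_CHALLENGE_WINDOW: i64 = 60; // 30 min per deadline, in 30s epochs
const WPOST_PERIOD_DEADLINES: i64 = 48; // deadlines per 24h proving period
const WPOST_DISPUTE_WINDOW: i64 = 1800; // 2 x chain finality (~15h), in epochs

// A deadline is treated as unavailable while its challenge window is open,
// just before it opens, or while its last PoSt is still open to dispute.
fn available_for_move(current_idx: i64, dl_idx: i64) -> bool {
    // forward distance (in deadlines) from the current deadline to dl_idx
    let ahead = (dl_idx - current_idx).rem_euclid(WPOST_PERIOD_DEADLINES);
    if ahead < 2 {
        return false; // currently open, or about to open
    }
    // epochs elapsed since dl_idx's challenge window last closed
    let since_close = (WPOST_PERIOD_DEADLINES - ahead - 1) * WPOST_CHALLENGE_WINDOW;
    since_close >= WPOST_DISPUTE_WINDOW // past the dispute window: movable
}

fn main() {
    let movable = (0..WPOST_PERIOD_DEADLINES)
        .filter(|&d| available_for_move(10, d))
        .count();
    println!("movable: {}, excluded: {}", movable, 48 - movable);
}
```

Under these assumptions only 16 deadlines are movable at any time, matching the count above.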
-
I think a simple implementation might be possible, to just do verification of PoSt when
-
I've recently done some analysis on the load that miner cron places on the system and have a few relevant results to share. These results get at the impact of a heightened concentration of partitions (i.e. a majority within peak business hours in Asia) on network health. On the positive side, the per-cron-job processing time is quite low in the common case; for example, it is not infeasible to process non-faulting cron jobs for the whole system in a single deadline. Another encouraging result is that there is quite a lot of fixed overhead for individual cron jobs: ~65M gas for deadlines with live partitions, with a small marginal cost for adding a partition to a deadline: ~0.7M gas. If SPs tend to cluster their partitions, the system is unlikely to break and will be more efficient in the happy path. On the more negative side, faulting costs are expensive enough that high concentrations of partitions can cause significant problems at current network sizes. From the linked analysis:
I think this is actually an OK parameter region to operate in. I'm a bit more concerned about the mid-term future should the network scale up significantly. I'm also concerned about the possibility of even tighter concentrations of partitions in fewer epochs. @steven004, do you have predictions on the impact of this change on partition distribution across deadlines? Should we be concerned about large concentrations of partitions within ranges of fewer than, say, 60 epochs?
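The ~65M fixed plus ~0.7M marginal figures above imply that clustering partitions amortizes the per-deadline overhead. A quick sketch of that cost model (the constants are the approximate figures quoted above, nothing more precise):

```rust
// Back-of-envelope per-deadline cron gas model, using the approximate
// non-faulting figures from the analysis above: ~65M gas fixed overhead for
// a deadline with live partitions, ~0.7M gas marginal per partition.
const DEADLINE_OVERHEAD_GAS: u64 = 65_000_000;
const PER_PARTITION_GAS: u64 = 700_000;

fn deadline_cron_gas(partitions: u64) -> u64 {
    if partitions == 0 {
        0 // no live partitions, no cron job
    } else {
        DEADLINE_OVERHEAD_GAS + partitions * PER_PARTITION_GAS
    }
}

fn main() {
    // 1000 partitions packed into one deadline vs spread across all 48:
    let concentrated = deadline_cron_gas(1000);
    let spread: u64 = (0..48).map(|_| deadline_cron_gas(1000 / 48 + 1)).sum();
    println!("concentrated: {concentrated}, spread: {spread}");
}
```

In this model the concentrated layout pays the 65M overhead once instead of 48 times, which is consistent with the "clustering is more efficient in the happy path" observation, while saying nothing about the faulting-cost downside.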
-
It would be good to do some benchmarking while implementing this FIP and see if
-
Now that SPs can move partitions, it is easier to pack as many sectors as possible into one deadline. We should confirm whether there is an upper limit on the number of partitions per deadline under current gas usage, to make sure a DisputeWindowPoSt message can still be submitted successfully, as there is a network security concern if a PoSt dispute message consistently fails with ErrOutOfGas.
-
I don't see this in the proposal. I believe this FIP needs to let storage providers configure their miner to only assign new sectors to specific deadlines. Otherwise, this could turn into a whack-a-mole problem.
-
**Product Consideration**
Storage services are expected to operate around the clock (e.g., to make data available). In theory, the current deadline assignment policy encourages storage providers to remain online at all times. However, there are some reasons this may not be the case:
I'd like to see this addressed/discussed in the FIP.
-
This is a bit of a concern as it could be used to create a bunch of empty partitions. The user will pay very little for them (no proofs necessary) but the system will pay in cron (although it might be negligible, I'm not sure). But I don't know of a great fix other than some kind of "forced compaction" mechanism (which, IMO, we should have anyways).
Related to above, I'm a bit concerned this could be used to create a bunch of sparse partitions (whose sectors could be subsequently terminated to create empty partitions). What if we, instead:
The alternative to both of these is to propose some mechanism where partition compaction would be required at some point. E.g., forbid moving partitions and/or sealing new sectors if any deadline has:
That would address all my concerns about "empty sectors", but it would force compaction, which isn't cheap.
-
So, I missed something:
Every deadline is scheduled within 24 hours and, no matter what window we set here, there's nothing preventing someone from repeatedly moving partitions around the clock to never have to prove them. As far as I can tell, the only solution here is to "mark" moved partitions and/or deadlines with recently moved partitions, preventing repeated moves.
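The "mark moved partitions" mitigation could be sketched as follows. This is purely hypothetical: the field name `moved_at` and the rule (a partition may not move again until a full proving period has elapsed, i.e. it has been proven at least once since the move) are invented for illustration, not taken from any implementation:

```rust
// Hypothetical sketch of marking moved partitions to prevent repeated moves.
// Field names and the exact cool-down rule are invented for illustration.
const WPOST_PROVING_PERIOD: i64 = 2880; // 24h of 30-second epochs

struct Partition {
    moved_at: Option<i64>, // epoch of the last MovePartitions, if any
}

fn can_move(p: &Partition, current_epoch: i64) -> bool {
    match p.moved_at {
        // never moved: free to move
        None => true,
        // moved before: require a full proving period (one guaranteed
        // WindowPoSt) to have elapsed since the last move
        Some(moved_epoch) => current_epoch - moved_epoch >= WPOST_PROVING_PERIOD,
    }
}
```

Under this rule a partition cannot be shuttled around the clock to dodge its proof, since every move commits it to at least one proving window in the destination deadline.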
-
Request as part of Last Call: @zhiqiangxu @steven004, have you/your team tested/benchmarked the miner cron impact/performance under pessimistic situations? For example, assuming a large number of PoSts are scheduled in the same window, or an aggressive number of faults in the same window? In the FIP, could you please also suggest what kind of monitoring should be set up to ensure we catch any unexpected degradation post-upgrade soon enough? Could you please also briefly capture the potential future work (safe cron/user-paid cron) mentioned in this discussion in the FIP so it doesn't get lost?
-
We haven't done benchmarks yet, just some functional test cases. I was discussing one technical issue; after the fix is confirmed, I'll add more test cases which @Stebalien proposed. I think the most important use case for this FIP is that an SP can move their WindowPoSt to office hours, which is about 8 hours instead of 24. So I expect the congestion to be about 3x for a specific SP, but each SP can make a different choice; network-wise, the final distribution depends on the physical locations of SPs. But even if all SPs are in the same location, the expected congestion is about 3x (assuming office hours are 8 hours). No particular monitoring comes to mind, but we may want to test more edge cases. As for potential future work, I'll leave that to @steven004.
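For reference, the ~3x figure follows directly from compressing the day's 48 deadlines into the 16 that fall inside an 8-hour window:

```rust
// Worked arithmetic behind the ~3x congestion estimate: the same daily
// proving load squeezed from 48 deadlines into an 8-hour office window.
fn main() {
    let deadlines_per_day = 48; // 24 hours x 2 deadlines per hour
    let office_deadlines = 8 * 2; // 8 office hours x 2 deadlines per hour
    let congestion_factor = deadlines_per_day / office_deadlines;
    println!("expected congestion: {}x", congestion_factor);
}
```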
-
During nv21 testing, @arajasek discovered a critical implementation design bug in the proposed solution. Moving the Slack thread here:
Anorth also suggested the following options accordingly:
From a governance perspective, it might be worth considering withdrawing the acceptance of FIP-0070 and reverting it to
-
Providing this information to document the rationale behind the status of this FIP. As this FIP remains bugged, it has become unimplementable, and therefore Core Devs have rejected it for implementation. More information here: https://filecoinproject.slack.com/archives/C01EU76LPCJ/p1706041666706329
-
This is a pre-FIP for discussion, inspired by discussion with @anorth and @nickle. Drafted by Venus team.
Summary
Add a new method to the built-in Miner actor to allow SPs to move a partition of sectors from one deadline to another, so that an SP can have control over their WindowedPoSt schedule.
Abstract
In WindowPoSt, every 24-hour period is called a "proving period" and is broken down into a series of 30-minute, non-overlapping deadlines, for a total of 48 deadlines within any given 24-hour proving period. Each sector is assigned to one of these deadlines when proven to the chain, grouped in partitions. The WindowPoSt of a particular partition in a deadline has to be done in that particular 30-minute window every 24 hours.
This proposal is to add a method `MovePartitions` to move all sectors in one or more partitions from one deadline to another, allowing SPs to control the WindowPoSt time slot for their sectors.
Change Motivation
When a Storage Provider (SP) maintains storage for the Filecoin network, they often encounter various challenges such as system performance degradation, network issues, facility maintenance, and even changes in data center location. While dealing with these challenges, SPs require greater flexibility in controlling the window in which they prove data integrity to the chain, while still adhering to the rule of proving once every 24 hours.
By implementing this proposal, several advantages can be realized:
Specification
We propose adding the following method to the built-in `Miner` actor. The params to this method indicate the partitions, their original deadline, and the destination deadline.
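The FIP text does not spell out the parameter structure, so here is one plausible shape. The field names and types are invented for illustration (a real implementation would likely encode the partition set as a bitfield), not taken from the FIP or any implementation:

```rust
// Hypothetical parameter shape for MovePartitions; illustrative only.
pub struct MovePartitionsParams {
    pub orig_deadline: u64,   // source deadline index (0..=47)
    pub dest_deadline: u64,   // destination deadline index (0..=47)
    pub partitions: Vec<u64>, // partition indices within orig_deadline to move
}
```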
To adhere to the requirement of conducting a WindowPoSt every 24 hours, the `MovePartitions` method only permits moving partitions to a `DestDeadline` whose next proving window is scheduled to occur within 24 hours after the `OrigDeadline`'s last proving window. If the `DestDeadline` falls outside of this time frame, the call will fail. This restriction ensures that the sector's proving schedule aligns with the required WindowPoSt interval.
Design Rationale
The primary objective is to introduce a straightforward mechanism for implementing flexible proving period settings for Storage Providers (SPs). Additionally, there is a consideration to offer greater flexibility by allowing the movement of any selected sectors from one deadline to another. This would enable an SP to align the expiration of sectors within a single partition. However, this approach poses certain risks, such as the potential increase in the cost of managing the bitfield due to non-sequential sector IDs. Moreover, the technical complexity involved in understanding the cost calculation may lead to unnecessary difficulties for SP operators.
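The 24-hour constraint described in the Specification can be sketched in a few lines. The epoch arithmetic is simplified for illustration (the real check would work from deadline state, not two bare epochs):

```rust
// Sketch of the 24-hour move constraint: DestDeadline's next challenge
// window must open within one proving period of OrigDeadline's last one.
const WPOST_PROVING_PERIOD: i64 = 2880; // 24h of 30-second epochs

fn move_allowed(orig_last_open: i64, dest_next_open: i64) -> bool {
    let gap = dest_next_open - orig_last_open;
    // destination window must be in the future, and no more than 24h away
    gap > 0 && gap <= WPOST_PROVING_PERIOD
}
```

For example, moving to a deadline opening 2940 epochs (24.5 hours) after the source's last window would be rejected, since the partition would otherwise go more than 24 hours without a proof.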
Backwards Compatibility
This proposal introduces a new method to the built-in actors, which requires a network upgrade, but otherwise breaks no backwards compatibility.
Test Cases
Proposed test cases include:
‒ that `MovePartitions` succeeds when moving one partition from a deadline to another whose proving window is within 24 hours after the partition's last proving window;
‒ that `MovePartitions` fails when moving one partition from a deadline to another whose proving window is beyond 24 hours after the partition's last proving window;
‒ that `MovePartitions` succeeds when moving multiple partitions from a deadline to another whose proving window is within 24 hours after the partitions' last proving window;
‒ that `MovePartitions` fails when moving multiple partitions from a deadline to another whose proving window is beyond 24 hours after the partitions' last proving window;
‒ that WindowedPoSt works as normal in the `DestDeadline` for partitions moved from its `OrigDeadline`;
‒ that `CompactPartitions` works as normal for newly moved-in partitions and existing partitions in the `DestDeadline`.
Security Considerations
This proposal provides more flexibility for SPs to have full control over the WindowPoSt proving window for their partitions. In the design, we still require SPs to follow the basic rule that each partition must have one proof every 24 hours, so there is no compromise in this regard.
There might be a concern that WindowPoSt messages could become imbalanced across the 48 proving windows due to adjustments by SPs. Considering that WindowPoSt only takes about 10% of the whole network's bandwidth, and that the network is decentralized, this should not be a problem. In addition, this mechanism actually provides a way for SPs to move their proving window to avoid an expected congestion period.
Incentive Considerations
This FIP does not affect incentives in itself; it only provides flexibility, without any impact on economic factors.
Product Considerations
There is no impact on Filecoin ecosystem applications or platforms.
Implementation
An implementation can be provided after further discussion of the proposal.
Copyright
Copyright and related rights waived via CC0.