Aggregator Deal Standard #866
Replies: 4 comments 2 replies
-
Note that the text above has veered slightly out of date since the latest draft: https://www.notion.so/Aggregator-Deal-Standard-FRC-Draft-v0-1-eaf46d911e1045ec93b817c49d6e6cf2 This link represents the latest thoughts on the matter. Related links:
-
-
I appreciate the detail that is present in this proposal, but I think it's lacking some context about who it's for, when they would use it, how widely it is expected to be adopted, at what scale, etc. Understanding the motivation and the specific problem it aims to solve would also be very helpful. Since this is an FRC that doesn't touch the core protocol, I'm open to diverse approaches: let a thousand flowers bloom. However, for this to become a widely adopted standard, I think there are some issues.

First let me sketch some context that I am assuming. Aggregation refers to the packing of multiple smaller pieces of data into a single on-chain "deal". This is useful because (a) many end users have small data, and (b) deals are expensive. The gas cost of publishing an on-chain deal for each smaller piece of data can be prohibitive. So an aggregator packs many small sub-pieces together into a single piece, provides its client/s with FRC-0058 PoDSI proofs, and transacts a single large deal with an SP. (End users are the aggregator's clients, but the aggregator is the SP's deal client.)

I understand the aggregator interface proposal to be essentially a mechanism for storing sub-deal information (i.e. the deals that are aggregated into on-chain deals) on chain: at least temporarily, while the aggregation is happening, but some methods also imply permanently. This raises some concerns. The essence of off-chain aggregation is to avoid the high costs of on-chain data storage. Storing sub-deal information on chain seems to remove the primary benefit of aggregation: that it's cheap because it amortizes on-chain costs across many deals.

The API appears to imply an on-chain index of Piece CID -> Deal, but I don't think this could reach any kind of large scale. Publishing per-deal information in the built-in market actor already consumes about half of the chain's bandwidth today. There is no room for growth, and certainly no room to publish N>1 things per on-chain deal into chain state. Verifying inclusion proofs for all these items will be similarly prohibitive.

I would question the significant advantage of this mechanism over just publishing on-chain deals for the small data items. OK, the built-in market actor is quite inefficient and we need a better one. But after FIP-0076 and the provision of on-chain commitment notifications to user actors, it could well be more efficient to make a full on-chain deal with a slim user market actor than to have aggregation verified on chain (the miner can batch-verify all the piece inclusions more efficiently than a smart contract doing them one at a time). Even with more efficient on-chain market contracts, aggregation will remain relevant for cost amortization. We can expect continued technical progress on the core protocol and smart contracts to drive deal costs down. Aggregation is based on the premise of an off-chain protocol seeking cost amortization over whatever the best possible on-chain deal is.

Finally, I'm curious about the necessity of a blockchain for this process. Blockchains are good for a few very specific properties, and just very inefficient if you don't need them. Submitting a URL to an aggregator and receiving back a proof of inclusion in an on-chain deal seems like it could be done more simply. Can this be a simple point-to-point protocol? Or use a single computer to broker the information? What's the motivation for doing something on chain? Perhaps we can engage in a productive design discussion on how to solve that problem? (Payment for the work would be a good reason, but that's not addressed in the proposal.)

A few specific questions, which I think could be answered by expanding on motivation, context, expected use, etc.:

- This doesn't seem to be referenced again throughout the spec. What's it for?
- What third parties? Who wants or needs this? Why? Are they paying for it? Maintaining this on chain will be a great cost to users of the aggregator contract.
-
Some brief notes here based on a somewhat superficial look.
(I have not dug deep into the interface shape, types, naming choices, etc.)
-
Introduction
This proposal presents the `I*DataAggregator` family of interfaces for the Filecoin Virtual Machine (FVM), along with a reference implementation. These interfaces manage storage deal creation requests and store relevant deal metadata, including `deal_id` and `provider_id`. They also ensure data inclusion through PoDSI inclusion proof verification. PoDSI is described in FRC-0058 Verifiable Data Aggregation, which allows aggregators to produce a Proof of Data Segment Inclusion (PoDSI) certifying that the client's data has been properly aggregated. This proposal additionally supports the full Renew/Replication/Repair-as-a-Service (RaaS) interface, where clients can specify flexible RaaS deal terms with the aggregator to define the renew period, replication factor, and repair threshold. The aggregator would then honor the provided deal terms with the associated storage providers, and perform the callback / emit an event when the job is completed. More details on the aggregator-implemented full RaaS interface workflow can be found here.
Standardizing these interfaces allows various aggregators to offer a set of uniform methods for submitting storage deal creation requests and storing the deal metadata. This enhances portability for clients across different aggregators and promotes compatibility among tools and libraries.
Note that this FRC can be built on both FVM mainnet and IPC subnets. The latter can be more gas efficient, but may not offer the variety of SPs available on Filecoin mainnet. Over time, we expect that gas costs may push more SPs and builders using this standard toward IPC subnets.
Note that this FRC currently covers neither paid deals nor the management of escrow for paid deals. However, we expect that future versions of this FRC may incorporate these flows.
Motivation / Context
This standard makes it possible for a client to transact with an aggregator in a trustless manner. That is, the user can be sure that the aggregated file contains the submitted subpiece, either through PoDSI (in the case of aggregators that implement `IOffchainDataAggregator`) or through CommPa computation happening onchain (in the case of aggregators that implement `IOnchainDataAggregator`). Through this trustless aggregation, this standard also enables richer interactions with Dapps, DataDAOs, and other onchain organizations present on FVM or IPC subnets, which can now store small pieces of data through FVM / IPC.

The main audience of this standard is aggregators running in the PLN that want to leverage onchain primitives to trustlessly aggregate data. They should leverage this approach when they want to make their operations transparent to clients, SPs, and other onchain indexers. They should also leverage this standard when they want to make their aggregation composable with other data organizations (such as Dapps and DataDAOs) being built on FVM / IPC. We expect substantial adoption of this standard across these aggregators.
Specification
IOffchainDataAggregator Interface
Outlined below is the `IOffchainDataAggregator` interface, which must be implemented by each smart contract that stores aggregators' deal information. Each aggregator can deploy its own smart contract to retain full control, or multiple aggregators can share one smart contract and store their data under it.

Aggregators who want to provide RaaS features should implement `submitRaaS` and monitor `SubmitAggregatorRequestWithRaaS` as well. Aggregators should take the RaaS parameters carried in `SubmitAggregatorRequestWithRaaS` and register the corresponding RaaS functions for the deal.
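The interface listing itself did not survive in this text, so the following is a minimal Solidity sketch of the shape the prose implies. The method and event names (`submit`, `submitRaaS`, `complete`, `SubmitAggregatorRequest`, `SubmitAggregatorRequestWithRaaS`, `CompleteAggregatorRequest`) come from this proposal, but the parameter lists, types, and ordering are assumptions; the reference implementation is authoritative.

```solidity
// SPDX-License-Identifier: CC0-1.0
pragma solidity ^0.8.17;

// Hypothetical sketch only. Method and event names come from this proposal;
// parameter lists and types are assumptions. The PoDSI struct types
// (InclusionProof, InclusionVerifierData) are sketched further below.
interface IOffchainDataAggregator {
    // Emitted when a client submits a piece CID for aggregation.
    event SubmitAggregatorRequest(uint256 indexed id, bytes cid);

    // Emitted when a client submits a piece CID together with RaaS terms.
    event SubmitAggregatorRequestWithRaaS(
        uint256 indexed id,
        bytes cid,
        uint256 renewPeriod,       // assumed encoding of the RaaS renew period
        uint256 replicationFactor, // assumed: number of replicas to maintain
        uint256 repairThreshold    // assumed: threshold that triggers repair
    );

    // Emitted by complete() once a subpiece is verifiably included in a deal.
    event CompleteAggregatorRequest(uint256 indexed id, uint64 dealId);

    // Client entry point: request aggregation of the data identified by cid.
    function submit(bytes calldata cid) external returns (uint256 id);

    // Client entry point with RaaS deal terms.
    function submitRaaS(
        bytes calldata cid,
        uint256 renewPeriod,
        uint256 replicationFactor,
        uint256 repairThreshold
    ) external returns (uint256 id);

    // Aggregator callback: stores the deal metadata and verifies the PoDSI
    // inclusion proof, reverting if verification fails.
    function complete(
        uint256 id,
        uint64 dealId,
        uint64 providerId,
        InclusionProof calldata proof,
        InclusionVerifierData calldata verifierData
    ) external;
}
```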
`InclusionProof`, `InclusionVerifierData`, and `InclusionAuxData` are defined as follows (all the code related to verifying the PoDSI is provided in the reference implementation):
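The original definitions are not reproduced here; the sketch below follows the shape of FRC-0058's data-segment types, with field names that should be checked against the reference implementation.

```solidity
// Hypothetical sketch following the data-segment definitions of FRC-0058;
// exact field names and types should be taken from the reference implementation.

// A Merkle proof: the index of the leaf and the sibling path to the root.
struct ProofData {
    uint64 index;
    bytes32[] path;
}

// PoDSI inclusion proof: a subtree proof that the client's piece is contained
// in the aggregate, plus a proof for the corresponding data-segment index entry.
struct InclusionProof {
    ProofData proofSubtree;
    ProofData proofIndex;
}

// Verifier inputs: the piece commitment (CommPc) of the subpiece and its size.
struct InclusionVerifierData {
    bytes commPc;
    uint64 sizePc;
}

// Computed while verifying: the aggregate commitment (CommPa) and its size,
// which can be checked against the on-chain deal.
struct InclusionAuxData {
    bytes commPa;
    uint64 sizePa;
}
```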
IOnchainDataAggregator Interface

For building onchain aggregators (aggregators that compute CommPa within the smart contract onchain), the user interface is very similar. The only difference is the `complete()` callback function that the aggregator calls: since the CommP aggregation happens within the onchain contract, PoDSI is entirely optional, and thus the above fields (`InclusionProof`, `InclusionVerifierData`, and `InclusionAuxData`) are not required. In this case, the function `onchain_complete` only needs to include the `uint256 _id`, `uint64 _marketActorId`, `uint64 _dealId`, and `uint64 _providerId` fields, as shown below:
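The listing referred to above is missing here; a hedged sketch of that slimmer callback follows. Only the field list is given by the proposal, so the naming of the enclosing interface and the comments are assumptions.

```solidity
// Hypothetical sketch: the field list follows the prose above; everything
// else (the enclosing interface shape, parameter semantics) is an assumption.
interface IOnchainDataAggregator {
    // Aggregator callback for onchain aggregation. No PoDSI fields are
    // needed, since CommPa is computed by the contract itself.
    function onchain_complete(
        uint256 _id,           // the request id returned by submit()
        uint64 _marketActorId, // the market actor the deal was published to
        uint64 _dealId,        // the resulting on-chain deal id
        uint64 _providerId     // the storage provider's actor id
    ) external;
}
```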
IDataAggregatorEnumerable Interface
Outlined below is the `IDataAggregatorEnumerable` interface, which can optionally be implemented by aggregators that want to maintain a mapping of piece CIDs and deals on chain. This makes the subpieces enumerable onchain and allows third-party observers of aggregators (chain indexers, block explorers, and other SPs and clients interested in aggregation) to understand in more detail what data each aggregator is aggregating.

Note: on FVM mainnet, maintaining this mapping onchain can become unscalable for a high-volume aggregator. We therefore recommend implementing this optional interface either on an IPC subnet or only when the aggregator does not have a large volume of deals being aggregated.
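As a hedged illustration, the interface might look like the following. `getAllDeals` is named in the Rationale section below; the struct shape and the enumeration helpers (`totalPieces`, `pieceAt`) are hypothetical additions in the style of other enumerable interfaces.

```solidity
// Hypothetical sketch of the optional enumerable extension. getAllDeals is
// named in the Rationale section; the struct shape and the enumeration
// helpers are illustrative assumptions.
interface IDataAggregatorEnumerable {
    struct DealInfo {
        uint64 dealId;
        uint64 providerId;
    }

    // Every (dealId, providerId) pair recorded for a piece CID, so users can
    // check deal status via FVM/Filecoin APIs and trigger repair or renewal.
    function getAllDeals(bytes calldata cid)
        external
        view
        returns (DealInfo[] memory);

    // Number of piece CIDs tracked by this contract (assumed helper).
    function totalPieces() external view returns (uint256);

    // Enumerate tracked piece CIDs by index (assumed helper), for chain
    // indexers and block explorers.
    function pieceAt(uint256 index) external view returns (bytes memory cid);
}
```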
Reference Implementation
The reference implementation is provided here.
Rationale
The `I*DataAggregator` interfaces serve as a foundational blueprint for offchain aggregators in the Filecoin ecosystem. By prescribing a consistent set of methods and structures, we standardize the process of data submission across any aggregator that wants to create Filecoin storage deals. Specifically:

- The `getAllDeals` function returns all the `dealId` and `providerId` pairs for a given CID. Users can utilize this information to check deal status via FVM or Filecoin APIs and perform necessary actions like repair or renewal.
- Clients call the `submit()` function to request the aggregation of specific data into a storage deal. The aggregator then actively monitors the event emitted by the smart contract, known as `SubmitAggregatorRequest`. Once this event occurs, the aggregator begins the process of aggregating the data, represented by its CID, into an onchain deal.
- Once aggregation is done, the aggregator calls the `complete()` function. This critical step serves to verify that the data has been correctly included within the aggregated bundle. If any issues are detected during this verification process, the `complete()` function reverts, ensuring that only deals that can be verified via PoDSI pass this step.
- Clients can then watch for the `CompleteAggregatorRequest` event emitted by the `complete()` step, associated with the corresponding `dealId`. This event confirms the successful completion of the data aggregation process.
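To illustrate this flow from the client side, a hypothetical consumer contract might look like the sketch below. It is written against the `IOffchainDataAggregator` sketch from the Specification section (assumed to be in scope), and the contract name and storage layout are illustrative only.

```solidity
// SPDX-License-Identifier: CC0-1.0
pragma solidity ^0.8.17;

// Hypothetical consumer (e.g. a DataDAO) written against the
// IOffchainDataAggregator sketch above. Illustrative only.
contract ExampleDataDAO {
    IOffchainDataAggregator public immutable aggregator;

    // Remember which piece CID each of our requests refers to.
    mapping(uint256 => bytes) public requestToCid;

    constructor(IOffchainDataAggregator _aggregator) {
        aggregator = _aggregator;
    }

    // Ask the aggregator to include `cid` in its next aggregated deal.
    function storePiece(bytes calldata cid) external returns (uint256 id) {
        id = aggregator.submit(cid);
        requestToCid[id] = cid;
        // The aggregator observes SubmitAggregatorRequest off chain, packs
        // the data into a deal, and later calls complete() with a PoDSI
        // proof; success is signalled by CompleteAggregatorRequest.
    }
}
```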
Backwards Compatibility

This proposal introduces a new standard and does not alter or disrupt existing interfaces or implementations. However, offchain aggregators wishing to conform to this standard will need to adopt this interface.
Test Cases
Testing for the implementation of the `DataAggregator` interface should focus on:

Data Retrieval
Aggregators maintain a copy of the data and serve it via an HTTP endpoint. Aggregators should provide both an HTTP endpoint and an IPFS endpoint for users to retrieve the data for a certain period, e.g. 30 days. Clients can make an HTTP call to the aggregator they uploaded the data to in order to download the file, providing the data's CID (CommPc).
Recommended API interface for the two retrieval methods mentioned above:
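The recommended listing did not survive in this text. As an illustration only, and assuming endpoint paths in the style of the piece retrieval gateway FRC referenced below, the two methods might look like:

```
# Hypothetical shapes only; the piece retrieval gateway FRC defines the
# normative paths, headers, and responses.
HTTP:  GET https://<aggregator-host>/piece/{CommPc}
IPFS:  ipfs://{CID}   (retrievable via any IPFS gateway or node)
```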
For the detailed specification (HTTP interface, headers, responses, etc.), please refer to the FRC regarding the Filecoin piece retrieval gateway.
Security Considerations
Data to be aggregated is considered to be open. No security concerns at this time.
Copyright Waiver
Copyright and related rights waived via CC0.