Aggregator Deal Standard #866
Replies: 4 comments 2 replies
-
Note that the text above has veered slightly out of date since the latest draft: https://www.notion.so/Aggregator-Deal-Standard-FRC-Draft-v0-1-eaf46d911e1045ec93b817c49d6e6cf2 This link represents the latest thoughts on the matter. Related links:
-
-
I appreciate the detail that is present in this proposal, but I think it's lacking some context about who it's for, when they would use it, how widely it is expected to be adopted, at what scale, etc. Understanding the motivation and the specific problem it aims to solve would also be very helpful. Since this is an FRC that doesn't touch the core protocol, I'm open to diverse approaches: let a thousand flowers bloom. However, for this to become a widely adopted standard, I think there are some issues.

First let me sketch some context that I am assuming. Aggregation refers to the packing of multiple smaller pieces of data into a single on-chain "deal". This is useful because (a) many end users have small data, and (b) deals are expensive. The gas cost of publishing an on-chain deal for each smaller piece of data can be prohibitive. So an aggregator packs many small sub-pieces together into a single piece, provides its client/s with FRC-0058 PoDSI proofs, and transacts a single large deal with an SP. (End users are the aggregator's clients, but the aggregator is the SP's deal client.)

I understand the aggregator interface proposal to be essentially a mechanism for storing sub-deal information (i.e. the deals that are aggregated into on-chain deals) on chain: at least temporarily, while the aggregation is happening, but some methods also imply permanently. This raises some concerns. The essence of off-chain aggregation is to avoid the high costs of on-chain data storage. Storing sub-deal information on chain seems to remove the primary benefit of aggregation: that it's cheap because it amortizes on-chain costs across many deals.

The API appears to imply an on-chain index of Piece CID -> Deal, but I don't think this could reach any kind of large scale. Publishing per-deal information in the built-in market actor already consumes about half of the chain's bandwidth today. There is no room for growth, and certainly no room to publish N>1 things per on-chain deal into chain state. Verifying inclusion proofs for all these items will be similarly prohibitive.

I would question the significant advantage of this mechanism over just publishing on-chain deals for the small data items. OK, the built-in market actor is quite inefficient and we need a better one. But after FIP-0076 and the provision of on-chain commitment notifications to user actors, it could well be more efficient to make a full on-chain deal with a slim user market actor than to have aggregation verified on chain (the miner can batch-verify all the piece inclusions more efficiently than a smart contract doing them one at a time). Even with more efficient on-chain market contracts, aggregation will remain relevant for cost amortization. We can expect continued technical progress on the core protocol and smart contracts to drive deal costs down. Aggregation is based on the premise of an off-chain protocol seeking cost amortization over whatever the best possible on-chain deal is.

Finally, I'm curious about the necessity of a blockchain for this process. Blockchains are good for a few very specific properties, and just very inefficient if you don't need them. Submitting a URL to an aggregator and receiving back a proof of inclusion in an on-chain deal seems like it could be done more simply. Can this be a simple point-to-point protocol? Or use a single computer to broker the information? What's the motivation for doing something on chain? Perhaps we can engage in a productive design discussion on how to solve that problem? (Payment for the work would be a good reason, but that's not addressed in the proposal.)

A few specific questions, which I think could be answered by expanding on motivation, context, expected use, etc.:

- This doesn't seem to be referenced again throughout the spec. What's it for?
- What third parties? Who wants or needs this? Why? Are they paying for it? Maintaining this on chain will be a great cost to users of the aggregator contract.
-
Some brief notes here based on a somewhat superficial look.
(I have not dug deep into the interface shape, types, naming choices, etc.)
-
Introduction
This proposal presents the `I*DataAggregator` family of interfaces for the Filecoin Virtual Machine (FVM), along with a reference implementation. These interfaces manage storage deal creation requests and store relevant deal metadata, including `deal_id` and `provider_id`. They also ensure data inclusion through PoDSI inclusion proof verification. PoDSI is described in FRC-0058 Verifiable Data Aggregation, which allows aggregators to produce a Proof of Data Segment Inclusion (PoDSI) certifying that the client's data has been properly aggregated. This proposal additionally supports the full Renew/Replication/Repair-as-a-Service (RaaS) interface, where clients can specify flexible RaaS deal terms with the aggregator to define the renew period, replication factor, and repair threshold. The aggregator would then honor the provided deal terms with the associated storage providers, and perform the callback / emit an event when the job is completed. More details on the aggregator-implemented full RaaS interface workflow can be found here.
Standardizing these interfaces allows various aggregators to offer a set of uniform methods for submitting storage deal creation requests and storing the deal metadata. This enhances portability for clients across different aggregators and promotes compatibility among tools and libraries.
Note that this FRC can be built on both FVM mainnet and IPC subnets. The latter can be more gas efficient, but may not offer the variety of SPs available on Filecoin mainnet. Over time, we expect that gas costs may push more SPs and builders using this standard toward IPC subnets.
Note that this FRC currently covers neither paid deals nor the management of escrow for paid deals. However, we expect that future versions of this FRC may incorporate these flows.
Motivation / Context
This standard makes it possible for a client to transact with an aggregator in a trustless manner. That is, the user can be sure that the aggregated file contains the submitted subpiece, either through PoDSI (in the case of aggregators that implement `IOffchainDataAggregator`) or through CommPa computation happening onchain (in the case of aggregators that implement `IOnchainDataAggregator`). Through this trustless aggregation, this standard also enables richer interactions with Dapps, DataDAOs, and other onchain organizations present on FVM or IPC subnets, which can now store small pieces of data through FVM / IPC.

The main audience of this standard is aggregators running in the PLN that want to leverage onchain primitives to trustlessly aggregate data. They should leverage this approach when they want to make their operations transparent to clients, SPs, and other onchain indexers. They should also leverage this standard when they want to make their aggregation composable with other data organizations (such as Dapps and DataDAOs) being built on FVM / IPC. We expect substantial adoption of this standard across these aggregators.
Specification
IOffchainDataAggregator Interface
Outlined below is the `IOffchainDataAggregator` interface, which must be implemented by each smart contract that stores aggregators' deal information. Each aggregator can deploy its own smart contract to retain full control, or multiple aggregators can share one smart contract and store their data under it.

Aggregators who want to provide RaaS features should implement `submitRaaS` and monitor `SubmitAggregatorRequestWithRaaS` as well. Aggregators should take the RaaS parameters carried in `SubmitAggregatorRequestWithRaaS` and register the corresponding RaaS functions for the deal.
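The interface listing itself did not survive in this text, so the following is a minimal Solidity sketch of the shape the prose implies. The method and event names (`submit`, `submitRaaS`, `complete`, `SubmitAggregatorRequest`, `SubmitAggregatorRequestWithRaaS`, `CompleteAggregatorRequest`) come from this proposal, but the parameter lists, types, and ordering are assumptions; the reference implementation is authoritative.

```solidity
// SPDX-License-Identifier: CC0-1.0
pragma solidity ^0.8.17;

// Hypothetical sketch only. Method and event names come from this proposal;
// parameter lists and types are assumptions. The PoDSI struct types
// (InclusionProof, InclusionVerifierData) are sketched further below.
interface IOffchainDataAggregator {
    // Emitted when a client submits a piece CID for aggregation.
    event SubmitAggregatorRequest(uint256 indexed id, bytes cid);

    // Emitted when a client submits a piece CID together with RaaS terms.
    event SubmitAggregatorRequestWithRaaS(
        uint256 indexed id,
        bytes cid,
        uint256 renewPeriod,       // assumed encoding of the RaaS renew period
        uint256 replicationFactor, // assumed: number of replicas to maintain
        uint256 repairThreshold    // assumed: threshold that triggers repair
    );

    // Emitted by complete() once a subpiece is verifiably included in a deal.
    event CompleteAggregatorRequest(uint256 indexed id, uint64 dealId);

    // Client entry point: request aggregation of the data identified by cid.
    function submit(bytes calldata cid) external returns (uint256 id);

    // Client entry point with RaaS deal terms.
    function submitRaaS(
        bytes calldata cid,
        uint256 renewPeriod,
        uint256 replicationFactor,
        uint256 repairThreshold
    ) external returns (uint256 id);

    // Aggregator callback: stores the deal metadata and verifies the PoDSI
    // inclusion proof, reverting if verification fails.
    function complete(
        uint256 id,
        uint64 dealId,
        uint64 providerId,
        InclusionProof calldata proof,
        InclusionVerifierData calldata verifierData
    ) external;
}
```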
`InclusionProof`, `InclusionVerifierData`, and `InclusionAuxData` are defined as follows (all the code related to verifying the PoDSI is provided in the reference implementation):
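The original definitions are not reproduced here; the sketch below follows the shape of FRC-0058's data-segment types, with field names that should be checked against the reference implementation.

```solidity
// Hypothetical sketch following the data-segment definitions of FRC-0058;
// exact field names and types should be taken from the reference implementation.

// A Merkle proof: the index of the leaf and the sibling path to the root.
struct ProofData {
    uint64 index;
    bytes32[] path;
}

// PoDSI inclusion proof: a subtree proof that the client's piece is contained
// in the aggregate, plus a proof for the corresponding data-segment index entry.
struct InclusionProof {
    ProofData proofSubtree;
    ProofData proofIndex;
}

// Verifier inputs: the piece commitment (CommPc) of the subpiece and its size.
struct InclusionVerifierData {
    bytes commPc;
    uint64 sizePc;
}

// Computed while verifying: the aggregate commitment (CommPa) and its size,
// which can be checked against the on-chain deal.
struct InclusionAuxData {
    bytes commPa;
    uint64 sizePa;
}
```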
IOnchainDataAggregator Interface

For building onchain aggregators (aggregators that compute CommPa within the smart contract onchain), the user interface is very similar. The only difference is the `complete()` callback function that the aggregator calls: since the CommP aggregation happens within the onchain contract, PoDSI is entirely optional, and thus the above fields (`InclusionProof`, `InclusionVerifierData`, and `InclusionAuxData`) are not required. In this case, the function `onchain_complete` only needs to include the `uint256 _id`, `uint64 _marketActorId`, `uint64 _dealId`, and `uint64 _providerId` fields, as shown below:
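The listing referred to above is missing here; a hedged sketch of that slimmer callback follows. Only the field list is given by the proposal, so the naming of the enclosing interface and the comments are assumptions.

```solidity
// Hypothetical sketch: the field list follows the prose above; everything
// else (the enclosing interface shape, parameter semantics) is an assumption.
interface IOnchainDataAggregator {
    // Aggregator callback for onchain aggregation. No PoDSI fields are
    // needed, since CommPa is computed by the contract itself.
    function onchain_complete(
        uint256 _id,           // the request id returned by submit()
        uint64 _marketActorId, // the market actor the deal was published to
        uint64 _dealId,        // the resulting on-chain deal id
        uint64 _providerId     // the storage provider's actor id
    ) external;
}
```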
IDataAggregatorEnumerable Interface
Outlined below is the `IDataAggregatorEnumerable` interface, which can optionally be implemented by aggregators that want to maintain a mapping of piece CIDs and deals on chain. This makes the subpieces enumerable onchain and allows third-party observers of aggregators (chain indexers, block explorers, and other SPs and clients interested in aggregation) to understand in more detail what data each aggregator is aggregating.

Note: on FVM mainnet, maintaining this mapping onchain can become unscalable for a high-volume aggregator. We therefore recommend implementing this optional interface either on an IPC subnet or only when the aggregator does not have a large volume of deals being aggregated.
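As a hedged illustration, the interface might look like the following. `getAllDeals` is named in the Rationale section below; the struct shape and the enumeration helpers (`totalPieces`, `pieceAt`) are hypothetical additions in the style of other enumerable interfaces.

```solidity
// Hypothetical sketch of the optional enumerable extension. getAllDeals is
// named in the Rationale section; the struct shape and the enumeration
// helpers are illustrative assumptions.
interface IDataAggregatorEnumerable {
    struct DealInfo {
        uint64 dealId;
        uint64 providerId;
    }

    // Every (dealId, providerId) pair recorded for a piece CID, so users can
    // check deal status via FVM/Filecoin APIs and trigger repair or renewal.
    function getAllDeals(bytes calldata cid)
        external
        view
        returns (DealInfo[] memory);

    // Number of piece CIDs tracked by this contract (assumed helper).
    function totalPieces() external view returns (uint256);

    // Enumerate tracked piece CIDs by index (assumed helper), for chain
    // indexers and block explorers.
    function pieceAt(uint256 index) external view returns (bytes memory cid);
}
```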
Reference Implementation
The reference implementation is provided here.
Rationale
The `I*DataAggregator` interfaces serve as a foundational blueprint for offchain aggregators in the Filecoin ecosystem. By prescribing a consistent set of methods and structures, we standardize the process of data submission across any aggregator that wants to create Filecoin storage deals. Specifically:

- The `getAllDeals` function returns all the `dealId` and `providerId` pairs for a given CID. Users can utilize this information to check deal status via FVM or Filecoin APIs and perform necessary actions like repair or renewal.
- Clients call the `submit()` function to request the aggregation of specific data into a storage deal. The aggregator then actively monitors the event emitted by the smart contract, known as `SubmitAggregatorRequest`. Once this event occurs, the aggregator begins the process of aggregating the data, represented by its CID, into an onchain deal.
- Once aggregation is done, the aggregator calls the `complete()` function. This critical step serves to verify that the data has been correctly included within the aggregated bundle. If any issues are detected during this verification process, the `complete()` function reverts, ensuring that only deals that can be verified via PoDSI pass this step.
- Clients can then watch for the `CompleteAggregatorRequest` event emitted by the `complete()` step, associated with the corresponding `dealId`. This event confirms the successful completion of the data aggregation process.
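To illustrate this flow from the client side, a hypothetical consumer contract might look like the sketch below. It is written against the `IOffchainDataAggregator` sketch from the Specification section (assumed to be in scope), and the contract name and storage layout are illustrative only.

```solidity
// SPDX-License-Identifier: CC0-1.0
pragma solidity ^0.8.17;

// Hypothetical consumer (e.g. a DataDAO) written against the
// IOffchainDataAggregator sketch above. Illustrative only.
contract ExampleDataDAO {
    IOffchainDataAggregator public immutable aggregator;

    // Remember which piece CID each of our requests refers to.
    mapping(uint256 => bytes) public requestToCid;

    constructor(IOffchainDataAggregator _aggregator) {
        aggregator = _aggregator;
    }

    // Ask the aggregator to include `cid` in its next aggregated deal.
    function storePiece(bytes calldata cid) external returns (uint256 id) {
        id = aggregator.submit(cid);
        requestToCid[id] = cid;
        // The aggregator observes SubmitAggregatorRequest off chain, packs
        // the data into a deal, and later calls complete() with a PoDSI
        // proof; success is signalled by CompleteAggregatorRequest.
    }
}
```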
Backwards Compatibility

This proposal introduces a new standard and does not alter or disrupt existing interfaces or implementations. However, offchain aggregators wishing to conform to this standard will need to adopt this interface.
Test Cases
Testing for the implementation of the `DataAggregator` interface should focus on:

Data Retrieval
Aggregators maintain a copy of the data and serve it via an HTTP endpoint. Aggregators should provide both an HTTP endpoint and an IPFS endpoint for users to retrieve the data for a certain period, e.g. 30 days. Clients can make an HTTP call to the aggregator they uploaded the data to in order to download the file, providing the data's CID (CommPc).
Recommended API interface for the two retrieval methods mentioned above:
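The recommended listing did not survive in this text. As an illustration only, and assuming endpoint paths in the style of the piece retrieval gateway FRC referenced below, the two methods might look like:

```
# Hypothetical shapes only; the piece retrieval gateway FRC defines the
# normative paths, headers, and responses.
HTTP:  GET https://<aggregator-host>/piece/{CommPc}
IPFS:  ipfs://{CID}   (retrievable via any IPFS gateway or node)
```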
For the detailed specification (HTTP interface, headers, responses, etc.), please refer to the FRC regarding the Filecoin piece retrieval gateway.
Security Considerations
Data to be aggregated is considered to be open. No security concerns at this time.
Copyright Waiver
Copyright and related rights waived via CC0.