
Conversation


@metricaez metricaez commented Oct 10, 2025

This PR implements a complete publish-subscribe mechanism for parachains, addressing issue #606.

Implemented:

  • M1: Publish instruction - Parachains can publish key-value data to the relay chain
  • M2: Subscribe instruction - Parachains can subscribe to specific publishers and receive only subscribed data
  • M3: Data change detection optimizations - Efficient updates via child trie root comparison

Review Focus

This is a design proposal, submitted for architectural review.

Please focus your attention on:

  • Component placement and architectural decisions
  • Data flow correctness and integration patterns
  • XCM instruction design and executor integration
  • Data structure design

Context: Issue #606

The original issue identified the challenge of expensive inter-parachain communication. Current methods (XCM messages, off-chain protocols) are complex and inefficient for broadcasting data across multiple parachains.

The proposed solution from the discussion:

  • Parachains can publish (key, value) data to the relay chain via XCM instruction
  • Data is stored in child tries per publisher (isolated storage)
  • Other parachains can access this data via their collators through the validation data inherent
  • Efficient change detection using child trie roots

This PR implements the core publishing mechanism as discussed, following our best interpretation of the XCM Publish instruction pattern suggested by @bkchr in the issue thread.

Architecture Overview

Publishing Flow

Parachain A (Publisher) via pallet-xcm
    ↓ XCM: Publish { data: [(key, value), ...] }
Relay Chain XCM Executor
    ↓ BroadcastHandler::handle_publish()
Broadcaster Pallet
    ↓ Stores in child trie for Para A
    ↓ Calculates child trie root
    ↓ Updates PublishedDataRoots storage
    ↓ Emits DataPublished event
Relay Chain State (Child Trie Storage)
    ↓ Collator fetches via ParachainHost API 
ParachainInherentData (published_data field)
    ↓ Passed to parachain runtime
Parachain B Runtime
    ↓ Compares roots with previous block
    ↓ Updates storage only if changed
    ✓ Data available for consumption

Subscribing Flow

Parachain B (Subscriber) via pallet-xcm
    ↓ XCM: Subscribe { publishers: [ParaId, ...] }
Relay Chain XCM Executor
    ↓ BroadcastHandler::handle_subscribe()
Broadcaster Pallet
    ↓ Stores subscription: Para B -> [Para A, Para C, ...]
    ↓ Emits Subscribed event
Relay Chain State (Subscription Tracking)
    ↓ Collator fetches via ParachainHost API v16
    ↓ get_subscribed_data(subscriber_para_id)
ParachainInherentData (published_data field)
    ↓ Contains ONLY data from subscribed publishers
Parachain B Runtime
    ✓ Filtered data available for consumption

Components Implemented

1. XCM v5 Publish & Subscribe Instructions

Location: polkadot/xcm/src/v5/mod.rs

Publish { data: PublishData }

Allows parachains to publish bounded key-value data to the relay chain.

  • Type: PublishData = BoundedVec<(BoundedVec<u8, 32>, BoundedVec<u8, 1024>), 16>
  • Current Limits: Max 16 items, 32-byte keys, 1024-byte values per operation. Arbitrary values for the sake of development.

Subscribe { publishers: BoundedVec<u32, 100> }

Allows parachains to subscribe to specific publisher parachains.

  • Type: BoundedVec<u32, 100> - List of ParaIds to subscribe to
  • Current Limit: Max 100 subscriptions per parachain. Arbitrary value for the sake of development.

Note: The instruction is temporarily added to XCM v5. Final placement (potentially XCM v6) should be discussed during the review process.

The instructions are intended to be sent via pallet-xcm send, together with the appropriate fee-payment instructions (e.g. WithdrawAsset/BuyExecution).
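
For illustration, a minimal sketch (not code from this PR) of how a parachain could assemble such a message before handing it to pallet-xcm send; the fee asset and amounts below are placeholders:

// Illustrative only: assembling a Publish message on the sending parachain.
use xcm::v5::prelude::*;

let data: PublishData = vec![(
    b"price:dot".to_vec().try_into().expect("key within 32 bytes"),
    b"42".to_vec().try_into().expect("value within 1024 bytes"),
)]
.try_into()
.expect("at most 16 items");

let message: Xcm<()> = Xcm(vec![
    // Placeholder fee handling; real callers pay with a proper asset/amount.
    WithdrawAsset((Here, 1_000_000_000u128).into()),
    BuyExecution { fees: (Here, 1_000_000_000u128).into(), weight_limit: Unlimited },
    Publish { data },
]);
// The message is then sent to the relay chain (destination `Parent`) via
// `pallet_xcm::Pallet::<Runtime>::send`.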

2. Broadcaster Pallet (Relay Chain)

Location: polkadot/runtime/parachains/src/broadcaster/

Core pallet managing published data on the relay chain.

Key features:

  • Child trie storage per publisher: Each parachain gets a deterministic child trie (ChildInfo::new_default(b"pubsub" + para_id.encode()))
  • Subscription tracking: Subscriptions storage maps subscriber ParaId => Vec<publisher ParaIds>
  • Key tracking: PublishedKeys storage tracks all keys published by each parachain for enumeration
  • Validation: Enforces limits on items, key/value lengths, and total stored keys

Storage:
PublisherExists: Tracks which parachains have published data
PublishedKeys: Tracks all keys per publisher (for enumeration)
Subscriptions: Tracks subscription relationships (subscriber => [publishers])
PublishedDataRoots: Aggregated child trie roots (exposed via well-known key for state proofs)

Traits:
PublishSubscribe: Exposes the publish and subscribe operations for pallets to implement. Intended for the Broadcaster pallet, but provided as a trait to allow possible future integrations.

Main functions:
pub fn handle_publish(origin_para_id: ParaId, data: Vec<(Vec<u8>, Vec<u8>)>) -> DispatchResult
pub fn handle_subscribe(subscriber_para_id: ParaId, publishers: Vec<ParaId>) -> DispatchResult
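
As a rough illustration of the child-trie handling described above, the per-publisher trie and its root could be derived along these lines (helper names and import paths are illustrative, not the pallet's exact code):

use codec::Encode;
use polkadot_parachain_primitives::primitives::Id as ParaId;
use sp_core::storage::ChildInfo;
use sp_runtime::StateVersion;

/// Deterministic per-publisher child trie: b"pubsub" ++ SCALE-encoded ParaId.
fn publisher_child_info(para_id: ParaId) -> ChildInfo {
    let mut key = b"pubsub".to_vec();
    key.extend_from_slice(&para_id.encode());
    ChildInfo::new_default(&key)
}

/// Write one published pair and return the publisher's new child trie root,
/// which handle_publish would then keep in PublishedDataRoots.
fn store_published_pair(para_id: ParaId, key: &[u8], value: &[u8]) -> Vec<u8> {
    let child = publisher_child_info(para_id);
    sp_io::default_child_storage::set(child.storage_key(), key, value);
    sp_io::default_child_storage::root(child.storage_key(), StateVersion::V1)
}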

3. BroadcastHandler Trait & Adapter

Location:

  • Trait: polkadot/xcm/xcm-executor/src/traits/broadcast_handler.rs
  • Adapter: polkadot/xcm/xcm-builder/src/broadcast_adapter.rs

BroadcastHandler trait:

pub trait BroadcastHandler {
    fn handle_publish(origin: &Location, data: PublishData) -> XcmResult;
    fn handle_subscribe(origin: &Location, publishers: BoundedVec<u32, 100>) -> XcmResult;
}

ParachainBroadcastAdapter:

  • Validates XCM origin
  • Extracts the ParaId from the XCM origin Location (see the sketch below)
  • Provides filtering
  • Bridges XCM executor to broadcaster pallet for both publish and subscribe operations
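
A rough sketch of the origin check referenced above (on the relay chain a publishing parachain appears as a direct child location; the chosen error variant is an assumption):

// Illustrative only: mapping the XCM origin to the publishing ParaId.
use polkadot_parachain_primitives::primitives::Id as ParaId;
use xcm::v5::prelude::*;

fn extract_para_id(origin: &Location) -> Result<ParaId, XcmError> {
    match origin.unpack() {
        // No parents and a single Parachain junction, i.e. a child parachain.
        (0, [Parachain(id)]) => Ok(ParaId::from(*id)),
        _ => Err(XcmError::BadOrigin),
    }
}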

4. XCM Executor Integration

Location: polkadot/xcm/xcm-executor/src/lib.rs
The executor processes both Publish and Subscribe instructions by calling:

  • Config::BroadcastHandler::handle_publish()
  • Config::BroadcastHandler::handle_subscribe()

5. XCM Executor Config Trait Extension

Location: polkadot/xcm/xcm-executor/src/config.rs

Added BroadcastHandler to the executor's Config trait:

pub trait Config {
    // ... existing config items
    type BroadcastHandler: BroadcastHandler;
}

This requires all XCM executors to specify their broadcast handler implementation. A no-op () implementation is provided.
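
In a concrete runtime this is one extra associated type; a sketch of what the test integration's executor config presumably adds (type names and the adapter's parameters are placeholders):

impl xcm_executor::Config for XcmConfig {
    // ... all existing associated types stay unchanged ...
    type BroadcastHandler = ParachainBroadcastAdapter<Runtime>;
    // Runtimes that do not support publish/subscribe keep the no-op:
    // type BroadcastHandler = ();
}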

6. ParachainHost Runtime API

Location: polkadot/primitives/src/runtime_api.rs
The ParachainHost runtime API was bumped to v16 with the addition of:

#[api_version(16)]
fn get_subscribed_data(subscriber_para_id: ParaId) -> BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>;

The get_subscribed_data method is the primary method used by collators to fetch filtered data for their parachain.
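
For reference, the node-side call follows the usual sp_api client pattern; a rough sketch (the client handle and surrounding error handling are elided):

// Illustrative only: querying the new API at a given relay parent hash.
let filtered = relay_client
    .runtime_api()
    .get_subscribed_data(relay_parent_hash, subscriber_para_id)
    .expect("runtime api call failed");
// `filtered` is a BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>> containing only
// entries from publishers this parachain has subscribed to.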

7. Well-Known Key for Data Roots

Location: polkadot/primitives/src/v9/mod.rs

Added well-known key for inclusion in relay chain state proofs:

pub const BROADCASTER_PUBLISHED_DATA_ROOTS: &[u8] = 
    &hex!["6aca18c1f7576767ccb238db4ccaedf239166324ac7ea24c870f96ab961f9654"];

This key is included in relay chain state proofs, allowing parachains to verify data roots and detect changes.

8. Relay Chain Interface Extension

Location: cumulus/client/relay-chain-interface/src/lib.rs

Extended the RelayChainInterface trait with:

async fn get_all_published_data(
    &self,
    at: PHash,
) -> RelayChainResult<BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>>

Implementations:

  • RelayChainInProcessInterface - Direct runtime API call
  • RelayChainRpcInterface - RPC client call to relay chain node

This interface is used by collators in cumulus/client/parachain-inherent/src/lib.rs to fetch published data when building inherent data.

9. ParachainInherentData Extension

Location: cumulus/primitives/parachain-inherent/src/lib.rs

Added published_data field to ParachainInherentData:

pub struct ParachainInherentData {
    pub validation_data: PersistedValidationData,
    pub relay_chain_state: sp_trie::StorageProof,
    pub downward_messages: Vec<InboundDownwardMessage>,
    pub horizontal_messages: BTreeMap<ParaId, Vec<InboundHrmpMessage>>,
    pub relay_parent_descendants: Vec<RelayHeader>,
    pub collator_peer_id: Option<ApprovedPeerId>,
    pub published_data: BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>,  // NEW
}

Design rationale: This follows the same pattern as existing message types (downward_messages, horizontal_messages): a direct field in the inherent data structure.

10. InboundPublishedData Wrapper

Location: cumulus/pallets/parachain-system/src/parachain_inherent.rs

Wrapper type for published data validation:

pub struct InboundPublishedData {
    pub data: BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>,
}

Purpose: Aligns with the SDK's pattern of wrapping inbound data types (InboundDownwardMessage, InboundHrmpMessage) for consistency and future extensibility.

11. Parachain-System Integration

Location: cumulus/pallets/parachain-system/src/lib.rs

Storage (illustrative declarations sketched below):

  • PublishedData: Double map storing received data (publisher ParaId, key) => value
  • PreviousPublishedDataRoots: Tracks previous block's data roots for change detection
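
Illustrative declarations for these two items (hashers, bounds, and query kinds are assumptions, not the PR's exact code):

#[pallet::storage]
pub type PublishedData<T: Config> = StorageDoubleMap<
    _,
    Twox64Concat, ParaId,      // publisher
    Blake2_128Concat, Vec<u8>, // published key
    Vec<u8>,                   // published value
    OptionQuery,
>;

#[pallet::storage]
pub type PreviousPublishedDataRoots<T: Config> =
    StorageValue<_, BTreeMap<ParaId, relay_chain::Hash>, ValueQuery>;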

Inherent creation (set_validation_data):

  • Receives published_data from collator via inherent
  • Validates data is included in inherent
  • Reads current data roots from relay chain state proof (via BROADCASTER_PUBLISHED_DATA_ROOTS)
  • Calls process_published_data() with root comparison

Change Detection Logic (process_published_data):

  • Compares current data roots with PreviousPublishedDataRoots
  • Only updates storage for publishers with changed roots (detected via hash comparison)
  • Clears data for publishers no longer publishing
  • Stores new roots for next block comparison
  • Optimization: Significantly reduces storage writes when data hasn't changed (see the sketch below)
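
A hedged sketch of that comparison step (storage and helper names follow the description above; the actual code in the PR may differ, and imports such as BTreeMap, ParaId, and the storage items are assumed to be in the pallet's scope):

// Illustrative only: root comparison driving the storage updates.
fn process_published_data<T: Config>(
    incoming: BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>,
    current_roots: BTreeMap<ParaId, relay_chain::Hash>,
) {
    let previous_roots = PreviousPublishedDataRoots::<T>::get();

    for (publisher, root) in &current_roots {
        // Skip publishers whose child trie root is unchanged since last block.
        if previous_roots.get(publisher) == Some(root) {
            continue;
        }
        if let Some(entries) = incoming.get(publisher) {
            for (key, value) in entries {
                PublishedData::<T>::insert(publisher, key, value);
            }
        }
    }

    // Drop data from publishers that no longer appear in the roots map.
    for publisher in previous_roots.keys() {
        if !current_roots.contains_key(publisher) {
            let _ = PublishedData::<T>::clear_prefix(publisher, u32::MAX, None);
        }
    }

    // Keep the new roots for the next block's comparison.
    PreviousPublishedDataRoots::<T>::put(current_roots);
}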

12. Collator-Side Fetching Logic

Location: cumulus/client/parachain-inherent/src/lib.rs

Collators fetch published data when building inherent data:

let published_data = relay_chain_interface
    .get_subscribed_data(para_id, relay_parent)  // Uses subscriber-aware API
    .await
    .unwrap_or_default();

This data is then included in the ParachainInherentData passed to the parachain runtime.

13. Rococo Integration (Testing Purposes)

Location: polkadot/runtime/rococo/src/

A full demo integration of the Broadcaster pallet and the XCM-related config can be found at:
https://github.com/blockdeep/polkadot-sdk/tree/feat/pubsub-root

Rationale: Enables reviewers to test the complete flow using Zombienet (config provided in pubsub-dev/zombienet.toml).

Testing

Relevant tests following SDK patterns have been provided for the newly added components.

It is recommended to run these tests on the testing branch, as it has both the tooling and the Broadcaster pallet set up on Rococo. However, they can be used as a reference for any other integration.

Local Testing with Zombienet

A Zombienet configuration is provided in pubsub-dev/ for local testing:
cd pubsub-dev
./build.sh # Build polkadot and polkadot-parachain
zombienet spawn zombienet.toml

This spins up:

  • Rococo relay chain (4 validators)
  • Penpal parachain (2 collators)

Note: The pubsub-dev/ directory is for review testing only and will be removed after the review process.

Extrinsics:

  • [Relay] Fund Parachain's Sovereign Account: 0x04030070617261e80300000000000000000000000000000000000000000000000000000b00407a10f35a
  • [Parachain] Subscribe to Parachain 1000 via pallet-xcm send call: 0x02003300050100050c000400000002286bee1300000002286bee0035e8030000
  • [Parachain] Publish some Data via pallet-xcm send call: 0x02003300050100050c000400000002286bee1300000002286bee003404143078313233143078313233

Known Behavior

  • No cleanup: Published data persists indefinitely in the Broadcaster pallet.
  • No Sudo/Governance privileged calls: There is no admin-origin call to clean up or modify Broadcaster storage or the data stored in parachain-system.

Closure

Please share any concerns, suggestions, or alternative approaches. This is an early-stage proposal and we welcome all input to align with the SDK's architecture and design principles.

Related: #606

@metricaez metricaez changed the title [DRAFT] Publish-only pubsub mechanism (Milestone 1) - Addresses #606 [DRAFT] PubSub mechanism (Milestone 1 & 2) - Addresses #606 Oct 31, 2025
@metricaez metricaez changed the title [DRAFT] PubSub mechanism (Milestone 1 & 2) - Addresses #606 PubSub mechanism- Addresses #606 Nov 12, 2025
@metricaez metricaez changed the title PubSub mechanism- Addresses #606 PubSub Mechanism - Addresses #606 Nov 12, 2025
@metricaez metricaez marked this pull request as ready for review November 13, 2025 14:20
@metricaez metricaez requested a review from a team as a code owner November 13, 2025 14:20

@bkchr bkchr left a comment


There are several issues with the pull request:

  1. You introduce a Subscribe message. This was never part of the issue and also makes no real sense. Subscription should be handled entirely locally on the parachain; no one outside the parachain (besides the collator) needs to know this.
  2. We should probably store the roots of the published parachain data in a map. So, we only need to include the roots in the storage proof we are interested in.
  3. I probably did not make this explicit enough, but right now you are forcing every parachain to include all the roots from all the parachains, and you also have no way to find out which parachains you are interested in. There should be a runtime API introduced, KeyToIncludeInRelayProof, that returns a list of keys. These keys would be, for example, the key to the root of the published data of one parachain and then the keys into the child trie for exactly the data we are interested in. Then on the collator side we can fetch these keys from the runtime and do not need the retrieve_subscribed_published_data method.
  4. There is no way to subscribe to data in the runtime right now, especially no kind of hook that will inform you when this data changes.

Comment on lines +1734 to +1736
let mut p = 0u32;
let mut k = 0u32;
let mut v = 0u32;

Please use more expressive names.

pub MaxPalletsInfo: u32 = 64;
pub MaxAssetTransferFilters: u32 = 6;
pub MaxPublishItems: u32 = 16;
pub MaxPublishKeyLength: u32 = 32;

The key should directly be a Hash and not generic.

Total size should probably be something like 2KiB, aka not that much crazy data.

mod tests;

#[frame_support::pallet]
pub mod pallet {

There is no logic that cleans up the data after a parachain has been off-boarded.

We should probably also require that parachains first register themselves for publishing data (if they are not a system chain) and place a deposit for the ~2KiB of data.


metricaez commented Dec 4, 2025

3. I probably did not make this explicit enough, but right now you are forcing every parachain to include all the roots from all the parachains, and you also have no way to find out which parachains you are interested in. There should be a runtime API introduced, KeyToIncludeInRelayProof, that returns a list of keys. These keys would be, for example, the key to the root of the published data of one parachain and then the keys into the child trie for exactly the data we are interested in. Then on the collator side we can fetch these keys from the runtime and do not need the retrieve_subscribed_published_data method.

hey @bkchr, thanks for the comments and review

I have already addressed moving the subscriptions to the parachain, exposing them to the collator via an API, and storing the roots in a storage map instead of a vec. However, I would like to share the following:

I’m exploring the path of not only adding the keys of the published data to the storage proof, but also including the data itself inside the proof so that parachain-system can extract it later. With this approach, no published_data field would be needed inside ParachainInherentData since the data would come directly inside the proof. The main issue is that the relay-chain interface does not expose child-trie data, only the main trie via prove_read, so this gets tricky.

I have made some progress, but before moving deeper into that direction, which would require adding some code to the relay chain interface, I would like to clarify one thing about the feedback provided, because I don't understand whether you expect the whole relay API to be removed, or whether a relay API that only serves the relevant child-trie data would be acceptable. This is what I mean and how I would implement it:

The subscriptions are handled by parachain-system in the parachain runtime, passed to the collator via an API, and only the subscribed roots are added to the proof. But the collator (now that it knows the para ID the parachain is subscribed to, and the keys for data it is subscribed to) queries a relay-chain API that exposes retrieve_published_data_by_key (name TBC) that returns only the data for the relevant child trie and key, and then adds it to the published_data field of ParachainInherentData. This would be similar to horizontal messages, which have their own field in ParachainInherentData where the message content is stored, while some data such as
hrmp_channels(HrmpChannelId { sender, recipient: para_id })
still comes via the proof.

This second approach would still satisfy the requirement to move subscription tracking to the parachain and avoid adding all roots to the proof; retrieve_subscribed_published_data would no longer be needed, only an API method that, given a ParaId and a key, returns the underlying data.

I would really appreciate your input on this topic as I might be missing or misunderstanding something, thanks!


bkchr commented Dec 5, 2025

The subscriptions are handled by parachain-system in the parachain runtime, passed to the collator via an API, and only the subscribed roots are added to the proof. But the collator (now that it knows the para ID the parachain is subscribed to, and the keys for data it is subscribed to) queries a relay-chain API that exposes retrieve_published_data_by_key (name TBC) that returns only the data for the relevant child trie and key, and then adds it to the published_data field of ParachainInherentData. This would be similar to horizontal messages, which have their own field in ParachainInherentData where the message content is stored, while some data such as
hrmp_channels(HrmpChannelId { sender, recipient: para_id })
still comes via the proof.

The problem is that this doesn't work for the use case of the published data, because you cannot prove that the data inside published_data belongs to the storage root you are fetching out of the relay chain storage proof. In the case of the messages, we have a message chain that needs to fit.

So, you just put everything into the relay chain storage proof. Also, we may only be interested in one of the published fields, which works better with the storage proof.

@metricaez

Closing this PR now; a redesign based on the feedback is presented in #10679.

@metricaez metricaez closed this Dec 17, 2025