
Conversation


@metricaez metricaez commented Oct 10, 2025

This PR implements a complete publish-subscribe mechanism for parachains, addressing issue #606.

Implemented:

  • M1: Publish instruction - Parachains can publish key-value data to the relay chain
  • M2: Subscribe instruction - Parachains can subscribe to specific publishers and receive only subscribed data
  • M3: Data change detection optimizations - Efficient updates via child trie root comparison

Review Focus

This is a design proposal, submitted for architectural review.

Please focus your attention on:

  • Component placement and architectural decisions
  • Data flow correctness and integration patterns
  • XCM instruction design and executor integration
  • Data structure design

Context: Issue #606

The original issue identified the challenge of expensive inter-parachain communication. Current methods (XCM messages, off-chain protocols) are complex and inefficient for broadcasting data across multiple parachains.

The proposed solution from the discussion:

  • Parachains can publish (key, value) data to the relay chain via XCM instruction
  • Data is stored in child tries per publisher (isolated storage)
  • Other parachains can access this data via their collators through the validation data inherent
  • Efficient change detection using child trie roots

This PR implements the core publishing mechanism as discussed, following our best interpretation of the XCM Publish instruction pattern suggested by @bkchr in the issue thread.

Architecture Overview

Publishing Flow

Parachain A (Publisher) via pallet-xcm
    ↓ XCM: Publish { data: [(key, value), ...] }
Relay Chain XCM Executor
    ↓ BroadcastHandler::handle_publish()
Broadcaster Pallet
    ↓ Stores in child trie for Para A
    ↓ Calculates child trie root
    ↓ Updates PublishedDataRoots storage
    ↓ Emits DataPublished event
Relay Chain State (Child Trie Storage)
    ↓ Collator fetches via ParachainHost API 
ParachainInherentData (published_data field)
    ↓ Passed to parachain runtime
Parachain B Runtime
    ↓ Compares roots with previous block
    ↓ Updates storage only if changed
    ✓ Data available for consumption

Subscribing Flow

Parachain B (Subscriber) via pallet-xcm
    ↓ XCM: Subscribe { publishers: [ParaId, ...] }
Relay Chain XCM Executor
    ↓ BroadcastHandler::handle_subscribe()
Broadcaster Pallet
    ↓ Stores subscription: Para B -> [Para A, Para C, ...]
    ↓ Emits Subscribed event
Relay Chain State (Subscription Tracking)
    ↓ Collator fetches via ParachainHost API v16
    ↓ get_subscribed_data(subscriber_para_id)
ParachainInherentData (published_data field)
    ↓ Contains ONLY data from subscribed publishers
Parachain B Runtime
    ✓ Filtered data available for consumption

Components Implemented

1. XCM v5 Publish & Subscribe Instructions

Location: polkadot/xcm/src/v5/mod.rs

Publish { data: PublishData }

Allows parachains to publish bounded key-value data to the relay chain.

  • Type: PublishData = BoundedVec<(BoundedVec<u8, 32>, BoundedVec<u8, 1024>), 16>
  • Current Limits: Max 16 items, 32-byte keys, 1024-byte values per operation. Arbitrary values for the sake of development.

Subscribe { publishers: BoundedVec<u32, 100> }

Allows parachains to subscribe to specific publisher parachains.

  • Type: BoundedVec<u32, 100> - List of ParaIds to subscribe to
  • Current Limit: Max 100 subscriptions per parachain. Arbitrary value for the sake of development.

Note: The instruction is temporarily added to XCM v5. Final placement (potentially XCM v6) should be discussed during the review process.

The instructions are intended to be sent via pallet-xcm send, together with the appropriate fee-payment instructions (e.g. WithdrawAsset/BuyExecution).
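
For illustration, a minimal sketch (not code from this PR) of how a parachain could assemble such a message before handing it to pallet-xcm send; the fee asset and amounts below are placeholders:

// Illustrative only: assembling a Publish message on the sending parachain.
use xcm::v5::prelude::*;

let data: PublishData = vec![(
    b"price:dot".to_vec().try_into().expect("key within 32 bytes"),
    b"42".to_vec().try_into().expect("value within 1024 bytes"),
)]
.try_into()
.expect("at most 16 items");

let message: Xcm<()> = Xcm(vec![
    // Placeholder fee handling; real callers pay with a proper asset/amount.
    WithdrawAsset((Here, 1_000_000_000u128).into()),
    BuyExecution { fees: (Here, 1_000_000_000u128).into(), weight_limit: Unlimited },
    Publish { data },
]);
// The message is then sent to the relay chain (destination `Parent`) via
// `pallet_xcm::Pallet::<Runtime>::send`.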

2. Broadcaster Pallet (Relay Chain)

Location: polkadot/runtime/parachains/src/broadcaster/

Core pallet managing published data on the relay chain.

Key features:

  • Child trie storage per publisher: Each parachain gets a deterministic child trie (ChildInfo::new_default(b"pubsub" + para_id.encode()))
  • Subscription tracking: Subscriptions storage maps subscriber ParaId => Vec<publisher ParaIds>
  • Key tracking: PublishedKeys storage tracks all keys published by each parachain for enumeration
  • Validation: Enforces limits on items, key/value lengths, and total stored keys

Storage:
PublisherExists: Tracks which parachains have published data
PublishedKeys: Tracks all keys per publisher (for enumeration)
Subscriptions: Tracks subscription relationships (subscriber => [publishers])
PublishedDataRoots: Aggregated child trie roots (exposed via well-known key for state proofs)

Traits:
PublishSubscribe: Exposes the publish and subscribe operations for pallets to implement. Intended for the Broadcaster pallet, but provided as a trait to allow possible future integrations.

Main functions:
pub fn handle_publish(origin_para_id: ParaId, data: Vec<(Vec<u8>, Vec<u8>)>) -> DispatchResult
pub fn handle_subscribe(subscriber_para_id: ParaId, publishers: Vec<ParaId>) -> DispatchResult
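
As a rough illustration of the child-trie handling described above, the per-publisher trie and its root could be derived along these lines (helper names and import paths are illustrative, not the pallet's exact code):

use codec::Encode;
use polkadot_parachain_primitives::primitives::Id as ParaId;
use sp_core::storage::ChildInfo;
use sp_runtime::StateVersion;

/// Deterministic per-publisher child trie: b"pubsub" ++ SCALE-encoded ParaId.
fn publisher_child_info(para_id: ParaId) -> ChildInfo {
    let mut key = b"pubsub".to_vec();
    key.extend_from_slice(&para_id.encode());
    ChildInfo::new_default(&key)
}

/// Write one published pair and return the publisher's new child trie root,
/// which handle_publish would then keep in PublishedDataRoots.
fn store_published_pair(para_id: ParaId, key: &[u8], value: &[u8]) -> Vec<u8> {
    let child = publisher_child_info(para_id);
    sp_io::default_child_storage::set(child.storage_key(), key, value);
    sp_io::default_child_storage::root(child.storage_key(), StateVersion::V1)
}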

3. BroadcastHandler Trait & Adapter

Location:

  • Trait: polkadot/xcm/xcm-executor/src/traits/broadcast_handler.rs
  • Adapter: polkadot/xcm/xcm-builder/src/broadcast_adapter.rs

BroadcastHandler trait:

pub trait BroadcastHandler {
    fn handle_publish(origin: &Location, data: PublishData) -> XcmResult;
    fn handle_subscribe(origin: &Location, publishers: BoundedVec<u32, 100>) -> XcmResult;
}

ParachainBroadcastAdapter:

  • Validates XCM origin
  • Extracts the ParaId from the XCM origin Location (see the sketch below)
  • Provides filtering
  • Bridges XCM executor to broadcaster pallet for both publish and subscribe operations
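
A rough sketch of the origin check referenced above (on the relay chain a publishing parachain appears as a direct child location; the chosen error variant is an assumption):

// Illustrative only: mapping the XCM origin to the publishing ParaId.
use polkadot_parachain_primitives::primitives::Id as ParaId;
use xcm::v5::prelude::*;

fn extract_para_id(origin: &Location) -> Result<ParaId, XcmError> {
    match origin.unpack() {
        // No parents and a single Parachain junction, i.e. a child parachain.
        (0, [Parachain(id)]) => Ok(ParaId::from(*id)),
        _ => Err(XcmError::BadOrigin),
    }
}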

4. XCM Executor Integration

Location: polkadot/xcm/xcm-executor/src/lib.rs
The executor processes both Publish and Subscribe instructions by calling:

  • Config::BroadcastHandler::handle_publish()
  • Config::BroadcastHandler::handle_subscribe()

5. XCM Executor Config Trait Extension

Location: polkadot/xcm/xcm-executor/src/config.rs

Added BroadcastHandler to the executor's Config trait:

pub trait Config {
    // ... existing config items
    type BroadcastHandler: BroadcastHandler;
}

This requires all XCM executors to specify their broadcast handler implementation. A no-op () implementation is provided.
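
In a concrete runtime this is one extra associated type; a sketch of what the test integration's executor config presumably adds (type names and the adapter's parameters are placeholders):

impl xcm_executor::Config for XcmConfig {
    // ... all existing associated types stay unchanged ...
    type BroadcastHandler = ParachainBroadcastAdapter<Runtime>;
    // Runtimes that do not support publish/subscribe keep the no-op:
    // type BroadcastHandler = ();
}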

6. ParachainHost Runtime API

Location: polkadot/primitives/src/runtime_api.rs
The ParachainHost runtime API was bumped to v16 with the addition of:

#[api_version(16)]
fn get_subscribed_data(subscriber_para_id: ParaId) -> BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>;

The get_subscribed_data method is the primary method used by collators to fetch filtered data for their parachain.
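
For reference, the node-side call follows the usual sp_api client pattern; a rough sketch (the client handle and surrounding error handling are elided):

// Illustrative only: querying the new API at a given relay parent hash.
let filtered = relay_client
    .runtime_api()
    .get_subscribed_data(relay_parent_hash, subscriber_para_id)
    .expect("runtime api call failed");
// `filtered` is a BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>> containing only
// entries from publishers this parachain has subscribed to.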

7. Well-Known Key for Data Roots

Location: polkadot/primitives/src/v9/mod.rs

Added well-known key for inclusion in relay chain state proofs:

pub const BROADCASTER_PUBLISHED_DATA_ROOTS: &[u8] = 
    &hex!["6aca18c1f7576767ccb238db4ccaedf239166324ac7ea24c870f96ab961f9654"];

This key is included in relay chain state proofs, allowing parachains to verify data roots and detect changes.

8. Relay Chain Interface Extension

Location: cumulus/client/relay-chain-interface/src/lib.rs

Extended the RelayChainInterface trait with:

async fn get_all_published_data(
    &self,
    at: PHash,
) -> RelayChainResult<BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>>

Implementations:

  • RelayChainInProcessInterface - Direct runtime API call
  • RelayChainRpcInterface - RPC client call to relay chain node

This interface is used by collators in cumulus/client/parachain-inherent/src/lib.rs to fetch published data when building inherent data.

9. ParachainInherentData Extension

Location: cumulus/primitives/parachain-inherent/src/lib.rs

Added published_data field to ParachainInherentData:

pub struct ParachainInherentData {
    pub validation_data: PersistedValidationData,
    pub relay_chain_state: sp_trie::StorageProof,
    pub downward_messages: Vec<InboundDownwardMessage>,
    pub horizontal_messages: BTreeMap<ParaId, Vec<InboundHrmpMessage>>,
    pub relay_parent_descendants: Vec<RelayHeader>,
    pub collator_peer_id: Option<ApprovedPeerId>,
    pub published_data: BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>,  // NEW
}

Design rationale: This follows the same pattern as existing message types (downward_messages, horizontal_messages): a direct field in the inherent data structure.

10. InboundPublishedData Wrapper

Location: cumulus/pallets/parachain-system/src/parachain_inherent.rs

Wrapper type for published data validation:

pub struct InboundPublishedData {
    pub data: BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>,
}

Purpose: Aligns with the SDK's pattern of wrapping inbound data types (InboundDownwardMessage, InboundHrmpMessage) for consistency and future extensibility.

11. Parachain-System Integration

Location: cumulus/pallets/parachain-system/src/lib.rs

Storage (illustrative declarations sketched below):

  • PublishedData: Double map storing received data (publisher ParaId, key) => value
  • PreviousPublishedDataRoots: Tracks previous block's data roots for change detection
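
Illustrative declarations for these two items (hashers, bounds, and query kinds are assumptions, not the PR's exact code):

#[pallet::storage]
pub type PublishedData<T: Config> = StorageDoubleMap<
    _,
    Twox64Concat, ParaId,      // publisher
    Blake2_128Concat, Vec<u8>, // published key
    Vec<u8>,                   // published value
    OptionQuery,
>;

#[pallet::storage]
pub type PreviousPublishedDataRoots<T: Config> =
    StorageValue<_, BTreeMap<ParaId, relay_chain::Hash>, ValueQuery>;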

Inherent creation (set_validation_data):

  • Receives published_data from collator via inherent
  • Validates data is included in inherent
  • Reads current data roots from relay chain state proof (via BROADCASTER_PUBLISHED_DATA_ROOTS)
  • Calls process_published_data() with root comparison

Change Detection Logic (process_published_data):

  • Compares current data roots with PreviousPublishedDataRoots
  • Only updates storage for publishers with changed roots (detected via hash comparison)
  • Clears data for publishers no longer publishing
  • Stores new roots for next block comparison
  • Optimization: Significantly reduces storage writes when data hasn't changed (see the sketch below)
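
A hedged sketch of that comparison step (storage and helper names follow the description above; the actual code in the PR may differ, and imports such as BTreeMap, ParaId, and the storage items are assumed to be in the pallet's scope):

// Illustrative only: root comparison driving the storage updates.
fn process_published_data<T: Config>(
    incoming: BTreeMap<ParaId, Vec<(Vec<u8>, Vec<u8>)>>,
    current_roots: BTreeMap<ParaId, relay_chain::Hash>,
) {
    let previous_roots = PreviousPublishedDataRoots::<T>::get();

    for (publisher, root) in &current_roots {
        // Skip publishers whose child trie root is unchanged since last block.
        if previous_roots.get(publisher) == Some(root) {
            continue;
        }
        if let Some(entries) = incoming.get(publisher) {
            for (key, value) in entries {
                PublishedData::<T>::insert(publisher, key, value);
            }
        }
    }

    // Drop data from publishers that no longer appear in the roots map.
    for publisher in previous_roots.keys() {
        if !current_roots.contains_key(publisher) {
            let _ = PublishedData::<T>::clear_prefix(publisher, u32::MAX, None);
        }
    }

    // Keep the new roots for the next block's comparison.
    PreviousPublishedDataRoots::<T>::put(current_roots);
}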

12. Collator-Side Fetching Logic

Location: cumulus/client/parachain-inherent/src/lib.rs

Collators fetch published data when building inherent data:

let published_data = relay_chain_interface
    .get_subscribed_data(para_id, relay_parent)  // Uses subscriber-aware API
    .await
    .unwrap_or_default();

This data is then included in the ParachainInherentData passed to the parachain runtime.

13. Rococo Integration (Testing Purposes)

Location: polkadot/runtime/rococo/src/

A full demo integration of the Broadcaster pallet and the XCM-related config can be found at:
https://github.com/blockdeep/polkadot-sdk/tree/feat/pubsub-root

Rationale: Enables reviewers to test the complete flow using Zombienet (config provided in pubsub-dev/zombienet.toml).

Testing

Relevant tests following SDK patterns have been provided for the newly added components.

It is recommended to run these tests on the testing branch, as it has both the tooling and the Broadcaster pallet set up on Rococo. However, they can be used as a reference for any other integration.

Local Testing with Zombienet

A Zombienet configuration is provided in pubsub-dev/ for local testing:
cd pubsub-dev
./build.sh # Build polkadot and polkadot-parachain
zombienet spawn zombienet.toml

This spins up:

  • Rococo relay chain (4 validators)
  • Penpal parachain (2 collators)

Note: The pubsub-dev/ directory is for review testing only and will be removed after the review process.

Extrinsics:

  • [Relay] Fund Parachain's Sovereign Account: 0x04030070617261e80300000000000000000000000000000000000000000000000000000b00407a10f35a
  • [Parachain] Subscribe to Parachain 1000 via pallet-xcm send call: 0x02003300050100050c000400000002286bee1300000002286bee0035e8030000
  • [Parachain] Publish some Data via pallet-xcm send call: 0x02003300050100050c000400000002286bee1300000002286bee003404143078313233143078313233

Known Behavior

  • No cleanup: Published data persists indefinitely in the Broadcaster pallet.
  • No Sudo/Governance privileged calls: There is no admin-origin call to clean up or modify Broadcaster storage or the data stored in parachain-system.

Closure

Please share any concerns, suggestions, or alternative approaches. This is an early-stage proposal and we welcome all input to align with the SDK's architecture and design principles.

Related: #606

@metricaez metricaez changed the title [DRAFT] Publish-only pubsub mechanism (Milestone 1) - Addresses #606 [DRAFT] PubSub mechanism (Milestone 1 & 2) - Addresses #606 Oct 31, 2025
@metricaez metricaez changed the title [DRAFT] PubSub mechanism (Milestone 1 & 2) - Addresses #606 PubSub mechanism- Addresses #606 Nov 12, 2025
@metricaez metricaez changed the title PubSub mechanism- Addresses #606 PubSub Mechanism - Addresses #606 Nov 12, 2025
@metricaez metricaez marked this pull request as ready for review November 13, 2025 14:20
@metricaez metricaez requested a review from a team as a code owner November 13, 2025 14:20

@bkchr bkchr left a comment


There are several issues with the pull request:

  1. You introduce a Subscribe message. This was never part of the issue and also makes no real sense. Subscription should be handled entirely locally on the parachain; no one outside the parachain (besides the collator) needs to know this.
  2. We should probably store the roots of the published parachain data in a map. So, we only need to include the roots in the storage proof we are interested in.
  3. I probably did not make this explicit enough, but right now you are forcing every parachain to include all the roots from all the parachains, and you also have no way to find out which parachains you are interested in. There should be a runtime API introduced, KeyToIncludeInRelayProof, that returns a list of keys. These keys would be, for example, the key to the root of the published data of one parachain and then the keys into the child trie for exactly the data we are interested in. Then on the collator side we can fetch these keys from the runtime and do not need the retrieve_subscribed_published_data method.
  4. There is no way to subscribe to data in the runtime right now, especially no kind of hook that will inform you when this data changes.

Comment on lines +1734 to +1736
let mut p = 0u32;
let mut k = 0u32;
let mut v = 0u32;

Please use more expressive names.

pub MaxPalletsInfo: u32 = 64;
pub MaxAssetTransferFilters: u32 = 6;
pub MaxPublishItems: u32 = 16;
pub MaxPublishKeyLength: u32 = 32;

The key should directly be a Hash and not generic.

Total size should probably be something like 2KiB, aka not that much crazy data.

mod tests;

#[frame_support::pallet]
pub mod pallet {

There is no logic that cleans up the data after a parachain has been off-boarded.

We should probably also require that parachains first register themselves for publishing data (if they are not a system chain) and place a deposit for the ~2KiB of data.


metricaez commented Dec 4, 2025

3. I probably did not make this explicit enough, but right now you are forcing every parachain to include all the roots from all the parachains, and you also have no way to find out which parachains you are interested in. There should be a runtime API introduced, KeyToIncludeInRelayProof, that returns a list of keys. These keys would be, for example, the key to the root of the published data of one parachain and then the keys into the child trie for exactly the data we are interested in. Then on the collator side we can fetch these keys from the runtime and do not need the retrieve_subscribed_published_data method.

hey @bkchr, thanks for the comments and review

I have already addressed moving the subscriptions to the parachain, exposing them to the collator via an API, and storing the roots in a storage map instead of a vec. However, I would like to share the following:

I’m exploring the path of not only adding the keys of the published data to the storage proof, but also including the data itself inside the proof so that parachain-system can extract it later. With this approach, no published_data field would be needed inside ParachainInherentData since the data would come directly inside the proof. The main issue is that the relay-chain interface does not expose child-trie data, only the main trie via prove_read, so this gets tricky.

I have made some progress, but before moving deeper into that direction, which would require adding some code to the relay chain interface, I would like to clarify one thing about the feedback provided, because I don't understand whether you expect the whole relay API to be removed, or whether a relay API that only serves the relevant child-trie data would be acceptable. This is what I mean and how I would implement it:

The subscriptions are handled by parachain-system in the parachain runtime, passed to the collator via an API, and only the subscribed roots are added to the proof. But the collator (now that it knows the para ID the parachain is subscribed to, and the keys for data it is subscribed to) queries a relay-chain API that exposes retrieve_published_data_by_key (name TBC) that returns only the data for the relevant child trie and key, and then adds it to the published_data field of ParachainInherentData. This would be similar to horizontal messages, which have their own field in ParachainInherentData where the message content is stored, while some data such as
hrmp_channels(HrmpChannelId { sender, recipient: para_id })
still comes via the proof.

This second approach would still satisfy the requirement to move subscription tracking to the parachain and avoid adding all roots to the proof; retrieve_subscribed_published_data would no longer be needed, only an API method that, given a ParaId and a key, returns the underlying data.

I would really appreciate your input on this topic as I might be missing or misunderstanding something, thanks!


bkchr commented Dec 5, 2025

The subscriptions are handled by parachain-system in the parachain runtime, passed to the collator via an API, and only the subscribed roots are added to the proof. But the collator (now that it knows the para ID the parachain is subscribed to, and the keys for data it is subscribed to) queries a relay-chain API that exposes retrieve_published_data_by_key (name TBC) that returns only the data for the relevant child trie and key, and then adds it to the published_data field of ParachainInherentData. This would be similar to horizontal messages, which have their own field in ParachainInherentData where the message content is stored, while some data such as
hrmp_channels(HrmpChannelId { sender, recipient: para_id })
still comes via the proof.

The problem is that this doesn't work for the use case of the published data, because you cannot prove that the data inside published_data belongs to the storage root you are fetching out of the relay chain storage proof. In the case of the messages, we have a message chain that needs to fit.

So, you just put everything into the relay chain storage proof. Also, we may only be interested in one of the published fields, which works better with the storage proof.

@metricaez

Closing this PR now; a redesign based on the feedback is presented in #10679.

@metricaez metricaez closed this Dec 17, 2025