feat: cache operator state #1696
base: master
Conversation
Sam recently updated the golang version in the repo to 1.24. If you haven't done so yet, I suggest merging the latest master into your branch.
disperser/controller/dispatcher.go
Outdated
```go
go func() {
	ticker := time.NewTicker(d.OnchainStateRefreshInterval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := d.refreshOnchainState(ctx); err != nil {
				d.logger.Error("failed to refresh onchain state", "err", err)
			}
		}
	}
}()
```
I suggest moving this to its own function, e.g. `func (d *Dispatcher) stateFetcher()`, then calling `go d.stateFetcher()` in the `Start()` method.
disperser/controller/dispatcher.go
Outdated
```go
defer func() {
	d.metrics.reportGetOperatorStateLatency(time.Since(start))
}()
```
Suggested change:

```diff
-defer func() {
-	d.metrics.reportGetOperatorStateLatency(time.Since(start))
-}()
+defer d.metrics.reportGetOperatorStateLatency(time.Since(start))
```
disperser/controller/dispatcher.go
Outdated
```go
if time.Since(cachedOnChainState.LastRefreshed) > d.OnchainStateRefreshInterval {
	d.logger.Warn("cached onchain state is outdated, manually fetching it")
	err := d.refreshOnchainState(ctx)
	if err != nil {
		return nil, nil, fmt.Errorf("failed to refresh onchain state: %w", err)
	}
	// reload the fresh snapshot
	cachedOnChainState = d.cachedOnchainState.Load().(*CachedOnchainState)
}
```
The way this is now, there may be a race where the background thread and this thread both attempt to fetch the data at approximately the same time.
Perhaps there could be two configuration values:
- `OnchainStateRefreshInterval`, which is used by the background goroutine
- a new setting, `BatchMetadataMaxAge`, which is used by this block of code. By default, it should be larger than `OnchainStateRefreshInterval` (perhaps 2x or 5x as long).

It's ok to use slightly out-of-date chain state for new batches. Using chain state from five or ten minutes ago doesn't really hurt anything.
The end result is that this block will not trigger unless the background thread is unable to fetch the chain state for a long time. If that is happening, I'd also expect this thread to fail at its task, and we'd start failing to create new batches. Creating no new batches is probably the correct behavior if we cease being able to get recent operator state for an extremely long period of time.
I also suggest the following special behavior for these settings:
- If `OnchainStateRefreshInterval` is 0, then this code should not attempt to fetch any information on a background goroutine. In this case, it will fall back to fetching the chain state on the batch creation thread.
- If `BatchMetadataMaxAge` (i.e. the new setting I suggest above) is 0, then it should always fetch the onchain state every time it creates a new batch.
The advantage of doing this is that we can fall back to the old pattern if we detect issues with the new pattern in the future. Additionally, if there are legacy tests that break with the new threading pattern, you can configure the code to pass with those legacy tests.
disperser/controller/dispatcher.go
Outdated
```diff
 // Get a batch of blobs to dispatch
 // This also writes a batch header and blob inclusion info for each blob in metadata store
-batchData, err := d.NewBatch(ctx, referenceBlockNumber)
+batchData, err := d.NewBatch(ctx, cachedOnChainState)
```
`cachedOnChainState` is only used by the `NewBatch()` method. Would it be possible to fetch the chain state inside `NewBatch()` instead of passing it in? Perhaps you could even encapsulate the block of code that fetches `cachedOnChainState` into a helper method.
`HandleBatch()` is already a very long and complicated method, so extracting code helps to reduce its complexity.
disperser/controller/dispatcher.go
Outdated
```go
// Store updated cache atomically
d.cachedOnchainState.Store(cached)
d.logger.Debug("refreshed onchain state", "blockNumber", currentBlockNumber, "referenceBlockNumber", referenceBlockNumber, "quorumIDs", quorumIDs)
```
Nit, this line is longer than 120 characters, could you wrap it?
LGTM! I can't approve this change since I technically opened the PR, so you will probably have to press "approve" on it, or find somebody else to also do a review.
We are not going to merge this PR. @cody-littley is going to open a new PR for this. Keeping it open in draft for now, for reference.
Why are these changes needed?
Cache batch metadata, instead of fetching it each time a new batch is created.