feat: use state machine to process package stats #1808
This PR refactors the package stats feature to use AWS Step Functions instead of a single Lambda function. The change addresses a fundamental scalability challenge: fetching download statistics from NPM's API for thousands of packages is too slow and resource-intensive for a single Lambda execution.
Why This Change
The package stats feature provides users with NPM download counts on package pages, helping them make informed decisions about which packages to adopt based on community usage. However, the original implementation using a single Lambda function couldn't scale to handle large package catalogs efficiently. Fetching statistics for each package sequentially would take hours and risk Lambda timeouts, while fetching them all in parallel would overwhelm NPM's API and exhaust Lambda memory.
The solution is a distributed map-reduce architecture using Step Functions. By splitting the work into chunks and processing them in parallel with controlled concurrency, we can complete updates within a reasonable timeframe while respecting NPM's rate limits. The state machine provides built-in retry logic, error handling, and visibility into which stage of processing fails.
State Machine Architecture
The workflow follows a three-phase map-reduce pattern:
The Chunker reads the catalog to determine which packages need statistics and divides them into manageable groups. The Processor functions run in parallel, each fetching statistics from NPM for its assigned packages and writing intermediate results to S3. The Aggregator collects these intermediate results and produces the final `stats.json` file that the frontend consumes. The map state is configured with a maximum concurrency of 10 to avoid overwhelming the NPM API, and each processor includes retry logic with a 5-minute backoff to handle transient issues.
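As a rough illustration only, the wiring could look something like the CDK sketch below. The construct and property names, the `$.chunks` items path, the retry attempt count, and the `chunkerFn`/`processorFn`/`aggregatorFn` handles are assumptions made for the example, not taken verbatim from this change.

```ts
import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

export interface PackageStatsProps {
  readonly chunkerFn: lambda.IFunction;
  readonly processorFn: lambda.IFunction;
  readonly aggregatorFn: lambda.IFunction;
}

export class PackageStatsStateMachine extends Construct {
  public readonly stateMachine: sfn.StateMachine;

  constructor(scope: Construct, id: string, props: PackageStatsProps) {
    super(scope, id);

    // Phase 1: split the catalog into chunks of packages to look up.
    const chunk = new tasks.LambdaInvoke(this, 'Chunk', {
      lambdaFunction: props.chunkerFn,
      payloadResponseOnly: true,
    });

    // Phase 2: fetch NPM download counts for one chunk; intermediate results go to S3.
    const processOne = new tasks.LambdaInvoke(this, 'Process', {
      lambdaFunction: props.processorFn,
      payloadResponseOnly: true,
    });
    // Retry transient NPM/API failures with a 5-minute backoff.
    processOne.addRetry({
      errors: ['States.ALL'],
      interval: Duration.minutes(5),
      maxAttempts: 3, // assumed retry count
    });

    // Fan out over the chunks, capped at 10 concurrent processors.
    const processChunks = new sfn.Map(this, 'ProcessChunks', {
      maxConcurrency: 10,
      itemsPath: sfn.JsonPath.stringAt('$.chunks'), // assumed chunker output shape
    });
    processChunks.iterator(processOne);

    // Phase 3: merge the intermediate S3 objects into the final stats.json.
    const aggregate = new tasks.LambdaInvoke(this, 'Aggregate', {
      lambdaFunction: props.aggregatorFn,
      payloadResponseOnly: true,
    });

    this.stateMachine = new sfn.StateMachine(this, 'StateMachine', {
      definition: chunk.next(processChunks).next(aggregate),
    });
  }
}
```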
What Changed
The implementation replaces the single Lambda function with a Step Functions state machine orchestrating three specialized Lambda functions. The CloudWatch alarm now monitors state machine failures instead of Lambda errors, providing better visibility into which stage fails. The operator runbook has been significantly expanded with narrative explanations of why the feature exists, how it is architected, detailed investigation steps for each failure scenario, and a diagram of the state machine. The backend dashboard now links to the state machine and to all three Lambda functions for easier troubleshooting.
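For context, pointing the alarm at the state machine rather than at the individual functions might look roughly like the following sketch; the alarm id, threshold, and period are assumptions for illustration.

```ts
import { Duration } from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';

// stateMachine: the sfn.StateMachine built in the sketch above.
declare const stateMachine: sfn.StateMachine;

// Alarm on failed executions so operators start from the state machine
// (and drill down to the failing stage) instead of a single Lambda's
// error metric.
new cloudwatch.Alarm(stateMachine, 'PackageStatsFailureAlarm', {
  metric: stateMachine.metricFailed({ period: Duration.minutes(5) }),
  threshold: 1,
  evaluationPeriods: 1,
  treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
});
```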
Comprehensive unit tests cover all three Lambda functions, and a new integration test validates the complete end-to-end workflow execution. All existing tests pass, and snapshot tests have been updated to reflect the new infrastructure.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license