@mrgrain mrgrain commented Nov 19, 2025

This PR refactors the package stats feature to use AWS Step Functions instead of a single Lambda function. The change addresses a fundamental scalability challenge: fetching download statistics from NPM's API for thousands of packages is too slow and resource-intensive for a single Lambda execution.

Why This Change

The package stats feature provides users with NPM download counts on package pages, helping them make informed decisions about which packages to adopt based on community usage. However, the original implementation using a single Lambda function couldn't scale to handle large package catalogs efficiently. Fetching statistics for each package sequentially would take hours, far beyond Lambda's 15-minute maximum execution time, while fetching them all in parallel would overwhelm NPM's API and exhaust Lambda memory.

The solution is a distributed map-reduce architecture using Step Functions. By splitting the work into chunks and processing them in parallel with controlled concurrency, we can complete updates within a reasonable timeframe while respecting NPM's rate limits. The state machine provides built-in retry logic, error handling, and visibility into which stage of processing fails.
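To illustrate what "parallel with controlled concurrency" means, here is a minimal in-process sketch. It is not code from this PR: in the actual change the Step Functions map state enforces the limit, and `mapWithConcurrency` is a hypothetical helper written only for this illustration.

```typescript
// Hypothetical helper (not part of this PR): runs `worker` over `items`
// with at most `limit` calls in flight, preserving input order in the output.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error('limit must be a positive integer');
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start at most `limit` runners; together they drain the whole list.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

With a limit of 10, a call such as `mapWithConcurrency(chunks, 10, processChunk)` would never have more than ten chunks being processed at once, which is the same guarantee the map state's concurrency setting gives the workflow.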

State Machine Architecture

The workflow follows a three-phase map-reduce pattern:

┌─────────────────┐
│  ChunkPackages  │  Reads catalog.json and splits package list
│    (Lambda)     │  into chunks of ~100 packages each
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ ProcessChunksMap│  Parallel processing (max 10 concurrent)
│   (Map State)   │  Each iteration processes one chunk
└────────┬────────┘
         │
         ├──────┬──────┬──────┬──────┐
         ▼      ▼      ▼      ▼      ▼
    ┌────────────────────────────┐
    │     ProcessChunk (Lambda)  │  Fetches NPM stats for packages
    │  - Retries once on failure │  in this chunk, writes to S3
    │  - Catches errors to allow │
    │    partial success         │
    └────────────────────────────┘
         │
         └──────┴──────┴──────┴──────┘
         │
         ▼
┌─────────────────┐
│ AggregateResults│  Reads all chunk results from S3,
│    (Lambda)     │  combines into final stats.json
└─────────────────┘
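
The first and last phases of the diagram can be sketched as two pure functions. This is an illustrative sketch under assumptions, not the code in this PR: the `PackageStats` shape, the function names, and the convention of representing a failed chunk as `undefined` are all hypothetical, and the real functions read from and write to S3 rather than taking arguments.

```typescript
// Hypothetical per-package record; the real stats schema is not shown in this PR.
interface PackageStats {
  downloads: number;
}
type StatsByPackage = Record<string, PackageStats>;

// Phase 1 (ChunkPackages): split the package list into fixed-size chunks.
function chunkPackages(names: readonly string[], chunkSize = 100): string[][] {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error('chunkSize must be a positive integer');
  }
  const chunks: string[][] = [];
  for (let i = 0; i < names.length; i += chunkSize) {
    chunks.push(names.slice(i, i + chunkSize));
  }
  return chunks;
}

// Phase 3 (AggregateResults): merge per-chunk results into one object.
// Assumption: a chunk that failed is passed as `undefined` and skipped,
// so the remaining chunks still contribute (partial success).
function aggregateResults(
  chunkResults: readonly (StatsByPackage | undefined)[],
): StatsByPackage {
  const merged: StatsByPackage = {};
  for (const result of chunkResults) {
    if (result === undefined) continue;
    Object.assign(merged, result);
  }
  return merged;
}
```

For a catalog of 250 packages and the default chunk size, `chunkPackages` yields three chunks of 100, 100, and 50 names.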

The Chunker reads the catalog to determine which packages need statistics and divides them into manageable groups. The Processor functions run in parallel, each fetching statistics from NPM for its assigned packages and writing intermediate results to S3. The Aggregator collects all these intermediate results and produces the final stats.json file that the frontend consumes. The map state is configured with a maximum concurrency of 10 to avoid overwhelming the NPM API, and each processor retries once after a 5-minute backoff to handle transient issues.
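
As a rough sketch of how these settings could look in an Amazon States Language definition, the object below uses the state names from the diagram. Everything else is an assumption for illustration: the placeholder function names, the `$.chunks` and `$.chunkResults` paths, and the `ChunkFailed` state are hypothetical, and the definition actually generated by this PR may differ.

```typescript
// Placeholder function names; the real names/ARNs are not part of this description.
const FUNCTIONS = {
  chunker: 'CHUNKER_FUNCTION_NAME',
  processor: 'PROCESSOR_FUNCTION_NAME',
  aggregator: 'AGGREGATOR_FUNCTION_NAME',
} as const;

// Builds a Task state that invokes a Lambda function with the state input as payload.
function lambdaTask(functionName: string) {
  return {
    Type: 'Task',
    Resource: 'arn:aws:states:::lambda:invoke',
    Parameters: { FunctionName: functionName, 'Payload.$': '$' },
  };
}

const definition = {
  StartAt: 'ChunkPackages',
  States: {
    ChunkPackages: {
      ...lambdaTask(FUNCTIONS.chunker),
      OutputPath: '$.Payload',
      Next: 'ProcessChunksMap',
    },
    ProcessChunksMap: {
      Type: 'Map',
      ItemsPath: '$.chunks', // assumed name of the chunker's output field
      MaxConcurrency: 10, // at most 10 chunks processed at once
      ItemProcessor: {
        ProcessorConfig: { Mode: 'INLINE' },
        StartAt: 'ProcessChunk',
        States: {
          ProcessChunk: {
            ...lambdaTask(FUNCTIONS.processor),
            // Retry once, waiting 300 seconds (5 minutes) before the retry.
            Retry: [{ ErrorEquals: ['States.ALL'], IntervalSeconds: 300, MaxAttempts: 1 }],
            // Catch remaining errors so one failed chunk does not fail the whole map.
            Catch: [{ ErrorEquals: ['States.ALL'], ResultPath: '$.error', Next: 'ChunkFailed' }],
            End: true,
          },
          ChunkFailed: { Type: 'Pass', End: true }, // hypothetical marker state
        },
      },
      ResultPath: '$.chunkResults', // assumed location for per-chunk outcomes
      Next: 'AggregateResults',
    },
    AggregateResults: { ...lambdaTask(FUNCTIONS.aggregator), End: true },
  },
};
```

Serializing the object with `JSON.stringify(definition, null, 2)` gives a JSON document in the shape Step Functions expects, which makes it easy to compare against the deployed state machine when troubleshooting.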

What Changed

The implementation replaces the single Lambda function with a Step Functions state machine orchestrating three specialized Lambda functions. The CloudWatch alarm now monitors state machine failures instead of Lambda errors, providing better visibility into which stage fails. The operator runbook has been significantly expanded with narrative explanations of why the feature exists, how it's architected, detailed investigation steps for each failure scenario, and the state machine diagram shown above. The backend dashboard now links to the state machine and all three Lambda functions for easier troubleshooting.

Comprehensive unit tests cover all three Lambda functions, and a new integration test validates the complete end-to-end workflow execution. All existing tests pass, and snapshot tests have been updated to reflect the new infrastructure.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@mrgrain mrgrain force-pushed the mrgrain/feat/package-stats-state-machine branch from 6874c8e to 449031e Compare November 19, 2025 15:03
@mrgrain mrgrain force-pushed the mrgrain/feat/package-stats-state-machine branch from 6e6a44e to e3c1a92 Compare November 19, 2025 15:57
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@cdklabs-automation cdklabs-automation added this pull request to the merge queue Nov 19, 2025
Merged via the queue into main with commit 32fc81e Nov 19, 2025
7 checks passed
@cdklabs-automation cdklabs-automation deleted the mrgrain/feat/package-stats-state-machine branch November 19, 2025 16:11