

@snowmead snowmead commented Dec 3, 2025

Summary

Extends Prometheus metrics instrumentation across the BSP, MSP, and Fisherman task handlers so that failure paths are properly tracked. Adds a Grafana service with pre-configured dashboards for visualizing StorageHub metrics when integration tests run with telemetry enabled.

Notable changes

Metrics Instrumentation

  • bsp_download_file.rs: New metric bsp_download_requests_total, recorded through a wrapper pattern: handle_download records success/failure metrics and delegates to handle_download_inner for the actual logic.
  • bsp_delete_file.rs: Refactored metric recording into remove_file_from_file_storage to centralize success/failure tracking. Added failure metrics on forest storage retrieval errors.
  • bsp_charge_fees.rs: Added insolvent_users_processed_total failure counters on forest storage retrieval failures and forest root write tx errors.
  • bsp_submit_proof.rs: Added bsp_proofs_submitted_total failure counters when forest root tx is already taken or forest storage retrieval fails.
  • bsp_upload_file.rs: Added bsp_storage_requests_total failure counters on tx taken, wrong provider type, no BSP ID, empty proofs, and forest storage retrieval errors. Added bsp_upload_chunks_received_total for tracking chunk upload success/failure in RemoteUploadRequest handler.
  • msp_upload_file.rs: Added msp_storage_requests_total failure counters on tx taken, wrong provider type, and no MSP ID errors.
  • msp_delete_file.rs: Added msp_files_deleted_total failure counters on forest storage and metadata retrieval errors.
  • msp_move_bucket.rs: Success metric now recorded after actual download completion rather than after on-chain confirmation. Added failure metrics for indexer disabled and download failures. Empty bucket (no files) correctly records success.
  • msp_retry_bucket_move.rs: Added msp_bucket_move_retries_total for tracking retry attempt success/failure in RetryBucketMoveDownload handler.
  • msp_verify_bucket_forests.rs: Added msp_forest_verifications_total counter and msp_forest_verification_seconds histogram for tracking forest verification operations with timing.
  • msp_stop_storing_insolvent_user.rs: Added msp_buckets_deleted_total metrics for tracking bucket deletion success/failure in FinalisedMspStopStoringBucketInsolventUser handler.
  • sp_slash_provider.rs: Added sp_slash_submissions_total for tracking slash extrinsic submission success/failure in SlashableProvider handler.
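
The wrapper pattern described for bsp_download_file.rs can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the Metrics struct is a hypothetical in-memory stand-in for the Prometheus counters, and the signatures and error type of handle_download and handle_download_inner are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a labelled Prometheus counter:
/// maps (metric name, status label) to a count.
#[derive(Default)]
struct Metrics {
    counts: HashMap<(&'static str, &'static str), u64>,
}

impl Metrics {
    fn inc(&mut self, name: &'static str, status: &'static str) {
        *self.counts.entry((name, status)).or_insert(0) += 1;
    }

    fn get(&self, name: &'static str, status: &'static str) -> u64 {
        self.counts.get(&(name, status)).copied().unwrap_or(0)
    }
}

const DOWNLOAD_REQUESTS: &str = "bsp_download_requests_total";

/// Inner function: holds the actual logic and can return early on any
/// error without touching the metrics. The body is a placeholder.
fn handle_download_inner(file_key: &str) -> Result<usize, String> {
    if file_key.is_empty() {
        return Err("empty file key".to_string());
    }
    Ok(file_key.len()) // placeholder for "bytes served"
}

/// Wrapper: the single place where the outcome is recorded, so every
/// failure path of the inner function is counted exactly once.
fn handle_download(metrics: &mut Metrics, file_key: &str) -> Result<usize, String> {
    let result = handle_download_inner(file_key);
    let status = if result.is_ok() { "success" } else { "failure" };
    metrics.inc(DOWNLOAD_REQUESTS, status);
    result
}

fn main() {
    let mut metrics = Metrics::default();
    let _ = handle_download(&mut metrics, "0xabc");
    let _ = handle_download(&mut metrics, "");
    println!("success = {}", metrics.get(DOWNLOAD_REQUESTS, "success"));
    println!("failure = {}", metrics.get(DOWNLOAD_REQUESTS, "failure"));
}
```

Recording in the wrapper means a new early return added to the inner function later cannot silently skip the failure counter.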

Grafana Integration

  • New Grafana service in fullnet-base-template.yml on port 3030 with anonymous viewer access enabled.
  • Auto-provisioned Prometheus datasource pointing to sh-prometheus:9090.
  • Pre-configured dashboards:
    • BSP Dashboard: Storage operations, proofs, downloads, deletions, fees, proof generation latency, chunk uploads
    • MSP Dashboard: Storage operations, file distribution, deletions, bucket moves, request duration, forest verification & retries
    • Fisherman Dashboard: Batch deletions overview, success rate gauge, distribution charts

Test Infrastructure

  • Renamed test directory from prometheus/ to telemetry/ for broader scope.
  • Added test:telemetry and test:telemetry:only scripts to package.json.
  • Added bsp_download_requests_total to metrics validation test.
  • Centralized Prometheus and Grafana ports in NODE_INFOS constants.
  • Network launcher now starts Grafana alongside Prometheus in fullnet mode.

Note for Node Operators

Download the prebuilt Grafana dashboards for each StorageHub node role to use as a starting baseline.

Snippet of MSP Grafana Dashboard

[Image: screenshot of the MSP Grafana dashboard]

Metrics Reference

BSP Metrics

| Metric | Type | Status Labels |
|---|---|---|
| storagehub_bsp_storage_requests_total | Counter | pending, success, failure |
| storagehub_bsp_proofs_submitted_total | Counter | pending, success, failure |
| storagehub_bsp_fees_charged_total | Counter | success, failure |
| storagehub_bsp_files_deleted_total | Counter | success, failure |
| storagehub_bsp_bucket_moves_total | Counter | pending, success, failure |
| storagehub_bsp_download_requests_total | Counter | success, failure |
| storagehub_bsp_upload_chunks_received_total | Counter | success, failure |
| storagehub_bsp_proof_generation_seconds | Histogram | success, failure |

MSP Metrics

| Metric | Type | Status Labels |
|---|---|---|
| storagehub_msp_storage_requests_total | Counter | pending, success, failure |
| storagehub_msp_files_distributed_total | Counter | pending, success, failure |
| storagehub_msp_files_deleted_total | Counter | success, failure |
| storagehub_msp_buckets_deleted_total | Counter | success, failure |
| storagehub_msp_fees_charged_total | Counter | success, failure |
| storagehub_msp_bucket_moves_total | Counter | pending, success, failure |
| storagehub_msp_bucket_move_retries_total | Counter | success, failure |
| storagehub_msp_forest_verifications_total | Counter | success, failure |
| storagehub_msp_forest_verification_seconds | Histogram | success, failure |

SP Metrics

| Metric | Type | Status Labels |
|---|---|---|
| storagehub_sp_slash_submissions_total | Counter | success, failure |

General Metrics

| Metric | Type | Status Labels |
|---|---|---|
| storagehub_storage_request_seconds | Histogram | success, failure |
| storagehub_file_transfer_seconds | Histogram | success, failure |
| storagehub_insolvent_users_processed_total | Counter | success, failure |
| storagehub_fisherman_batch_deletions_total | Counter | success, failure |

Download Metrics

| Metric | Type | Status Labels |
|---|---|---|
| storagehub_bytes_downloaded_total | Counter | success, failure |
| storagehub_chunks_downloaded_total | Counter | success, failure |
| storagehub_file_download_seconds | Histogram | success, failure |
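
For readers unfamiliar with labelled counters, the sketch below shows how one metric from the tables above becomes several time series, one per status value, in the Prometheus text exposition format. The label key `status` is an assumption (the tables only list the label values), and render_counter is a hypothetical helper written for this illustration, not part of the PR.

```rust
/// Renders one counter family in the Prometheus text exposition format.
/// Assumption: the status label key is named `status`.
fn render_counter(name: &str, samples: &[(&str, u64)]) -> String {
    let mut out = format!("# TYPE {name} counter\n");
    for (status, value) in samples {
        // `{{` and `}}` are escaped braces, producing name{status="..."} value
        out.push_str(&format!("{name}{{status=\"{status}\"}} {value}\n"));
    }
    out
}

fn main() {
    // Sample values are made up for illustration.
    let text = render_counter(
        "storagehub_sp_slash_submissions_total",
        &[("success", 2), ("failure", 1)],
    );
    print!("{text}");
}
```

Each status value is scraped as its own series, which is what lets a dashboard panel compare successes against failures for the same operation.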

  - Add metrics.rs with StorageHubMetrics struct and helper macros
  - Instrument BSP tasks: upload, proof submission, fees, deletion, bucket moves
  - Instrument MSP tasks: upload, deletion, distribution, fees, bucket moves
  - Instrument fisherman batch deletions and file downloads
  - Add prometheus.yml config and Docker integration
  - Add centralized test/util/prometheus.ts API
  - Add integration tests for all metrics

  Metrics tracked: storage requests, proofs, fees, deletions, bucket moves,
  file transfers, and download operations with status labels and histograms.
…r tracking

- Introduced new macros for metrics incrementing and histogram observation, allowing for cleaner and more consistent metric tracking across various tasks.
- Updated file download manager to utilize new macros for recording successful and failed download metrics.
- Enhanced proof generation task to track timing metrics for both success and failure scenarios.
- Improved storage request handling in upload tasks to increment metrics based on success or failure of confirmations.
- Refactored existing metric tracking code to reduce redundancy and improve readability.
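
A rough sketch of what such helper macros could look like is shown below. The macro names (inc_counter, observe_seconds), the Registry struct, and the Option-wrapped handle for "metrics disabled" are all assumptions made for illustration; the actual macros in metrics.rs may differ in name and shape.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical in-memory stand-in for the real metrics struct.
#[derive(Default)]
struct Registry {
    counters: HashMap<String, u64>,
    observations: HashMap<String, Vec<f64>>,
}

/// Hypothetical macro: increments `name:status` only when metrics are enabled.
macro_rules! inc_counter {
    ($metrics:expr, $name:expr, $status:expr) => {
        if let Some(m) = $metrics.as_mut() {
            *m.counters
                .entry(format!("{}:{}", $name, $status))
                .or_insert(0) += 1;
        }
    };
}

/// Hypothetical macro: records the seconds elapsed since `$start`.
macro_rules! observe_seconds {
    ($metrics:expr, $name:expr, $status:expr, $start:expr) => {
        if let Some(m) = $metrics.as_mut() {
            m.observations
                .entry(format!("{}:{}", $name, $status))
                .or_default()
                .push($start.elapsed().as_secs_f64());
        }
    };
}

/// Timing is observed for both outcomes, so slow failures stay visible.
/// The even/odd check is a placeholder for real proof generation.
fn generate_proof(metrics: &mut Option<Registry>, challenge: u32) -> Result<u32, String> {
    let start = Instant::now();
    let result = if challenge % 2 == 0 {
        Ok(challenge / 2)
    } else {
        Err("odd challenge".to_string())
    };
    let status = if result.is_ok() { "success" } else { "failure" };
    observe_seconds!(metrics, "bsp_proof_generation_seconds", status, start);
    result
}

/// Counter incremented according to the outcome of a confirmation.
fn confirm_storage_request(metrics: &mut Option<Registry>, confirmed: bool) -> Result<(), String> {
    let result = if confirmed {
        Ok(())
    } else {
        Err("confirmation failed".to_string())
    };
    let status = if result.is_ok() { "success" } else { "failure" };
    inc_counter!(metrics, "bsp_storage_requests_total", status);
    result
}

fn main() {
    let mut metrics = Some(Registry::default());
    let _ = generate_proof(&mut metrics, 4);
    let _ = generate_proof(&mut metrics, 3);
    let _ = confirm_storage_request(&mut metrics, true);
    let registry = metrics.as_ref().unwrap();
    println!("counters: {:?}", registry.counters);
    println!("timed series: {}", registry.observations.len());
}
```

Folding the "are metrics enabled" check into the macro keeps each call site to a single line, which matches the stated goal of reducing redundancy.
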
Resolved conflicts by combining metrics instrumentation from feat/telemetry
with improved return messages from main in:
- bsp_upload_file.rs
- msp_delete_bucket.rs
- msp_distribute_file.rs
- msp_upload_file.rs
@snowmead snowmead changed the title feat(client): add Prometheus metrics instrumentation feat(client): Add Prometheus metrics instrumentation Dec 3, 2025
@snowmead snowmead changed the title feat(client): Add Prometheus metrics instrumentation feat: Add Prometheus metrics instrumentation Dec 3, 2025
@snowmead snowmead changed the title feat: Add Prometheus metrics instrumentation feat: add telemetry metrics instrumentation Dec 3, 2025
…some tasks, add telemetry integration package script
@snowmead snowmead added labels B5-clientnoteworthy (Changes should be mentioned client-related release notes), D3-trivial👶 (PR contains trivial changes that do not require an audit), not-breaking (Does not need to be mentioned in breaking changes) Dec 3, 2025
@snowmead snowmead requested a review from ffarall December 3, 2025 19:19
snowmead and others added 6 commits December 4, 2025 08:25
…ption

  Add metrics instrumentation for previously uncovered task event handlers:
  - bsp_upload_file: chunk upload success/failure counters
  - msp_retry_bucket_move: retry attempt counters
  - msp_verify_bucket_forests: verification counters with duration histogram
  - msp_stop_storing_insolvent_user: bucket deletion counters
  - sp_slash_provider: slash submission counters

  Update Grafana dashboards with new panels for chunk uploads (BSP) and
  forest verification/retries (MSP).
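
As background for the duration histogram mentioned above, the sketch below models how a Prometheus-style histogram stores observations: cumulative buckets keyed by an upper bound, plus a running count and sum. The bucket boundaries are hypothetical and are not taken from this PR, and the Histogram type here is a simplified stand-in rather than the prometheus crate's type.

```rust
/// Simplified model of a Prometheus-style histogram.
struct Histogram {
    bounds: Vec<f64>,  // bucket upper bounds in seconds, ascending
    buckets: Vec<u64>, // cumulative count of observations <= each bound
    count: u64,        // total number of observations
    sum: f64,          // sum of all observed values
}

impl Histogram {
    fn new(bounds: Vec<f64>) -> Self {
        let buckets = vec![0; bounds.len()];
        Histogram { bounds, buckets, count: 0, sum: 0.0 }
    }

    /// Buckets are cumulative: an observation increments every bucket
    /// whose upper bound is greater than or equal to the value.
    fn observe(&mut self, seconds: f64) {
        for (i, bound) in self.bounds.iter().enumerate() {
            if seconds <= *bound {
                self.buckets[i] += 1;
            }
        }
        self.count += 1;
        self.sum += seconds;
    }
}

fn main() {
    // Hypothetical boundaries for a verification-duration histogram.
    let mut verification = Histogram::new(vec![0.1, 1.0, 10.0]);
    for seconds in [0.05, 0.5, 20.0] {
        verification.observe(seconds);
    }
    println!("buckets = {:?}", verification.buckets);
    println!("count = {}, sum = {}", verification.count, verification.sum);
}
```

An observation above the largest bound, such as the 20-second one here, shows up only in the total count and sum, so boundaries are best chosen to cover the slowest verification expected.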
@snowmead snowmead requested a review from TDemeco December 5, 2025 13:18