Skip to content

Conversation

@GauravNagesh
Copy link
Contributor

Description

This PR introduces significant performance improvements to the PSU daemon by optimizing Redis database operations through batching, conditional field updates and LED status change detection. There are 3 key changes involved :

  1. Batching - The daemon now uses RedisPipeline, which instead of sending 1 request separately each time, batches multiple Redis request together and sends them. Also flushes any unfilled batches at the end of each monitoring cycle if any requests still remain. This reduces the number of round trips to the Redis database.

  2. Conditional field Updates - PsuStatus object already has methods and stores vital information needed to detect when any PSU info fields change. We use this to our benefit to only write those fields which frequently change like the temp, voltage, current, power, etc while only occasionally writing fields which rarely change during daemon run like model number, serial number, revision, presence, status, replaceable and threshold levels

  3. LED status change detection - Adding a led_status instance variable to store the last recorded led status color and an instance method set_led_status to update and store the led status and return True if status changed else False. This reduces writing this value to PSU_INFO_TABLE every 3 seconds even though there was no change.

Additionally, existing tests were updated to include above change changes and to mock the new RedisPipeline and FieldValuePairs constructors. It also adds Exception Handling around DB calls where necessary. Few key updated to the tests are shown below :

  1. test_power_threshold Updates
  • PSU power down scenario
  • PSU power up scenario
  • PSU absent scenario
  • PSU present scenario
  • Threshold invalid scenarios
  1. test_update_led_color Updates
  • Tests LED status change detection
  • Verifies DB writes only on status change
  • Comprehensive coverage of state transitions:
    • First run (OFF) → writes
    • No change (OFF→OFF) → no write
    • Change (OFF→GREEN) → writes
    • No change (GREEN→GREEN) → no write
    • change (GREEN→RED) → write
    • change (RED→GREEN) → write
  1. _construct_expected_fvp Updates - Helps accurately create the expected_fvp objects based on conditional field inclusion logic to help verify the Redis payload field.

Motivation and Context

Why RedisPipeline and Batching :
Currently, the psud daemon creates and maintains a separate db connection internally for each table used and an additional initial connection to state_db as shown here Table Constructor and RedisPipeline Constructor

So in total we maintain 5 separate Redis database connections in psud (1 connection each for CHASSIS_INFO_TABLE, PSU_INFO_TABLE, FAN_INFO_TABLE, PHYSICAL_ENTITY_INFO_TABLE and the initial connection to STATE_DB) which is waste of Redis and system resources. Moreover, since psud purely a single process executing all requests in a sequential manner, we don't need multiple connections. Just one connection shared across all the tables is sufficient.

Also in the existing flow, individual set() calls are made for each device and their attribute. Rather we can collect all device data in batches using RedisPipeline and then do a single bulk update, which will be faster.

We can also optimize database cleanup operations, where we do multiple individual delete operations in __del__ method for each table key across all tables. Rather we can also batch the delete operations using RedisPipeline for faster cleanup during daemon shutdown, which will be quick in signal handling and reboot scenarios.

Why Conditional field Updates and LED status change detection :
Every 3 seconds the daemon writes full PSU info to Redis even though much of the information rarely changes like the model number, serial number, revision, presence, status, replaceable and threshold levels which is adding to unnecessary traffic to Redis. Rather we can write these fields only when a presence change is detected while always writing the frequently changing fields like temp, voltage, current, power, etc to ensure updated info stays in Redis.
No additional changes is required as all info needed for detecting change in already part of PsuStatus object.

The same happens to led_status fields as well where, its written to Redis every 3 seconds even though there isnt any change. Also, adding an instance variable to the PsuStatus will not be a burden since the value stored will be a class variable predefined here or predefined and fetched here Hence no duplicate memory allocated but rather will just be a reference to this class variable will be stored here.

Overall benefit :

  • Reduced Redis traffic in the system by writing only on change
    Before, roughly 18 fields written to Redis every 3 seconds per PSU
    After roughly 8-9 frequently-changing fields + conditional fields only when changed
    Estimated Reduction of approximately 45-50% reduction in Redis write payload volume under steady state

  • Reduced IO network overhead by batching requests

  • Reduced Redis and system system resources by reusing the same connection for RedisPipeline rather than having a seperate connection to Redis for each table

How Has This Been Tested?

The changes were tested on both virtual and physical platforms. Performance benchmarking was conducted to compare the baseline and the updated implementation, capturing relevant metrics across all database operations.

Additional Information (Optional)

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants