psud: Redis performance improvements #704
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR introduces significant performance improvements to the PSU daemon by optimizing Redis database operations through batching, conditional field updates and LED status change detection. There are 3 key changes involved :
Batching - The daemon now uses
RedisPipeline, which instead of sending 1 request separately each time, batches multiple Redis request together and sends them. Also flushes any unfilled batches at the end of each monitoring cycle if any requests still remain. This reduces the number of round trips to the Redis database.Conditional field Updates -
PsuStatusobject already has methods and stores vital information needed to detect when any PSU info fields change. We use this to our benefit to only write those fields which frequently change like thetemp,voltage,current,power, etc while only occasionally writing fields which rarely change during daemon run likemodel number,serial number,revision,presence,status,replaceableandthreshold levelsLED status change detection - Adding a
led_statusinstance variable to store the last recorded led status color and an instance methodset_led_statusto update and store the led status and return True if status changed else False. This reduces writing this value to PSU_INFO_TABLE every 3 seconds even though there was no change.Additionally, existing tests were updated to include above change changes and to mock the new
RedisPipelineandFieldValuePairsconstructors. It also adds Exception Handling around DB calls where necessary. Few key updated to the tests are shown below :test_power_thresholdUpdatestest_update_led_colorUpdates_construct_expected_fvpUpdates - Helps accurately create theexpected_fvpobjects based on conditional field inclusion logic to help verify the Redis payload field.Motivation and Context
Why RedisPipeline and Batching :
Currently, the psud daemon creates and maintains a separate db connection internally for each table used and an additional initial connection to state_db as shown here Table Constructor and RedisPipeline Constructor
So in total we maintain 5 separate Redis database connections in psud (1 connection each for
CHASSIS_INFO_TABLE,PSU_INFO_TABLE,FAN_INFO_TABLE,PHYSICAL_ENTITY_INFO_TABLEand the initial connection toSTATE_DB) which is waste of Redis and system resources. Moreover, since psud purely a single process executing all requests in a sequential manner, we don't need multiple connections. Just one connection shared across all the tables is sufficient.Also in the existing flow, individual
set()calls are made for each device and their attribute. Rather we can collect all device data in batches using RedisPipeline and then do a single bulk update, which will be faster.We can also optimize database cleanup operations, where we do multiple individual delete operations in
__del__method for each table key across all tables. Rather we can also batch the delete operations using RedisPipeline for faster cleanup during daemon shutdown, which will be quick in signal handling and reboot scenarios.Why Conditional field Updates and LED status change detection :
Every 3 seconds the daemon writes full PSU info to Redis even though much of the information rarely changes like the
model number,serial number,revision,presence,status,replaceableandthreshold levelswhich is adding to unnecessary traffic to Redis. Rather we can write these fields only when a presence change is detected while always writing the frequently changing fields liketemp,voltage,current,power, etc to ensure updated info stays in Redis.No additional changes is required as all info needed for detecting change in already part of PsuStatus object.
The same happens to led_status fields as well where, its written to Redis every 3 seconds even though there isnt any change. Also, adding an instance variable to the PsuStatus will not be a burden since the value stored will be a class variable predefined here or predefined and fetched here Hence no duplicate memory allocated but rather will just be a reference to this class variable will be stored here.
Overall benefit :
Reduced Redis traffic in the system by writing only on change
Before, roughly 18 fields written to Redis every 3 seconds per PSU
After roughly 8-9 frequently-changing fields + conditional fields only when changed
Estimated Reduction of approximately 45-50% reduction in Redis write payload volume under steady state
Reduced IO network overhead by batching requests
Reduced Redis and system system resources by reusing the same connection for RedisPipeline rather than having a seperate connection to Redis for each table
How Has This Been Tested?
The changes were tested on both virtual and physical platforms. Performance benchmarking was conducted to compare the baseline and the updated implementation, capturing relevant metrics across all database operations.
Additional Information (Optional)