thermalctld: Redis Performance Improvements #705
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This PR implements significant Redis performance optimizations for the
thermalctlddaemon, specifically to theThermalMonitorchild thread by introducing conditional writes and batched operations, thereby significantly reducing database load and improving daemon efficiency. There aren't any changes to the main thread (Thermal Policy Manager) since most of the database write operations is done by the child thread (ThermalMonitor). There are 3 key changes involved :Batching - The daemon now uses
RedisPipeline, which instead of sending 1 request separately each time, batches multiple Redis request together and sends them. Also flushes any unfilled batches at the end of each monitoring cycle if any requests still remain. This reduces the number of round trips to the Redis database and improves throughput.Conditional field Updates -
FanStatusandTemperatureStatusobject already has methods and stores vital information needed to detect when any PSU info fields change. We use this to our benefit to only write those fields which frequently change like thespeedandtimestampetc while only occasionally writing fields which rarely change during daemon runtime likemodel,serial,drawer_name,is_replaceable,status,speed_target, etc. Write all fields on first run, later only write fields to Redis when their values change. We get significant reduction in payload size.Enhanced State Tracking for FanDrawer and LEDs - Added new
FanDrawerStatusclass to track fan drawer state changes. Addedled_statusfield caching inFanStatusclass. This is have separate dictionaries for fan and fan drawer status tracking. Also added an instance methodset_led_statusto update and store the led status and return True if status changed else False. All these will reduce writing value toFAN_INFOandFAN_DRAWER_INFOtables even though there was no change.Additionally, existing tests were updated to include above change changes and to mock the new RedisPipeline and FieldValuePairs constructors. It also adds Exception Handling around DB calls where necessary. Few key updated to the tests are shown below :
TestFanDrawerStatus- Complete coverage of new FanDrawerStatus functionalitytest_fan_conditional_write_no_change(): Validates minimal writes when state unchangedtest_fan_drawer_conditional_write(): Validates drawer-level conditional writestest_fan_drawer_led_conditional_write(): Validates LED-specific write optimizationtest_fan_under_speed()andtest_fan_over_speed()to validate conditional writesMotivation and Context
Why RedisPipeline and Batching :
Currently, the thermalctld daemon's
ThermalMonitorthread creates and maintains a separate db connection internally for each table used and an additional initial connection to state_db as shown here Table Constructor and RedisPipeline ConstructorSo in total we maintain 5 separate Redis database connections in psud (1 connection each for FAN_INFO_TABLE, FAN_DRAWER_INFO_TABLE_NAME, PHYSICAL_ENTITY_INFO_TABLE and the initial connection to STATE_DB) which is waste of Redis and system resources. Moreover, since this thread executes all requests in a sequential manner, we don't need multiple connections. Just one connection shared across all the tables is sufficient.
Also in the existing flow, individual set() calls are made for each device and their attribute. Rather we can collect all device data in batches using RedisPipeline and then do a single bulk update, which will be faster.
We can also optimize database cleanup operations, where we do multiple individual delete operations in del method for each table key across all tables. Rather we can also batch the delete operations using RedisPipeline for faster cleanup during daemon shutdown, which will be quick in signal handling and reboot scenarios.
Why Conditional field Updates and LED status change detection :
Every 60 seconds the daemon writes full thermal and fan/fan drawer info to Redis even though much of the information rarely changes like the
model number,serial number,presence,status,replaceableandspeed_targetthreshold levels which is adding to unnecessary traffic to Redis. Rather we can write these fields only when a presence change is detected while always writing the frequently changing fields liketemp,speedandtimestamp, etc to ensure updated info stays in Redis.No additional changes is required as all info needed for detecting change in already part of FanStatus object, all of which is mainly in the
_refresh_fan_drawer_statusfunction.The same happens to led_status fields as well where, its written to Redis every 60 seconds even though there isn't any change. Also, adding an instance variable to the FanStatus will not be a burden since the value stored will be a class variable predefined here or predefined and fetched here Hence no duplicate memory allocated but rather will just be a reference to this class variable will be stored here.
Why Adding State Tracking for FanDrawer :
Currently only the FanStatus objects exist which can be utilized in
_refresh_fan_drawer_statusfunction to implement conditional write to Redis. In a similar fashion, even the Fan drawer info is written to Redis even though there is no change, like itspresence,model,serial,is_replaceableandstatus, which can all be written to Redis only on change of presence or status. This is the same for Fan drawer LEDs as well. So, we can add a FanDrawerStatus class to keep track of presence and status, which can be utilized to implement conditional write for Fan Drawer. Since we only storeTrueorFalsein its instance methods and in Python True or False are all singletons, we will not incur extra memory but rather just a reference to these will be stored. The same goes forled_statuswhich is a class variable as explained earlier. To keep track of these objects, we add afan_drawer_status_dictinstance variable to theFanUpdaterclass.Overall benefit :
Reduced Redis traffic in the system by writing only on change
Before, roughly 9 fields written to Redis every 60 seconds
After, roughly 2-3 frequently-changing fields + conditional fields only when changed
Estimated Reduction of approximately 50-60% reduction in Redis write payload volume under steady state
Reduced IO network overhead by batching requests
Reduced Redis and system system resources by reusing the same connection for RedisPipeline rather than having a separate connection to Redis for each table
How Has This Been Tested?
The changes were tested on both virtual and physical platforms. Performance benchmarking was conducted to compare the baseline and the updated implementation, capturing relevant metrics across all database operations.
Additional Information (Optional)
Note : The
PHYSICAL_ENTITY_INFO_TABLEdata in theupdate_entity_infofunction should not be conditionally written, as this table is shared across multiple daemons and programs. Ifthermalctldcaches this data locally, and another daemon stops (triggering a deletion of the entirePHYSICAL_ENTITY_INFO_TABLE), thermalctld will no longer repopulate the table unless a change is detected or the service is restarted. This change ensures consistent updates to the shared table, even when other daemons modify or clear it.