Skip to content

Conversation

@GauravNagesh
Copy link
Contributor

Description

This PR addresses two related fixes in the PSUd module:

  1. Exception Handling for Redis DB Calls - Wrapped Redis interactions in try-except blocks to handle potential exceptions, such as connection issues or missing keys. This improves the robustness of the application and prevents unexpected crashes due to Redis unavailability.

  2. Fix Errors in DaemonPsud Object Destruction During Tests - Resolved test errors that occurred when objects were garbage collected and their destructors attempted to delete missing Redis keys. The del method is now mocked, ensuring clean test teardown without unhandled exceptions.

Motivation and Context

Previously, if there was a temporary failure during the initial Redis connection to STATE_DB, table creation, or any Redis write/set operation, the daemon would crash instead of handling the error gracefully.
With this update:

  • If initialization fails, the daemon will now exit gracefully with appropriate logging.
  • If a Redis set or write operation fails during runtime, the error will be logged, and the operation will be retried in the next daemon cycle.

Additionally, during testing, when the DaemonPsud object is garbage collected, its destructor method __del__ is invoked. At that point, it attempts to delete keys from the mock Redis tables (psu_tbl and chassis_tbl). However, since these keys may not exist in the mocked environment, it results in a KeyError, as shown below:

  Traceback (most recent call last):
    File "/__w/1/s/sonic-psud/scripts/psud", line 419, in __del__
      self.psu_tbl._del(get_psu_key(psu_index))
    File "/__w/1/s/sonic-psud/tests/mocked_libs/swsscommon/swsscommon.py", line 16, in _del
      del self.mock_dict[key]
          ~~~~~~~~~~~~~~^^^^^
  KeyError: 'PSU 1'
  Traceback (most recent call last):
    File "/__w/1/s/sonic-psud/scripts/psud", line 422, in __del__
      self.chassis_tbl._del(CHASSIS_INFO_POWER_KEY_TEMPLATE.format(1))
    File "/__w/1/s/sonic-psud/tests/mocked_libs/swsscommon/swsscommon.py", line 16, in _del
      del self.mock_dict[key]
          ~~~~~~~~~~~~~~^^^^^
  KeyError: 'chassis_power_budget 1'

Both of these errors are currently visible in the pipeline.

How Has This Been Tested?

Tested locally to confirm that initialization failures are handled gracefully and that destructor-related key deletion no longer raises errors.

Additional Information (Optional)

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@GauravNagesh
Copy link
Contributor Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@vvolam vvolam requested a review from Copilot November 26, 2025 16:21
Copilot finished reviewing on behalf of vvolam November 26, 2025 16:22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances the robustness of the PSUd daemon by adding comprehensive exception handling for Redis database operations and fixing destructor-related test failures. The changes ensure that Redis connection issues or missing keys don't cause unexpected crashes during normal operation or test garbage collection.

  • Added try-except blocks around all Redis database operations (table.set, table._del) with appropriate error logging
  • Wrapped database initialization in exception handling with graceful exit on failure
  • Mocked __del__ method in tests to prevent KeyError exceptions during garbage collection
  • Added comprehensive test coverage for all new exception handling paths

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
sonic-psud/scripts/psud Added exception handling for Redis operations in initialization, destructor, and all table update methods
sonic-psud/tests/test_psud.py Mocked __del__ method to prevent garbage collection issues
sonic-psud/tests/test_DaemonPsud.py Added tests for exception handling in initialization, destructor, and update methods; mocked __del__
sonic-psud/tests/test_PsuChassisInfo.py Added test for exception handling in power budget update

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

self.phy_entity_tbl = swsscommon.Table(state_db, PHYSICAL_ENTITY_INFO_TABLE)
except Exception as e:
self.log_error(f"Failed to connect to STATE_DB or Redis Tables : {type(e).__name__} : {e}", True)
sys.exit(exit_code+2)
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expression exit_code+2 is incorrect. Since exit_code is a global variable initialized to 0 and only modified in the signal handler, this will always evaluate to 2. Either use the constant 2 directly, or define a named constant like DB_CONNECTION_ERROR = 2 at the module level (similar to PSUUTIL_LOAD_ERROR = 1) for clarity and consistency.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

with pytest.raises(SystemExit):
daemon_psud = psud.DaemonPsud(SYSLOG_IDENTIFIER)

mock_sys_exit.assert_called_once_with(2)
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test expects sys.exit(2) to be called, but the implementation calls sys.exit(exit_code+2). Since exit_code is 0 by default, this happens to work, but if the test setup changes the global exit_code, this assertion will fail. The test should either verify the actual implementation behavior or the implementation should be fixed to use a constant.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# Test exception handling when writing PSU fan LED status to Redis
mock_fan.get_status_led = mock.Mock(return_value=MockFan.STATUS_LED_COLOR_OFF)
# mock_fan.get_name = mock.Mock(return_value="PSU 1 Test Fan 1")
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the commented-out code. If this line is not needed, it should be deleted rather than left as a comment.

Suggested change
# mock_fan.get_name = mock.Mock(return_value="PSU 1 Test Fan 1")

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# Test exception when deleting PSU tables
daemon_psud.psu_tbl._del.side_effect = Exception("Redis delete error")
daemon_psud.__del__()
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The del special method is called explicitly.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

@GauravNagesh GauravNagesh Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing it as del daemon_psud will not that guarantee that the python's garbage collector will invoke the object's __del__ method during the execution of this testcase itself. Therefore, calling __del__() explicitly ensures deterministic execution of the cleanup logic for testing purposes.


# Test exception when deleting chassis tables
daemon_psud.chassis_tbl._del.side_effect = Exception("Redis delete error")
daemon_psud.__del__()
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The del special method is called explicitly.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing it as del daemon_psud will not that guarantee that the python's garbage collector will invoke the object's __del__ method during the execution of this testcase itself. Therefore, calling __del__() explicitly ensures deterministic execution of the cleanup logic for testing purposes.

mock_table.return_value = mock_table_instance

# Trigger the chassis_tbl.set call during __init__ by creating a new instance
daemon_psud = psud.DaemonPsud(SYSLOG_IDENTIFIER)
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable daemon_psud is not used.

Suggested change
daemon_psud = psud.DaemonPsud(SYSLOG_IDENTIFIER)
psud.DaemonPsud(SYSLOG_IDENTIFIER)

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


# Trigger the exception during __init__ by creating a new instance
with pytest.raises(SystemExit):
daemon_psud = psud.DaemonPsud(SYSLOG_IDENTIFIER)
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment to 'daemon_psud' is unnecessary as it is redefined before this value is used.

Suggested change
daemon_psud = psud.DaemonPsud(SYSLOG_IDENTIFIER)
psud.DaemonPsud(SYSLOG_IDENTIFIER)

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


def test_run_power_budget_db_exception(self):
chassis = MockChassis()
psu1 = MockPsu("PSU 1", 0, True, True)
Copy link

Copilot AI Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call to MockPsu.init with too many arguments; should be no more than 0.
Call to MockPsu.init with too many arguments; should be no more than 0.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MockPsu constructor does allow to pass these arguments and similar instantiations are used elsewhere in this file and other test files without any issues.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants