[SmartSwitch] Extend reboot script for rebooting SmartSwitch #3566
Merged
Changes from 54 commits
56 commits
1686dbe Extend reboot script for rebooting SmartSwitch (vvolam)
23461b2 Add more coverage (vvolam)
cef5de7 Add more unittests and optimize tests file (vvolam)
d41bf43 Fix minor indentation (vvolam)
68e70ab Move smartswitch helper functions to new reboot_smartswitch_helper.sh (vvolam)
3848b75 Fix pre-commit errors (vvolam)
84d9e50 Fix few more indentation errors (vvolam)
ba5cd5d Merge remote-tracking branch 'origin/master' into ss-reboot (vvolam)
7f75134 Merge remote-tracking branch 'origin/master' into ss-reboot (vvolam)
a849e41 Add a new API in chassis.py (vvolam)
4975ac0 Fix issues while testing (vvolam)
bead103 Fix indentation errors (vvolam)
f88491a Add DPU_BUS_INFO (vvolam)
b3dbc0f Fix pre-commit errors (vvolam)
2d8b908 Add more error handling scenarios and increase more coverage (vvolam)
1a6ef04 parse_args function is not required (vvolam)
ec21d6f Fix indentation (vvolam)
8d59222 Address review comments (vvolam)
98406c7 Increase code coverage (vvolam)
a6f771e Update scripts/reboot_smartswitch_helper (vvolam)
a3f8af7 Update scripts/reboot_smartswitch_helper (vvolam)
36ecf1b Rename module_base.py to module.py (vvolam)
67e7817 Committing missed files in previous commit (vvolam)
88af21d Define a new try_get_args() which takes arguments as inputs (vvolam)
3551a5a Merge remote-tracking branch 'origin/master' into ss-reboot (vvolam)
e4bcc95 Fix some arguments (vvolam)
edaa0de Exit the reboot script after completing DPU reboot (vvolam)
119d83b Fix long lines (vvolam)
03fd56f Merge remote-tracking branch 'origin/master' into ss-reboot (vvolam)
75f4c26 Update unit tests for update function code (vvolam)
7b365fd Merge remote-tracking branch 'public/master' into ss-reboot (vvolam)
2690348 Update scripts/reboot (vvolam)
dc504ea Address few review comments (vvolam)
e878277 Merge remote-tracking branch 'public/master' into ss-reboot (vvolam)
d1488e8 Merge remote-tracking branch 'public/master' into ss-reboot (vvolam)
1aa56e1 Add -notls option to gnoi_client command (vvolam)
9aa2c88 Small fix to read smart_switch correctly (vvolam)
2d7d141 Merge remote-tracking branch 'origin/master' into ss-reboot (vvolam)
416fa37 Call the module in upper case (vvolam)
4f0239e Merge remote-tracking branch 'origin/master' into ss-reboot (vvolam)
af899a3 Use the string to HALT instead of numeric (vvolam)
d8bf907 Suppress outputs (vvolam)
61dcbb3 Few enhancements to error path (vvolam)
f5c38e5 insecure communication is not required to GNMI server (vvolam)
9f1b9e1 Enhance logging for better troubleshooting (vvolam)
91e2e04 Small fix to not split the line (vvolam)
02ccc84 Default port of GNMI server is 8080, in the absence of configuration (vvolam)
635cdc4 Merge remote-tracking branch 'public/master' into ss-reboot (vvolam)
0bec6c8 Remove unneccessary checks (vvolam)
d9c6e33 Small fix to fix gnoi_client command line (vvolam)
21d84ec Address comments (vvolam)
c7b4ab5 Merge remote-tracking branch 'public/master' into ss-reboot (vvolam)
389ae8a Minor fixes according to latest GNOI backend changes (vvolam)
73300df Wait for DPU reboot status only if reboot gnoi command is successful (vvolam)
2cdf0c9 Skip reboot if DPU is not operationally online (vvolam)
d22356e Fix oper status check (vvolam)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,287 @@ | ||
| #!/bin/bash | ||
|
|
||
| declare -r GNMI_PORT=8080 # Default GNMI port | ||
| declare -r MODULE_REBOOT_DPU="DPU" | ||
| declare -r MODULE_REBOOT_SMARTSWITCH="SMARTSWITCH" | ||
|
|
||
| # Function to print debug message | ||
| function log_message() { | ||
| local message=$1 | ||
| echo "$(date '+%Y-%m-%d %H:%M:%S') - $message" >&2 | ||
| } | ||
|
|
||
| # Function to check if running on smart switch | ||
| function is_smartswitch() | ||
| { | ||
| python3 -c "from utilities_common.chassis import is_smartswitch; print(is_smartswitch())" | grep -q "True" | ||
| } | ||
|
|
||
| # Function to check if running on DPU | ||
| function is_dpu() | ||
| { | ||
| python3 -c "from utilities_common.chassis import is_dpu; print(is_dpu())" | grep -q "True" | ||
| } | ||
|
|
||
| # Function to retrieve number of DPUs | ||
| function get_num_dpus() | ||
| { | ||
| python3 -c "from utilities_common.chassis import get_num_dpus; print(get_num_dpus())" | ||
| } | ||
|
|
||
| # Function to retrieve DPU IP from CONFIG_DB | ||
| function get_dpu_ip() | ||
| { | ||
| local DPU_NAME=$1 | ||
| sonic-db-cli CONFIG_DB HGET "DHCP_SERVER_IPV4_PORT|bridge-midplane|${DPU_NAME}" "ips@" | ||
| } | ||
|
|
||
| # Function to retrieve GNMI port from CONFIG_DB | ||
| function get_gnmi_port() | ||
| { | ||
| local DPU_NAME=$1 | ||
| sonic-db-cli CONFIG_DB HGET "DPU_PORT|$DPU_NAME" "gnmi_port" | ||
| } | ||
|
|
||
| # Function to get reboot status from DPU | ||
| function get_reboot_status() | ||
| { | ||
| local dpu_ip=$1 | ||
| local port=$2 | ||
| local reboot_status | ||
| # Capture the RPC response so the "active" field can be parsed below | ||
| reboot_status=$(docker exec gnmi gnoi_client -target "${dpu_ip}:${port}" -logtostderr -notls -module System -rpc RebootStatus 2>/dev/null) | ||
| if [ $? -ne 0 ]; then | ||
| return ${EXIT_ERROR} | ||
| fi | ||
| local is_reboot_active | ||
| is_reboot_active=$(echo "$reboot_status" | grep "active" | awk '{print $2}') | ||
| if [ "$is_reboot_active" == "false" ]; then | ||
| return ${EXIT_SUCCESS} | ||
| fi | ||
| return ${EXIT_ERROR} | ||
| } | ||
|
|
||
| # Function to detach PCI module | ||
| function pci_detach_module() | ||
| { | ||
| local DPU_NAME=$1 | ||
| local DPU_BUS_INFO=$2 | ||
|
||
| python3 -c "from utilities_common.module import ModuleHelper; helper = ModuleHelper(); helper.pci_detach_module('${DPU_NAME}')" | ||
| if [ $? -ne 0 ]; then | ||
| log_message "ERROR: PCI detach vendor API is not available" | ||
| echo 1 > /sys/bus/pci/devices/${DPU_BUS_INFO}/remove | ||
| fi | ||
| } | ||
|
|
||
| # Function to rescan PCI module | ||
| function pci_reattach_module() | ||
| { | ||
| local DPU_NAME=$1 | ||
| local DPU_BUS_INFO=$2 | ||
| python3 -c "from utilities_common.module import ModuleHelper; helper = ModuleHelper(); helper.pci_reattach_module('${DPU_NAME}')" | ||
| if [ $? -ne 0 ]; then | ||
| log_message "ERROR: PCI reattach vendor API is not available" | ||
| # The device node no longer exists after removal, so trigger a bus-wide rescan | ||
| echo 1 > /sys/bus/pci/rescan | ||
| fi | ||
| } | ||
|
|
||
| # Function to reboot DPU | ||
| function reboot_dpu_platform() | ||
| { | ||
| local DPU_NAME=$1 | ||
| local REBOOT_TYPE=$2 | ||
| log_message "INFO: Rebooting ${DPU_NAME} with reboot_type:${REBOOT_TYPE}..." | ||
| # Keep the platform call last so the function returns its exit status | ||
| python3 -c "from utilities_common.module import ModuleHelper; helper = ModuleHelper(); helper.reboot_module('${DPU_NAME}', '${REBOOT_TYPE}')" | ||
| } | ||
|
|
||
| # Function to wait for DPU reboot status | ||
| function wait_for_dpu_reboot_status() | ||
| { | ||
| local dpu_ip=$1 | ||
| local port=$2 | ||
| local DPU_NAME=$3 | ||
|
|
||
| if [[ -z "$PLATFORM_JSON_PATH" ]]; then | ||
| log_message "ERROR: PLATFORM_JSON_PATH is not defined" | ||
| exit $EXIT_ERROR | ||
| fi | ||
|
|
||
| local dpu_halt_services_timeout=$(jq -r '.dpu_halt_services_timeout' "$PLATFORM_JSON_PATH" 2>/dev/null) | ||
| if [ -z "$dpu_halt_services_timeout" ] || [ "$dpu_halt_services_timeout" == "null" ]; then | ||
| # Default timeout | ||
| dpu_halt_services_timeout=60 | ||
| fi | ||
|
|
||
| local poll_interval=5 | ||
| local waited_time=0 | ||
| while true; do | ||
| local reboot_status | ||
| get_reboot_status "${dpu_ip}" "${port}" | ||
| reboot_status=$? | ||
| if [ $reboot_status -eq ${EXIT_SUCCESS} ]; then | ||
| log_message "INFO: ${DPU_NAME} halted the services successfully" | ||
| break | ||
| fi | ||
|
|
||
| sleep "$poll_interval" | ||
| waited_time=$((waited_time + poll_interval)) | ||
| if [ $waited_time -ge $dpu_halt_services_timeout ]; then | ||
| log_message "ERROR: Timeout waiting for ${DPU_NAME} to finish halting the services" | ||
| return | ||
| fi | ||
| done | ||
| return | ||
| } | ||
|
|
||
| # Function to send reboot command to DPU | ||
| function gnmi_reboot_dpu() | ||
| { | ||
| local DPU_NAME=$1 | ||
| local dpu_ip port | ||
| # Retrieve DPU IP and GNMI port | ||
| dpu_ip=$(get_dpu_ip "${DPU_NAME}") | ||
| port=$(get_gnmi_port "${DPU_NAME}") | ||
| if [ -z "$port" ]; then | ||
| port=$GNMI_PORT # Default GNMI port | ||
| fi | ||
| log_message "INFO: Rebooting ${DPU_NAME}, ip:$dpu_ip gnmi_port:$port" | ||
|
|
||
| if [ -z "$dpu_ip" ]; then | ||
| log_message "ERROR: Failed to retrieve DPU IP for ${DPU_NAME}" | ||
| return ${EXIT_ERROR} | ||
| fi | ||
|
|
||
| docker exec gnmi gnoi_client -target "${dpu_ip}:${port}" -logtostderr -notls -module System -rpc Reboot -jsonin '{"method":3, "message":"User initiated reboot"}' &>/dev/null | ||
| if [ $? -ne 0 ]; then | ||
| log_message "ERROR: Failed to send gnoi command to halt services on ${DPU_NAME}" | ||
| log_message "ERROR: proceeding without halting the services" | ||
| else | ||
| # Wait for DPU to halt services, if reboot command is successful | ||
| wait_for_dpu_reboot_status "${dpu_ip}" "${port}" "${DPU_NAME}" | ||
| fi | ||
| } | ||
|
|
||
| function reboot_dpu() | ||
| { | ||
| local DPU_NAME=$1 | ||
| local REBOOT_TYPE=$2 | ||
| local DPU_INDEX=${DPU_NAME//[!0-9]/} | ||
|
|
||
| debug "User requested rebooting device ${DPU_NAME} ..." | ||
|
|
||
| # Send reboot command to DPU | ||
| gnmi_reboot_dpu "${DPU_NAME}" | ||
| if [ $? -ne 0 ]; then | ||
| log_message "ERROR: Failed to send gnoi command to reboot ${DPU_NAME}" | ||
| fi | ||
|
|
||
| local DPU_BUS_INFO=$(jq -r --arg DPU_NAME "$DPU_NAME" '.DPUS[$DPU_NAME].bus_info' "$PLATFORM_JSON_PATH") | ||
| if [ -z "$DPU_BUS_INFO" ] || [ "$DPU_BUS_INFO" = "null" ]; then | ||
| log_message "ERROR: Failed to retrieve bus info for ${DPU_NAME}" | ||
| return ${EXIT_ERROR} | ||
| fi | ||
|
|
||
| pci_detach_module ${DPU_NAME} ${DPU_BUS_INFO} | ||
| if [ $? -ne 0 ]; then | ||
| log_message "ERROR: Failed to detach PCI module for ${DPU_NAME}" | ||
| return ${EXIT_ERROR} | ||
| fi | ||
|
|
||
| reboot_dpu_platform ${DPU_NAME} ${REBOOT_TYPE} | ||
| if [ $? -ne 0 ]; then | ||
| log_message "ERROR: Failed to send platform command to reboot ${DPU_NAME}" | ||
| return ${EXIT_ERROR} | ||
| fi | ||
|
|
||
| if [[ "$REBOOT_TYPE" != $MODULE_REBOOT_SMARTSWITCH ]]; then | ||
| pci_reattach_module ${DPU_NAME} ${DPU_BUS_INFO} | ||
| fi | ||
| } | ||
|
|
||
| # Function to reboot all DPUs in parallel | ||
| function reboot_all_dpus() { | ||
| local NUM_DPU=$1 | ||
|
|
||
| if [[ -z $NUM_DPU ]]; then | ||
| log_message "ERROR: Failed to retrieve number of DPUs or no DPUs found" | ||
| return | ||
| fi | ||
|
|
||
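| local failures=0 | ||
| local pids=() | ||
| local pid | ||
| for (( i=0; i<NUM_DPU; i++ )); do | ||
| reboot_dpu "dpu$i" "$MODULE_REBOOT_SMARTSWITCH" & | ||
| pids+=("$!") | ||
| done | ||
| # $? right after "&" only reports the launch, so wait on each PID for its real status | ||
| for pid in "${pids[@]}"; do | ||
| wait "$pid" || failures=$((failures + 1)) | ||
| done | ||
| return $failures | ||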
| } | ||
|
|
||
| # Function to verify DPU module name | ||
| function verify_dpu_module_name() { | ||
| local DPU_MODULE_NAME=$1 | ||
| local NUM_DPU=$2 | ||
|
|
||
| if [[ -z "$DPU_MODULE_NAME" ]]; then | ||
| log_message "ERROR: DPU module name not provided" | ||
| return $EXIT_ERROR | ||
| fi | ||
|
|
||
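| # Accept dpu<N> only when 0 <= N < NUM_DPU; a bracket range such as [0-11] breaks with more than 10 DPUs | ||
| if [[ ! "$DPU_MODULE_NAME" =~ ^dpu([0-9]+)$ ]] || (( 10#${BASH_REMATCH[1]} >= NUM_DPU )); then | ||
| log_message "ERROR: Invalid DPU module name provided" | ||
| return $EXIT_ERROR | ||
| fi | ||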
| } | ||
|
|
||
| # Function to handle scenarios on smart switch | ||
| function handle_smart_switch() { | ||
| local REBOOT_DPU=$1 | ||
| local PRE_SHUTDOWN=$2 | ||
| local DPU_NAME=$3 | ||
|
|
||
| NUM_DPU=$(get_num_dpus) | ||
|
|
||
| if is_dpu; then | ||
| if [[ "$PRE_SHUTDOWN" != "yes" ]]; then | ||
| log_message "ERROR: '-p' option not specified for a DPU" | ||
| return $EXIT_ERROR | ||
| elif [[ "$REBOOT_DPU" == "yes" ]]; then | ||
| log_message "ERROR: '-d' option specified for a DPU" | ||
| return $EXIT_ERROR | ||
| fi | ||
| return $EXIT_SUCCESS | ||
| fi | ||
|
|
||
| if [[ "$PRE_SHUTDOWN" == "yes" ]]; then | ||
| log_message "ERROR: '-p' option specified for a non-DPU" | ||
| return $EXIT_ERROR | ||
| fi | ||
|
|
||
| if [[ "$REBOOT_DPU" == "yes" ]]; then | ||
| if is_smartswitch; then | ||
| if [[ -z $NUM_DPU ]]; then | ||
| log_message "ERROR: Failed to retrieve number of DPUs or no DPUs found" | ||
| return $EXIT_ERROR | ||
| fi | ||
|
|
||
| DPU_MODULE_NAME="${DPU_NAME,,}" | ||
| verify_dpu_module_name "$DPU_MODULE_NAME" "$NUM_DPU" | ||
| result=$? | ||
| if [[ $result -ne $EXIT_SUCCESS ]]; then | ||
| return $result | ||
| fi | ||
|
|
||
| reboot_dpu "$DPU_MODULE_NAME" "$MODULE_REBOOT_DPU" | ||
| result=$? | ||
| return $result | ||
| else | ||
| log_message "ERROR: '-d' option specified for a non-smart-switch" | ||
| return $EXIT_ERROR | ||
| fi | ||
| fi | ||
|
|
||
| # If the system is a smart switch, reboot all DPUs in parallel | ||
| if is_smartswitch; then | ||
| reboot_all_dpus "$NUM_DPU" | ||
| result=$? | ||
| return $result | ||
| fi | ||
| } | ||
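
A note on counting failures when DPUs are rebooted in parallel: in bash, `$?` read immediately after a command launched with `&` reflects only that the job was started, not how it finished. One way for a function like `reboot_all_dpus` to report real failures is to record each PID and `wait` on it individually. The sketch below is illustrative only; `fake_reboot` and `run_all` are hypothetical names standing in for `reboot_dpu` and its caller, and are not part of this PR.

```shell
#!/bin/bash

# Hypothetical stand-in for reboot_dpu: fails for dpu1, succeeds otherwise.
fake_reboot() {
    [ "$1" != "dpu1" ]
}

# Launch one background job per DPU, then wait on each PID
# so that every job's own exit status is observed.
run_all() {
    local num=$1 failures=0 pids=() pid i
    for (( i = 0; i < num; i++ )); do
        fake_reboot "dpu$i" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        # "wait <pid>" returns the exit status of that specific job
        wait "$pid" || failures=$((failures + 1))
    done
    return "$failures"
}

# Usage: report how many of three simulated reboots failed.
run_all 3 || echo "failed jobs: $?"
```

A bare `wait` with no arguments waits for all jobs but always returns 0, which is why the per-PID form is used here.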
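
A note on the polling loop: `wait_for_dpu_reboot_status` uses a bare `return` both when the DPU reports that services have halted and when the timeout expires, so a caller cannot tell the two outcomes apart. A variant that returns a distinct status on timeout might look like the sketch below. It is a minimal illustration under stated assumptions: `probe` and `poll_until` are hypothetical names, and the probe here is a counter rather than a real gNOI `RebootStatus` query.

```shell
#!/bin/bash

# Hypothetical probe: succeeds on its third call.
# A real probe might instead check the DPU's reboot status.
attempts=0
probe() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]
}

# Poll "probe" every $2 seconds until it succeeds or $1 seconds have elapsed.
# Returns 0 on success and 1 on timeout.
poll_until() {
    local timeout=$1 interval=$2 waited=0
    while ! probe; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Usage: poll once per second for up to 10 seconds.
if poll_until 10 1; then
    echo "probe succeeded after $attempts attempts"
else
    echo "timed out after $attempts attempts"
fi
```

Returning a non-zero status on timeout lets the caller decide whether to proceed with the PCI detach or abort, instead of treating both outcomes identically.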