sensor-missing alarms are not cleared after a service restart #9

@jktjkt

When velia-hardware-g2 restarts while sysrepo-ietf-alarms stays alive, the system keeps reporting some alarms as active even though they are not active anymore:

coherent-a-d-DQ000V9U
VELIA_VERSION=v1-45-g6347e3a2
HW               Manufacturer  Model                   Version                    S/N
ne                             sdn-roadm-coherent-a-d                             
ne:ctrl                                                                           1E70C61C94100051CC39A000A000006E
ne:ctrl:carrier                ClearFog Base                                      
ne:ctrl:emmc                   8GME4R                                             0x490e50e1
ne:ctrl:som                    ClearFog A388 SOM                                  
ne:edfa                        DUAL FIXED GAIN         1.0.1                      506969
ne:fans                                                                           1E70C61C94100061E42EA000A00000EC
ne:pdu           3Y POWER      YH-5151E (URP1X151AH)   B01R P2J700A00 A02         SA020T301647000254
ne:psu1          3Y POWER      YM-2151E (URP1X151AM)   B01R P2J700A00 A01         SA010T291647000507
ne:psu2          3Y POWER      YM-2151E (URP1X151AM)   B01R        P2J700A04 A17  SA170T292223002344 
ne:tap           Oplink        ITMA0805ECMD111         1.0 1.0                    C1927956
  Resource                   Severity  Detail                                                            Last raised                          Status
✕  ne:pdu                     critical  I2C read failure for PDU. Could not get hardware sensor details.  2024-10-25T18:30:52.713202261+00:00  cleared
✕  ne:psu1                    critical  PSU is unplugged.                                                 2024-10-25T18:30:53.148885564+00:00  cleared
✕  ne:psu2                    critical  PSU is unplugged.                                                 2024-10-25T18:30:53.526351099+00:00  cleared
⏶  ne:psu2:voltage-5Vsb       critical  Sensor value is within normal parameters.                         2024-10-25T18:40:07.574891359+00:00  cleared
⏸  velia-hardware-g2.service  critical  systemd unit state: (activating, auto-restart-queued)             2024-10-25T18:31:11.313098796+00:00  cleared
⏷  ne:psu1:fan:fan1:rpm       critical  Sensor value is within normal parameters.                         2024-10-25T18:33:38.795021892+00:00  cleared
⏷  ne:psu1:voltage-12V        critical  Sensor value is within normal parameters.                         2024-10-25T18:33:39.252338422+00:00  cleared
⏷  ne:psu1:voltage-5Vsb       critical  Sensor value is within normal parameters.                         2024-10-25T18:06:10.665378921+00:00  cleared
⏷  ne:psu1:voltage-in         critical  Sensor value is within normal parameters.                         2024-10-25T18:33:40.079460758+00:00  cleared
⏶  ne:psu1:voltage-12V        critical  Sensor value crossed high threshold (54016000 > 12700).           2024-10-25T18:33:42.643664519+00:00  active
⏶  ne:psu1:voltage-5Vsb       critical  Sensor value crossed high threshold (53335000 > 5400).            2024-10-25T18:33:39.683553815+00:00  active
⏷  ne:psu2:voltage-12V        critical  Sensor value is within normal parameters.                         2024-10-25T18:30:18.603725500+00:00  cleared
⏶  ne:psu2:voltage-12V        critical  Sensor value crossed high threshold (6088000 > 12700).            2024-10-25T18:33:10.164255573+00:00  active
✕  ne:psu1:current-12V        warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:58.547664607+00:00  active
✕  ne:psu1:current-5Vsb       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:58.920542970+00:00  active
✕  ne:psu1:current-in         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:59.492480886+00:00  active
✕  ne:psu1:fan:fan1:rpm       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:59.866928297+00:00  active
✕  ne:psu1:power-in           warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:00.238519437+00:00  active
✕  ne:psu1:power-out          warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:00.604552159+00:00  active
✕  ne:psu1:temperature-1      warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:01.000750119+00:00  active
✕  ne:psu1:temperature-2      warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:01.406271147+00:00  active
✕  ne:psu1:voltage-12V        warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:01.785394821+00:00  active
✕  ne:psu1:voltage-5Vsb       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:02.729246139+00:00  active
✕  ne:psu1:voltage-in         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:03.512106235+00:00  active
✕  ne:pdu:current-12V         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:53.891240933+00:00  active
✕  ne:pdu:current-3V3         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:54.373954658+00:00  active
✕  ne:pdu:current-5V          warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:54.730043804+00:00  active
✕  ne:pdu:power-12V           warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:55.092140125+00:00  active
✕  ne:pdu:power-3V3           warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:55.454127617+00:00  active
✕  ne:pdu:power-5V            warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:55.817969930+00:00  active
✕  ne:pdu:temperature-1       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:56.278002208+00:00  active
✕  ne:pdu:temperature-2       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:56.648869094+00:00  active
✕  ne:pdu:temperature-3       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:57.022780837+00:00  active
✕  ne:pdu:voltage-12V         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:57.420004561+00:00  active
✕  ne:pdu:voltage-3V3         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:57.813534372+00:00  active
✕  ne:pdu:voltage-5V          warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:30:58.178848935+00:00  active
✕  ne:psu2:current-12V        warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:03.903831816+00:00  active
✕  ne:psu2:current-5Vsb       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:04.421695854+00:00  active
✕  ne:psu2:current-in         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:04.794927550+00:00  active
✕  ne:psu2:fan:fan1:rpm       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:05.193233782+00:00  active
✕  ne:psu2:power-in           warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:05.629463755+00:00  active
✕  ne:psu2:power-out          warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:06.008191930+00:00  active
✕  ne:psu2:temperature-1      warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:06.420164687+00:00  active
✕  ne:psu2:temperature-2      warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:06.836165163+00:00  active
✕  ne:psu2:voltage-12V        warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:07.242971743+00:00  active
✕  ne:psu2:voltage-5Vsb       warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:07.661656322+00:00  active
✕  ne:psu2:voltage-in         warning   Sensor value not reported. Maybe the sensor was unplugged?        2024-10-25T18:31:08.075445802+00:00  active

I think the easiest fix is to send in all "sideloaded alarms" as cleared on startup (unless, of course, they are detected as active during the first sweep of the loop). Right now I think the clearing happens when a PSU gets replugged (or something like that)?
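
A rough sketch of that idea, not velia's actual code: on startup, every alarm the daemon could ever raise defaults to cleared, and only the ones seen as active in the first sweep are published as active. The `AlarmKey` type, the `startup_alarm_states` function, and the `"sensor-missing"` type string are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmKey:
    """Hypothetical alarm identifier: the resource plus the alarm type."""
    resource: str
    alarm_type: str


def startup_alarm_states(
    sideloaded: set[AlarmKey],
    active_in_first_sweep: set[AlarmKey],
) -> dict[AlarmKey, bool]:
    """Return the state to publish for each alarm after the first sweep.

    True means "active", False means "cleared". Every sideloaded alarm
    starts out cleared, so a stale "active" entry left over from the
    previous run of the service gets overwritten.
    """
    states = {key: False for key in sideloaded}
    # Alarms actually detected during the first sweep stay (or become) active.
    for key in active_in_first_sweep:
        states[key] = True
    return states


if __name__ == "__main__":
    # Assumed example data, loosely modelled on the resources listed above.
    sideloaded = {
        AlarmKey("ne:psu1:voltage-12V", "sensor-missing"),
        AlarmKey("ne:pdu:voltage-5V", "sensor-missing"),
    }
    still_missing = {AlarmKey("ne:pdu:voltage-5V", "sensor-missing")}
    for key, active in startup_alarm_states(sideloaded, still_missing).items():
        print(key.resource, key.alarm_type, "active" if active else "cleared")
```

Publishing the cleared state explicitly, instead of just not mentioning the alarm, is what would let a long-lived alarm store drop entries raised by the previous instance of the service.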
