hwmon/pmbus: sensor fluctuations and out-of-range values of power supplies #13

@jktjkt

sdn-bidi-cplus1572-PGCL250304 ~ # velia-list-alarms 
   Resource              Severity  Detail                                                    Timestamp                            Status
⏸   velia-system.service  critical  systemd unit state: (failed, failed-before-auto-restart)  2025-05-23T08:51:14.332132841+00:00  cleared
                         cleared   systemd unit state: (activating, auto-restart-queued)     2025-05-23T08:51:24.471437497+00:00  
                         critical  systemd unit state: (activating, auto-restart)            2025-05-23T08:51:14.388608352+00:00  
                         cleared   systemd unit state: (activating, auto-restart-queued)     2025-05-23T08:51:06.568808888+00:00  
                         critical  systemd unit state: (activating, auto-restart)            2025-05-23T08:51:06.516304191+00:00  
⏷   ne:psu1:voltage-in    cleared   Sensor value is within normal parameters.                 2025-05-23T13:56:27.311500830+00:00  cleared
                         cleared   Sensor value is within normal parameters.                 2025-05-23T14:02:10.625083914+00:00  
                         critical  Sensor value crossed low threshold (2000 < 90000).        2025-05-23T14:02:08.654824534+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T14:01:07.550300342+00:00  
                         critical  Sensor value crossed low threshold (2000 < 90000).        2025-05-23T14:01:05.578179118+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T14:01:01.680226415+00:00  
                         critical  Sensor value crossed low threshold (3250 < 90000).        2025-05-23T14:00:59.680230614+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T14:00:07.948797111+00:00  
                         critical  Sensor value crossed low threshold (54500 < 90000).       2025-05-23T14:00:05.958319641+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:59:08.588271401+00:00  
                         critical  Sensor value crossed low threshold (3250 < 90000).        2025-05-23T13:59:06.630017712+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:58:30.200526699+00:00  
                         critical  Sensor value crossed low threshold (10750 < 90000).       2025-05-23T13:58:28.223341590+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:58:24.343886480+00:00  
                         critical  Sensor value crossed low threshold (2000 < 90000).        2025-05-23T13:58:22.366443879+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:57:55.551575914+00:00  
                         critical  Sensor value crossed low threshold (3250 < 90000).        2025-05-23T13:57:53.557831863+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:57:05.719047437+00:00  
                         critical  Sensor value crossed low threshold (3250 < 90000).        2025-05-23T13:57:03.673390741+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:56:50.272905417+00:00  
                         critical  Sensor value crossed low threshold (10750 < 90000).       2025-05-23T13:56:46.410945779+00:00  
                         critical  Sensor value crossed low threshold (2000 < 90000).        2025-05-23T13:56:25.326097069+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:56:08.068691819+00:00  
                         critical  Sensor value crossed low threshold (2000 < 90000).        2025-05-23T13:56:06.086121811+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:56:02.225939945+00:00  
                         critical  Sensor value crossed low threshold (10750 < 90000).       2025-05-23T13:56:00.243734669+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:55:25.794328130+00:00  
                         critical  Sensor value crossed low threshold (3250 < 90000).        2025-05-23T13:55:23.819317371+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:54:45.559829430+00:00  
                         critical  Sensor value crossed low threshold (10750 < 90000).       2025-05-23T13:54:43.572751305+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:53:48.087960818+00:00  
                         critical  Sensor value crossed low threshold (10750 < 90000).       2025-05-23T13:53:46.113620358+00:00  
⏶   ne:psu1:voltage-in    cleared   Sensor value is within normal parameters.                 2025-05-23T12:56:10.453958495+00:00  cleared
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:57:07.692066774+00:00  
                         critical  Sensor value crossed high threshold (316000 > 264000).    2025-05-23T13:57:05.648598148+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:51:43.292553626+00:00  
                         critical  Sensor value crossed high threshold (287000 > 264000).    2025-05-23T13:51:41.323585970+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:36:03.336330463+00:00  
                         critical  Sensor value crossed high threshold (418000 > 264000).    2025-05-23T13:36:01.344319884+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:32:26.374672056+00:00  
                         critical  Sensor value crossed high threshold (331500 > 264000).    2025-05-23T13:32:24.404210739+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:17:27.408230685+00:00  
                         critical  Sensor value crossed high threshold (291500 > 264000).    2025-05-23T13:17:25.413994965+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:15:54.924287686+00:00  
                         critical  Sensor value crossed high threshold (312000 > 264000).    2025-05-23T13:15:52.817587156+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:14:35.836326019+00:00  
                         critical  Sensor value crossed high threshold (281500 > 264000).    2025-05-23T13:14:33.866230071+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:14:28.042457019+00:00  
                         critical  Sensor value crossed high threshold (294000 > 264000).    2025-05-23T13:14:26.063232643+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:12:42.866008225+00:00  
                         critical  Sensor value crossed high threshold (407500 > 264000).    2025-05-23T13:12:40.893074783+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T13:11:45.260004902+00:00  
                         critical  Sensor value crossed high threshold (286500 > 264000).    2025-05-23T13:11:43.276660506+00:00  
                         critical  Sensor value crossed high threshold (311000 > 264000).    2025-05-23T12:56:08.480110993+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T12:55:32.012536288+00:00  
                         critical  Sensor value crossed high threshold (296500 > 264000).    2025-05-23T12:55:30.031339891+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T12:53:16.358630216+00:00  
                         critical  Sensor value crossed high threshold (379000 > 264000).    2025-05-23T12:53:14.379680036+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T12:51:05.586671193+00:00  
                         critical  Sensor value crossed high threshold (282500 > 264000).    2025-05-23T12:51:03.612994058+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T12:43:51.255187811+00:00  
                         critical  Sensor value crossed high threshold (312000 > 264000).    2025-05-23T12:43:49.273794267+00:00  
                         cleared   Sensor value is within normal parameters.                 2025-05-23T12:40:56.780322108+00:00  
                         critical  Sensor value crossed high threshold (332500 > 264000).    2025-05-23T12:40:54.800962087+00:00  
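
To quantify how short these excursions are, the history rows can be paired up (each `critical` with the next `cleared`) and the gap between them measured. The following is only a rough sketch: the row layout is assumed from the listing above, the helper names `parse_rows` and `excursions` are hypothetical, and it expects the rows of a single alarm (low or high threshold) at a time. The sample rows are copied from the low-threshold history above.

```python
import re
from datetime import datetime

# One history row: severity, detail text, timestamp with nanoseconds.
# The layout is an assumption based on the listing in this issue.
ROW = re.compile(
    r"(?P<sev>critical|cleared)\s+"
    r"(?P<detail>Sensor value .*?\.)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.(?P<frac>\d+)"
    r"(?P<tz>[+-]\d{2}:\d{2})"
)


def parse_rows(text):
    """Return (timestamp, severity, value-or-None) tuples, oldest first."""
    events = []
    for line in text.splitlines():
        m = ROW.search(line)
        if not m:
            continue
        # datetime only keeps microseconds, so truncate the nanoseconds.
        frac = m["frac"][:6].ljust(6, "0")
        ts = datetime.fromisoformat(f"{m['ts']}.{frac}{m['tz']}")
        val = re.search(r"\((\d+) [<>]", m["detail"])
        events.append((ts, m["sev"], int(val.group(1)) if val else None))
    return sorted(events, key=lambda e: e[0])


def excursions(events):
    """Pair each critical event with the next cleared one.

    Returns a list of (reported value, duration in seconds).
    """
    out = []
    start = None
    for ts, sev, value in events:
        if sev == "critical" and start is None:
            start = (ts, value)
        elif sev == "cleared" and start is not None:
            out.append((start[1], (ts - start[0]).total_seconds()))
            start = None
    return out


SAMPLE = """\
cleared   Sensor value is within normal parameters.                 2025-05-23T14:02:10.625083914+00:00
critical  Sensor value crossed low threshold (2000 < 90000).        2025-05-23T14:02:08.654824534+00:00
cleared   Sensor value is within normal parameters.                 2025-05-23T14:01:01.680226415+00:00
critical  Sensor value crossed low threshold (3250 < 90000).        2025-05-23T14:00:59.680230614+00:00
"""

result = excursions(parse_rows(SAMPLE))
for value, seconds in result:
    print(f"value {value}: cleared after {seconds:.2f} s")
```

In the listing above most excursions clear about two seconds after they are raised, which would be consistent with single bad readings rather than a sustained input voltage problem, but that interpretation is a guess.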

This is on the most recent HW:

sdn-bidi-cplus1572-PGCL250304 ~ # velia-list-hardware 
HW                       Manufacturer  Model                                           Version                    S/N
ne                       PEI-Genesis   sdn-bidi-cplus1572-g2 (PG-CL-SDN_dualBiDi-C-L)  1                          PGCL250304
ne:ctrl                                                                                                           0910C30854100840403BA080A08000AA
ne:ctrl:carrier          SolidRun      Clearfog Base (SRCFCBE000CV14)                  1.4                        IP01195231800006
ne:ctrl:carrier:console                                                                                           DQ00G4UC
ne:ctrl:carrier:eeprom                                                                                            294100B1385A
ne:ctrl:emmc                           8GTF4R                                                                     0x35c55f91
ne:ctrl:som              SolidRun      A38x SOM (SRM6828S32D01GE008V21C0)              2.1                        IP01195231800006
ne:ctrl:som:eeprom                                                                                                80342872AC07
ne:edfa-c-band                         FIXED GAIN                                      1.1.0                      346733
ne:edfa-narrow-1572                    FIXED GAIN                                      1.1.0                      346691
ne:fans                                                                                                           0910C30854100842000DA080A08000A5
ne:pdu                   3Y POWER      YH-5151E (URP1X151AH)                           B01R P2J700A01 A20         SA200T302229001114
ne:psu1                  3Y POWER      YM-2151E (URP1X151AM)                           B01R        P2J700A04 A17  SA170T292229004149 
ne:psu2                  3Y POWER      YM-2151E (URP1X151AM)                           B01R        P2J700A04 A17  SA170T292229004150
