
Cosmo SP "disappeared" #2157

@rmustacc

Description

We had an issue where the SP on a Rev A Cosmo, BRM22250001, basically seemed to disappear after being up for some time. The system was in a rack, so there was no way to get probes on it. Here are the observations that we made:

  • Fans had spun up to a constant, elevated level that was not commensurate with the I/O workload going on. We judged this by comparison with a few other Cosmo systems in the same environment. Whether this was a fixed elevated rate or behavior of the MAX31790 itself is unknown.
  • The host OS remained up and operational during this time. There was no degradation in host performance for the network stress test that we were performing at that time.
  • The SP did not reply to any IPCC activity. I issued a basic IPCC ident command and got no response.
  • The SP was not broadcasting any beacons on the management network.
  • We went a step further and confirmed on both switch ports that the link was up and that the counters for data transmitted by the SP did not increase at all.
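
The switch-port check in the last observation can be sketched as a comparison of two counter samples taken some time apart. This is only an illustration of the reasoning, not the tooling we used: `PortSample`, `sp_is_silent`, and `all_ports_silent` are hypothetical names, and the field layout is an assumption about what a switch port report might contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSample:
    """One reading of a switch port facing the SP (hypothetical shape)."""
    link_up: bool
    frames_from_sp: int  # frames the switch has received from the SP so far


def sp_is_silent(before: PortSample, after: PortSample) -> bool:
    """True if the link stayed up but the SP sent nothing between samples."""
    link_stayed_up = before.link_up and after.link_up
    no_new_frames = after.frames_from_sp == before.frames_from_sp
    return link_stayed_up and no_new_frames


def all_ports_silent(samples) -> bool:
    """Apply the check to every (before, after) pair, one pair per port."""
    return all(sp_is_silent(before, after) for before, after in samples)
```

A link that is down would explain the missing beacons by itself, so the check only reports silence when the link was up in both samples and the counter still did not move.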

Unfortunately, because the system was in a rack environment, we don't have any additional information about what happened. This issue is here for tracking and so that we can tell whether similar failure modes show up again.
