What if a system fails with a kernel panic, dies, or partially dies? #64

@rjsuresh

Since ByNar runs as a binary (agent) on the system it monitors, what happens in the following scenarios?

  • Kernel panic
  • The system rebooted and did not come back up
  • Someone stopped the agent and never restarted it
  • The system partially died due to a hardware fault (memory, CPU, RAID, ...)

When the system goes down, the agent goes down with it, because the agent runs on the very system that must be healthy for monitoring to work.

Possible solutions:

  • Client/server architecture?
  • Peer-to-peer monitoring (e.g. Ceph OSDs)?

Possible issues with these solutions:

  • Client/server architecture adds administrative overhead: failover, firewalls, DR, certificates, load balancing, redundancy, ...
  • Peer-to-peer: either message broadcasting or a narrowed-down approach. For example, should a failed system be monitored only by its neighbors, i.e. the systems immediately before and after it in the sequence?
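
To make the narrowed-down peer-to-peer idea concrete, here is a minimal sketch in Python. It is not ByNar code: the function names, the sorted-ring ordering, and the timeout value are all assumptions made for illustration. Each host watches only the hosts immediately before and after it in a sorted ring, and flags a neighbor as suspect when no heartbeat has arrived within a timeout.

```python
from typing import Dict, Iterable, List, Tuple


def ring_neighbors(hosts: List[str], me: str) -> Tuple[str, str]:
    """Return the (previous, next) hosts around `me` in a sorted ring.

    Hypothetical helper: sorting by hostname is just one way to give
    every host the same view of "the sequence".
    """
    ring = sorted(set(hosts))
    if me not in ring:
        raise ValueError(f"{me!r} is not in the host list")
    if len(ring) < 2:
        raise ValueError("need at least two hosts to form a ring")
    i = ring.index(me)
    # Index -1 wraps to the last host; the modulo wraps to the first.
    return ring[i - 1], ring[(i + 1) % len(ring)]


def suspected_down(
    last_seen: Dict[str, float],
    watched: Iterable[str],
    now: float,
    timeout: float,
) -> List[str]:
    """Return watched hosts whose last heartbeat is older than `timeout`.

    A host that has never sent a heartbeat is also treated as suspect.
    """
    return [
        h for h in watched
        if now - last_seen.get(h, float("-inf")) > timeout
    ]


if __name__ == "__main__":
    hosts = ["node-c", "node-a", "node-b", "node-d"]
    watched = ring_neighbors(hosts, "node-a")
    # Heartbeat timestamps in seconds (made-up values).
    last_seen = {"node-b": 100.0, "node-d": 70.0}
    print(watched, suspected_down(last_seen, watched, now=105.0, timeout=30.0))
```

With this layout every host is watched by two peers, so a kernel panic or a stopped agent on one machine is still noticed by its neighbors without broadcasting to the whole cluster. What the neighbors then do about a suspect host (alerting, fencing, filing a ticket) is left open here.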

Just throwing my thoughts out here so they don't get lost. :)
