Heartbeat
The heartbeat is part of the state machine.
The state machine collects status changes to keep track of events in the local network and in the cloud.
The last global heartbeat can be viewed with the command:
{"level":"instances","command":{"action":"heartbeat","group":"all","type":"all"}}
or
bin/clmgr instances heartbeat
A list of actions is placed in a queue and executed step by step.
Starting a database service requires:
- A launched instance at the given IP
- A started instance at the given IP
- An SSH service answering in the instance
- A bootstrap of the binaries in the instance
- A running ScrambleDB in the instance
- An installation of the database files
- Replication set up and running
- Reconfiguration of the proxies
In a local cloud, the instance status is simulated by parsing the configuration file and running an SSH test, so steps 1 and 2 may be skipped.
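The ordered prerequisites can be pictured as a simple queue that is consumed one step at a time. The following is a minimal sketch, not the project's implementation: the step names, `build_queue`, `run_queue` and the `execute` callback are all hypothetical labels chosen to mirror the list above.

```python
from collections import deque

# Illustrative labels that paraphrase the prerequisite list; they are
# assumptions, not identifiers taken from the project.
STEPS = [
    "launch_instance",
    "start_instance",
    "check_ssh",
    "bootstrap_binaries",
    "start_scrambledb",
    "install_database_files",
    "setup_replication",
    "reconfigure_proxies",
]


def build_queue(local_cloud=False):
    """Return the ordered steps, dropping the first two for a local cloud."""
    steps = STEPS[2:] if local_cloud else STEPS
    return deque(steps)


def run_queue(queue, execute):
    """Run steps in order; stop at the first failure and return completed steps."""
    done = []
    while queue:
        step = queue.popleft()
        if not execute(step):
            queue.appendleft(step)  # keep the failed step at the head for a retry
            break
        done.append(step)
    return done
```

Leaving the failed step at the head of the queue means a later heartbeat can resume from the same point instead of restarting the whole sequence.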
The same cloud status is extended through the Amazon or vCloud API. Because API calls are billed, these API statuses could later be stored in a cache, which would allow translating an IP to its EC2 instance name and exposing the provider's external state of an instance.
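One way such a cache could look is a small time-to-live map keyed by IP. This is only a sketch of the idea: `StatusCache`, the `fetch` callback and the 60-second default are assumptions, and a real deployment might keep the entries in memcache rather than in process memory.

```python
import time


class StatusCache:
    """Cache provider lookups per IP for a fixed time-to-live (in seconds)."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable: ip -> dict with "id" and "state"
        self._ttl = ttl
        self._clock = clock
        self._entries = {}       # ip -> (expiry time, cached value)

    def get(self, ip):
        now = self._clock()
        entry = self._entries.get(ip)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no billed API call
        value = self._fetch(ip)  # expired or missing: one provider call
        self._entries[ip] = (now + self._ttl, value)
        return value
```

With this shape, repeated heartbeats within the TTL window reuse the stored answer, so the number of billed calls depends on the TTL rather than on the heartbeat frequency.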
{ "command" : {
"action" : "ping",
"group" : "all",
"type" : "db"
},
"host" : { "interfaces" : [ { "eth0" : {
"IP" : "10.0.0.102",
"STATE" : "UP"
},
"lo" : {
"IP" : "127.0.0.1",
"STATE" : "UP"
}
} ],
"ram" : "0"
},
"instances_status" : { "instances" : [ { "i-830dcde4" : {
"id" : "i-830dcde4",
"ip" : null,
"state" : "stopped"
} },
{ "i-7fba6203" : {
"id" : "i-7fba6203",
"ip" : "10.0.0.209",
"state" : "stopped"
} },
{ "i-987388e6" : {
"id" : "i-987388e6",
"ip" : "10.0.0.102",
"state" : "running"
} },
{ "i-4436ed3a" : {
"id" : "i-4436ed3a",
"ip" : "10.0.0.48",
"state" : "running"
} },
{ "i-4236ed3c" : {
"id" : "i-4236ed3c",
"ip" : "10.0.0.47",
"state" : "running"
} }
],
"return" : { "cloud" : {
"driver" : "EC2",
"elastic_ip" : "107.21.41.133",
"ex_vdc" : "na",
"host" : "na",
"instance_type" : "t1.micro",
"key" : "SDS145000",
"password" : "xxx",
"public_key" : "SDS145000.pem",
"region" : "us-east",
"security_groups" : "secure-group-vpc",
"status" : "master",
"subnet" : "subnet-49326222",
"template" : "ami-cb23a0a2",
"user" : "AKIAJR7YEOZPXASJCYDQ",
"version" : "1.5",
"vpc" : "vpc-7032621b",
"zone" : "us-east-1b"
}, "command" : {
"action" : "status",
"group" : "all",
"type" : "all"
}
}
},
"level" : "services",
"services_status" : { "services" : [ { "node10" : {
"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "mariadb",
"name" : "node10",
"state" : "running",
"status" : "master",
"time" : "Thu Dec 20 16:10:13 2012"
} },
{ "node11" : {"code" : "ER0003",
"ip" : "10.0.0.47",
"mode" : "mariadb",
"name" : "node11",
"state" : "Database communication failure",
"status" : "slave",
"time" : "Thu Dec 20 16:10:13 2012"
} },
{ "node12" : {
"code" : "000000",
"ip" : "10.0.0.48",
"mode" : "mariadb",
"name" : "node12",
"state" : "running",
"status" : "slave",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "nosql1" : {
"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "memcache",
"name" : "nosql1",
"state" : "running",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "nosql2" : {
"code" : "000000",
"ip" : "10.0.0.48",
"mode" : "memcache",
"name" : "nosql2",
"state" : "running",
"status" : "slave",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "proxy1" : {"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "mysql-proxy",
"name" : "proxy1",
"state" : "running",
"status" : "na",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "proxy2" : {
"code" : "ER0003",
"ip" : "10.0.0.47",
"mode" : "mysql-proxy",
"name" : "proxy2",
"state" : "Database communication failure",
"status" : "na",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "proxy3" : {
"code" : "ER0003",
"ip" : "10.0.0.48",
"mode" : "mysql-proxy",
"name" : "proxy3",
"state" : "Database communication failure",
"status" : "na",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb1" : {
"code" : "ER0003",
"ip" : "10.0.0.102",
"mode" : "keepalived",
"name" : "lb1",
"state" : "Database communication failure",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb2" : {
"code" : "ER0003",
"ip" : "10.0.0.47",
"mode" : "keepalived",
"name" : "lb2",
"state" : "Database communication failure",
"status" : "slave",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb3" : {
"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "haproxy",
"name" : "lb3",
"state" : "running",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb4" : {
"code" : "000000",
"ip" : "10.0.0.47",
"mode" : "haproxy",
"name" : "lb4",
"state" : "On",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} }
] }
}
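To show how the heartbeat document above might be consumed, here is a small sketch that lists the services reporting a problem. It assumes that a `code` of `"000000"` means "no error", which is inferred from the sample and not confirmed by the source; the function name `failing_services` is hypothetical.

```python
import json


def failing_services(heartbeat):
    """Return (name, ip, code, state) for every service whose code is not "000000"."""
    failures = []
    for entry in heartbeat["services_status"]["services"]:
        # Each list item is a one-key object: {service name: details}.
        for name, info in entry.items():
            if info["code"] != "000000":
                failures.append((name, info["ip"], info["code"], info["state"]))
    return failures


# A trimmed copy of the sample above, used only to exercise the function.
sample = json.loads("""
{"services_status": {"services": [
  {"node10": {"code": "000000", "ip": "10.0.0.102", "state": "running"}},
  {"node11": {"code": "ER0003", "ip": "10.0.0.47",
              "state": "Database communication failure"}}
]}}
""")

for name, ip, code, state in failing_services(sample):
    print(f"{name} ({ip}): {code} {state}")
```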
Delayed actions are placed into memcache when the status of an IP is not running and we need to take action on a service that runs in that instance:
{"actions":[{
"event_ip":"10.0.0.102",
"event_type":"instances",
"do_action":"bootstrap_ncc",
"do_group":"node10",
"do_level":"services",
"event_state":"running"
},{
"event_ip":"10.0.0.102",
"event_type":"instances",
"do_action":"start",
"do_group":"node10",
"do_level":"services",
"event_state":"running"
}]}
The status that should be monitored to trigger the action:
"event_ip":"10.0.0.102",
"event_type":"instances",
"event_state":"running"
The action to perform:
"do_action":"start",
"do_group":"node10",
"do_level":"services",
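The split between `event_*` fields (the trigger) and `do_*` fields (the action) suggests a simple matching pass. The sketch below is an illustration only: `due_actions` and `to_command` are hypothetical helpers, and the command shape returned by `to_command` is an assumption modelled on the heartbeat command format, without a `type` field because delayed actions do not carry one.

```python
def due_actions(actions, ip, event_type, state):
    """Split delayed actions into those triggered by the observed status and the rest."""
    due, pending = [], []
    for action in actions:
        triggered = (
            action["event_ip"] == ip
            and action["event_type"] == event_type
            and action["event_state"] == state
        )
        (due if triggered else pending).append(action)
    return due, pending


def to_command(action):
    """Build a command from the do_* fields (the output shape is an assumption)."""
    return {
        "level": action["do_level"],
        "command": {"action": action["do_action"], "group": action["do_group"]},
    }
```

Actions that are not yet due stay in the `pending` list, which would be written back to the store until a later status change triggers them.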
Status differences are computed on each heartbeat and sent to the doctor worker scripts.
The messages sent to the worker are defined like this:
{"events":[{
"ip":"10.0.0.102",
"name":"i-987388e6",
"type":"instances",
"previous_state":"pending",
"state":"running",
"previous_code":0,
"code":"0"
},{
"ip":"10.0.0.49",
"name":"i-eb2eb69a",
"type":"instances",
"state":"pending",
"previous_state":"stopped",
"previous_code":0,
"code":"0"
}]}
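A status difference of this kind could be computed by comparing two consecutive heartbeat snapshots. The following is a minimal sketch under stated assumptions: `diff_instances` is a hypothetical name, each snapshot is assumed to be a map from instance id to a dict with `ip` and `state`, and the `code` and `previous_code` fields are left out for brevity.

```python
def diff_instances(previous, current):
    """Return one event per instance whose state changed between two heartbeats.

    Both arguments map an instance id to a dict with "ip" and "state" keys.
    """
    events = []
    for name, now in current.items():
        before = previous.get(name)
        if before is None or before["state"] == now["state"]:
            continue  # new or unchanged instances produce no event in this sketch
        events.append({
            "ip": now["ip"],
            "name": name,
            "type": "instances",
            "previous_state": before["state"],
            "state": now["state"],
        })
    return {"events": events}
```

Only transitions are reported, so a worker receiving the message does not need to keep its own copy of the previous heartbeat.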