Heartbeat
The heartbeat is part of the state machine.
The state machine collects status changes and propagates state-change events in the local network and in the cloud.
To view the last global heartbeat, send the command:
{"level":"instances","command":{"action":"heartbeat","group":"all","type":"all"}}
or
bin/clmgr instances heartbeat
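As an illustration, the JSON command above could be assembled programmatically. This is a minimal sketch: the helper name build_command is hypothetical, and how the message is delivered to the cluster is not covered here.

```python
import json


def build_command(level, action, group="all", type_="all"):
    """Serialize a command message with the same fields as the example above."""
    message = {
        "level": level,
        "command": {"action": action, "group": group, "type": type_},
    }
    # Compact separators reproduce the whitespace-free form shown above.
    return json.dumps(message, separators=(",", ":"))


heartbeat_command = build_command("instances", "heartbeat")
print(heartbeat_command)
```

Changing the action argument to "actions" would give the command used to list the queued actions.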
A list of actions is placed in a queue and executed one step at a time.
Starting a database service requires:
- A launched instance at the given IP
- A started instance at the given IP
- An ssh service answering on the instance
- A bootstrap of the binaries on the instance
- A running ScrambleDB on the instance
- An install of the database files
- Replication set up and running
- Reconfiguration of the proxies
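The step-by-step execution described above can be sketched as a simple queue. This is only an illustration under assumptions: the step identifiers and the run_queue helper are hypothetical names, and the real steps are carried out by the cluster's own workers.

```python
from collections import deque

# Step names mirror the list above; the identifiers themselves are hypothetical.
STEPS = [
    "launch_instance",
    "start_instance",
    "wait_ssh",
    "bootstrap_binaries",
    "run_scrambledb",
    "install_database_files",
    "setup_replication",
    "reconfigure_proxies",
]


def run_queue(steps, execute):
    """Run steps in order and stop at the first one that fails.

    `execute` is a caller-supplied function returning True on success.
    Returns two lists: the completed steps and the steps still pending.
    """
    queue = deque(steps)
    completed = []
    while queue:
        if not execute(queue[0]):
            break  # the failed step and everything after it stay queued
        completed.append(queue.popleft())
    return completed, list(queue)


# Example: pretend ssh is not answering yet.
done, pending = run_queue(STEPS, lambda step: step != "wait_ssh")
print(done, pending)
```

Keeping the failed step at the head of the queue means a later run can resume from the same point once the prerequisite is met.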
Within the local network, status is computed from the information in the configuration file:
For instances
- ssh ping
- gearman ping
For services
- connect to the direct IP and run a test
- connect to the VIP and run a test
The instance is not reported as running until all tests pass.
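The rule that every test must pass before an instance is reported as running can be sketched as follows. The check names and the "not running" label are assumptions chosen for the example; the actual checks are the pings and connection tests listed above.

```python
def instance_status(checks):
    """Return "running" only when every check passed.

    `checks` maps a check name to True (passed) or False (failed).
    The second return value lists the failed checks, sorted by name.
    """
    failed = sorted(name for name, ok in checks.items() if not ok)
    if failed:
        return "not running", failed
    return "running", []


status, failed = instance_status({
    "ssh_ping": True,
    "gearman_ping": True,
    "direct_ip": True,
    "vip": False,
})
print(status, failed)
```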
On your private office instance, the same status is extended with cloud information from the Amazon or vCloud API. This enables translating an IP to an EC2 instance name and reporting the provider's external state of an instance.
Warning: each call to the API is billed. Such API status could later be stored in a cache.
{ "command" : {
"action" : "ping",
"group" : "all",
"type" : "db"
},
"host" : { "interfaces" : [ { "eth0" : {
"IP" : "10.0.0.102",
"STATE" : "UP"
},
"lo" : {
"IP" : "127.0.0.1",
"STATE" : "UP"
}
} ],
"ram" : "0"
},
"instances_status" : { "instances" : [ { "i-830dcde4" : {
"id" : "i-830dcde4",
"ip" : null,
"state" : "stopped"
} },
{ "i-7fba6203" : {
"id" : "i-7fba6203",
"ip" : "10.0.0.209",
"state" : "stopped"
} },
{ "i-987388e6" : {
"id" : "i-987388e6",
"ip" : "10.0.0.102",
"state" : "running"
} },
{ "i-4436ed3a" : {
"id" : "i-4436ed3a",
"ip" : "10.0.0.48",
"state" : "running"
} },
{ "i-4236ed3c" : {
"id" : "i-4236ed3c",
"ip" : "10.0.0.47",
"state" : "running"
} }
],
"return" : { "cloud" : {
"driver" : "EC2",
"elastic_ip" : "107.21.41.133",
"ex_vdc" : "na",
"host" : "na",
"instance_type" : "t1.micro",
"key" : "SDS145000",
"password" : "xxx",
"public_key" : "SDS145000.pem",
"region" : "us-east",
"security_groups" : "secure-group-vpc",
"status" : "master",
"subnet" : "subnet-49326222",
"template" : "ami-cb23a0a2",
"user" : "AKIAJR7YEOZPXASJCYDQ",
"version" : "1.5",
"vpc" : "vpc-7032621b",
"zone" : "us-east-1b"
}, "command" : {
"action" : "status",
"group" : "all",
"type" : "all"
}
}
},
"level" : "services",
"services_status" : { "services" : [ { "node10" : {
"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "mariadb",
"name" : "node10",
"state" : "running",
"status" : "master",
"time" : "Thu Dec 20 16:10:13 2012"
} },
{ "node11" : {"code" : "ER0003",
"ip" : "10.0.0.47",
"mode" : "mariadb",
"name" : "node11",
"state" : "Database communication failure",
"status" : "slave",
"time" : "Thu Dec 20 16:10:13 2012"
} },
{ "node12" : {
"code" : "000000",
"ip" : "10.0.0.48",
"mode" : "mariadb",
"name" : "node12",
"state" : "running",
"status" : "slave",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "nosql1" : {
"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "memcache",
"name" : "nosql1",
"state" : "running",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "nosql2" : {
"code" : "000000",
"ip" : "10.0.0.48",
"mode" : "memcache",
"name" : "nosql2",
"state" : "running",
"status" : "slave",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "proxy1" : {"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "mysql-proxy",
"name" : "proxy1",
"state" : "running",
"status" : "na",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "proxy2" : {
"code" : "ER0003",
"ip" : "10.0.0.47",
"mode" : "mysql-proxy",
"name" : "proxy2",
"state" : "Database communication failure",
"status" : "na",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "proxy3" : {
"code" : "ER0003",
"ip" : "10.0.0.48",
"mode" : "mysql-proxy",
"name" : "proxy3",
"state" : "Database communication failure",
"status" : "na",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb1" : {
"code" : "ER0003",
"ip" : "10.0.0.102",
"mode" : "keepalived",
"name" : "lb1",
"state" : "Database communication failure",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb2" : {
"code" : "ER0003",
"ip" : "10.0.0.47",
"mode" : "keepalived",
"name" : "lb2",
"state" : "Database communication failure",
"status" : "slave",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb3" : {
"code" : "000000",
"ip" : "10.0.0.102",
"mode" : "haproxy",
"name" : "lb3",
"state" : "running",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} },
{ "lb4" : {
"code" : "000000",
"ip" : "10.0.0.47",
"mode" : "haproxy",
"name" : "lb4",
"state" : "On",
"status" : "master",
"time" : "Thu Dec 20 16:10:14 2012"
} }
] }
}
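A heartbeat like the one above can be scanned for services that need attention. The sketch below uses a trimmed copy of the example and assumes that the code "000000" means the service is healthy; that reading is inferred from the example, not from a documented code table. The function name failing_services is hypothetical.

```python
import json

# Trimmed heartbeat with the same shape as the example above.
heartbeat_text = """
{"services_status": {"services": [
  {"node10": {"code": "000000", "ip": "10.0.0.102", "state": "running"}},
  {"node11": {"code": "ER0003", "ip": "10.0.0.47",
              "state": "Database communication failure"}}
]}}
"""


def failing_services(heartbeat):
    """Return (name, ip, state) for every service whose code is not "000000"."""
    failing = []
    for entry in heartbeat["services_status"]["services"]:
        # Each list entry is a single-key object: {service_name: details}.
        for name, details in entry.items():
            if details["code"] != "000000":
                failing.append((name, details["ip"], details["state"]))
    return failing


result = failing_services(json.loads(heartbeat_text))
print(result)
```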
Delayed actions are placed into memcache when the status of an IP is not running and an action must be taken on a service running in that instance.
To view the current actions, send the command:
{"level":"instances","command":{"action":"actions","group":"all","type":"all"}}
or
bin/clmgr instances actions
{"actions":[{
"event_ip":"10.0.0.102",
"event_type":"instances",
"do_action":"bootstrap_ncc",
"do_group":"node10",
"do_level":"services",
"event_state":"running"
},{
"event_ip":"10.0.0.102",
"event_type":"instances",
"do_action":"start",
"do_group":"node10",
"do_level":"services",
"event_state":"running"
}]}
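Each queued action pairs a trigger (event_ip, event_type, event_state) with an operation (do_action, do_group, do_level). A minimal sketch of how such a list could be matched against the observed status is shown here. The due_actions helper and the shape of the statuses mapping are assumptions, and the sketch only covers local network triggers that carry an event_ip.

```python
def due_actions(actions, statuses):
    """Return the operations whose trigger condition is currently met.

    `statuses` maps (event_type, ip) to the state observed right now.
    """
    due = []
    for action in actions:
        key = (action["event_type"], action.get("event_ip"))
        if statuses.get(key) == action["event_state"]:
            due.append((action["do_level"], action["do_action"], action["do_group"]))
    return due


actions = [
    {"event_ip": "10.0.0.102", "event_type": "instances", "event_state": "running",
     "do_action": "bootstrap_ncc", "do_group": "node10", "do_level": "services"},
    {"event_ip": "10.0.0.47", "event_type": "instances", "event_state": "running",
     "do_action": "start", "do_group": "node11", "do_level": "services"},
]
statuses = {
    ("instances", "10.0.0.102"): "running",
    ("instances", "10.0.0.47"): "stopped",
}
ready = due_actions(actions, statuses)
print(ready)
```

Only the action waiting on 10.0.0.102 is returned, because the instance at 10.0.0.47 has not reached the running state yet.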
The status to monitor in order to trigger an action in the local network is defined with the parameters:
"event_ip":"10.0.0.102",
"event_type":"instances",
"event_state":"running"
Special events are placed by the cluster to request a cloud action:
"event_type":"cloud",
"do_group":"X.X.X.X",
In this case do_group stores the IP on which to run the cloud command, as in ./clmgr instances start X.X.X.X
The action to perform:
"do_action":"start",
"do_group":"node10",
"do_level":"services",
Status differences are computed on each heartbeat and sent to the cluster doctor worker scripts.
Status differences are computed on your private office and sent to the cloud doctor worker scripts.
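Computing status differences amounts to comparing two snapshots and emitting one event per changed state. The following is a sketch under assumptions: the status_events helper and the snapshot shape are hypothetical, names missing from the previous snapshot are simply skipped, and the output field names follow the message format used for the cluster doctor worker.

```python
def status_events(previous, current, kind="instances"):
    """Compare two snapshots and emit one event per changed state.

    Each snapshot maps a name to {"ip": ..., "state": ..., "code": ...}.
    """
    events = []
    for name, now in current.items():
        before = previous.get(name)
        if before is None or before["state"] == now["state"]:
            continue  # unknown before, or nothing changed
        events.append({
            "ip": now["ip"],
            "name": name,
            "type": kind,
            "previous_state": before["state"],
            "state": now["state"],
            "previous_code": before["code"],
            "code": now["code"],
        })
    return {"events": events}


previous = {
    "i-987388e6": {"ip": "10.0.0.102", "state": "pending", "code": "0"},
    "i-4436ed3a": {"ip": "10.0.0.48", "state": "running", "code": "0"},
}
current = {
    "i-987388e6": {"ip": "10.0.0.102", "state": "running", "code": "0"},
    "i-4436ed3a": {"ip": "10.0.0.48", "state": "running", "code": "0"},
}
diff = status_events(previous, current)
print(diff)
```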
Local network messages sent to the cluster doctor worker are defined like this:
{"events":[{
"ip":"10.0.0.102",
"name":"i-987388e6",
"type":"instances",
"previous_state":"pending",
"state":"running",
"previous_code":0,
"code":"0"
},{
"ip":"10.0.0.49",
"name":"i-eb2eb69a",
"type":"instances",
"state":"pending",
"previous_state":"stopped",
"previous_code":0,
"code":"0"
}]}