# Kube Deployment Documentation
## Configuration (`config_flow`)

- The `*.yml` files in the `config_flow` directory handle configuration for both the cloud and site stacks.
- `ifName` and `vlan` together, under `Hosts` and `switchData`, are the unique identifier of each flow (currently only `ifName` is the identifier).
- `communityString` under `switchData` is unique to each network element and should be kept as a secret, not shown on GitHub.
- Under the `Cloud` stack, there are `fill*.py` files that read in the configuration files and dynamically write the install and start scripts. This is hidden from the users behind `start.sh` and `install.sh`.
- Under the `Site` stack, the `fill*.py` files read in environment variables passed from the `docker-compose.yml` files, since the `Site` stack is completely containerized.
- `config.yml` is an example config file for a single-switch system; `multiconfig.yml` is an example config file for a double-switch system. (Note: a double-switch network has never been tested; we might run into problems when implementing it on a network with more than one switch.) A rough sketch of a config file follows this list.
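For orientation only, here is a hypothetical sketch of what a single-switch config file might look like. Apart from `Hosts`, `switchData`, `ifName`, `vlan`, and `communityString`, which the notes above mention, every key and value is a placeholder rather than the project's actual schema:

```bash
# Sketch only: writes a hypothetical single-switch config. Compare with the real
# config.yml under config_flow before relying on any of these key names.
cat > config_flow/config.yml <<'EOF'
Hosts:                          # mentioned above; the nesting below is assumed
  - name: host1                 # placeholder
    ifName: eth0
    vlan: 1785                  # with ifName, the (future) unique flow identifier
switchData:
  ifName: HundredGigE0/0/0/1    # placeholder interface name
  vlan: 1785
  communityString: CHANGE_ME    # secret; never commit this to GitHub
EOF
```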
## Site Stack

The `Site` stack is completely containerized, so users do not need to install anything besides the Docker image. This section explains how everything works inside each container.
- `Site` installs the SNMP exporter from GitHub. SNMP has many dependencies when generating the `snmp.yml` file, and all of the dependencies are installed inside the container.
- `./run.sh` kick-starts `./dynamic_start.sh`.
- It dynamically generates a push script for each exporter and schedules the pushes through `crontab`.
- All exporters push metrics by curling the local metrics port to the pushgateway server on the `Cloud` stack. For example, the Node exporter runs on 9100, and the script generated by the install script looks like this: `curl -s ${MYIP}:9100/metrics | curl --data-binary @- $PUSHGATEWAY_SERVER/metrics/job/node-exporter/instance/$MYIP`. The instance indicates where the data is coming from, and this URL can be customized. A sketch of such a push script follows this list.
- ARP and SNMP are more complicated and use intermediate storage files; see further down.
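As a concrete illustration, a generated push script might look like the sketch below. The way `MYIP` is resolved and the pushgateway URL are assumptions; the real script is written by the fill scripts rather than by hand:

```bash
#!/bin/bash
# Sketch of a generated Node exporter push script.
MYIP="$(hostname -I | awk '{print $1}')"             # assumed IP lookup
PUSHGATEWAY_SERVER="http://cloud.example.net:9090"   # placeholder Cloud-stack URL

# Scrape the local Node exporter (port 9100) and forward the payload to the
# pushgateway, labeled with the job name and the originating instance.
curl -s "${MYIP}:9100/metrics" \
  | curl --data-binary @- \
      "${PUSHGATEWAY_SERVER}/metrics/job/node-exporter/instance/${MYIP}"
```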
To tear the stack down:

- Docker: `docker compose down -v` or `docker rm <container_id>`
- Kube: `kubectl delete -n <name_space> deployment <exporter_name>`
## Cloud Stack

- Installs Grafana and Nginx. It allows the user to serve Grafana over HTTPS using an Nginx reverse proxy.
- The Script Exporter is pulled from GitHub.
- Pushgateway, Script Exporter, Prometheus, Nginx, and Grafana are all under the cloud stack right now. To start, just run `./start.sh` under the `cloud` directory.
- To generate a dashboard, run `./generate.sh`. A configuration file under `config_flow`, based on user input, will be used to generate the dashboard.
- Grafana needs the `3000:3000` port mapping; see https://github.com/esnet/sense-rtmon/issues/18#issue-1332438911
- `clean.sh` removes the `Cloud` stack and deletes the containers.
- **CAUTION**: every time the `Grafana` container is removed or updated, the dashboards are lost, and the login passwords, API keys, and datasource need to be reset. A more stable way to run Grafana is through systemd.
## Nginx

- Runs as a container that enables HTTPS. Certificates and DNS from the host are required.
- In the `docker-stack.yml` file, Nginx needs to match the ports of the other applications with the ports they are accessed from, e.g. access on 443 maps to 3000. If we want Pushgateway and Prometheus to be HTTPS as well, we need to open two additional ports. (I might be wrong, and there might be other workarounds; I tried using `location /`, but the CSS didn't apply to Pushgateway.) A sketch of such a server block follows this list.
- To remove HTTPS auto-redirecting in Chrome, go to `chrome://net-internals/#hsts`. Many browsers automatically reroute to HTTPS, and if some ports are still plain HTTP they become hard to access because of the redirect. On that page, use "Delete domain security policies" to remove the auto-redirect.
- Inside the container, `/etc/nginx/conf.d/` is where the configuration files are stored.
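For reference, a reverse-proxy server block in `/etc/nginx/conf.d/` might look roughly like this. The server name, certificate paths, and upstream address are placeholders, not the project's actual values:

```bash
# Sketch only: writes a hypothetical Grafana reverse-proxy config (443 -> 3000).
cat > /etc/nginx/conf.d/grafana.conf <<'EOF'
server {
    listen 443 ssl;
    server_name grafana.example.net;                     # placeholder DNS name
    ssl_certificate     /etc/nginx/certs/fullchain.pem;  # placeholder cert paths
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;  # forward HTTPS on 443 to Grafana on 3000
        proxy_set_header Host $host;
    }
}
EOF
```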
## Script Exporter

- The Script Exporter enables layer-2 debugging. Under the `examples` directory, `config.yaml` tells the Script Exporter which script to run. `args.sh` and `multiDef.sh` are used for single and double switches, respectively; anything with more than two switches is not implemented yet.
- These files are configured by `fill_config.py`; the data is from the configuration files.
- The `*.sh` files `echo` metric lines, and the Prometheus database stores the data. The dashboard looks for exactly what is sent, so every change made here needs to be made in the Layer 2 dashboard templates as well.
- E.g. `echo "host1_arp_on{host=\"${host1}\"} 1"`: `host1_arp_on` is stored in Prometheus, where 1 represents on and 0 is off.
- The format is a metric string followed by a number. If a string appears where the number should be, the Prometheus database can't take the data in, the whole script will fail, and no data goes through. A minimal example script follows this list.
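A minimal sketch of such a script, using a hypothetical `host1` variable and an ARP-table check as the example (the real `args.sh`/`multiDef.sh` are generated by `fill_config.py`):

```bash
#!/bin/bash
host1="host1.example.net"   # placeholder; filled in by fill_config.py in the real scripts

# Each echoed line must be: metric_name{labels} <number>. Emitting anything
# non-numeric in the value position makes Prometheus reject the whole batch.
if arp -a | grep -q "${host1}"; then
    echo "host1_arp_on{host=\"${host1}\"} 1"   # 1 = entry present / on
else
    echo "host1_arp_on{host=\"${host1}\"} 0"   # 0 = off
fi
```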
## Dashboard

- The `dashboard` directory has `dynamic.py`, which generates two dashboards: one contains SNMP and Node exporter data, and the other contains SNMP and ARP exporter data. The diagram and the Prometheus queries are dynamically built based on the config file given.
- If the SNMP exporter is not running on either host, `dynamic.py` will fail to generate any dashboard.
- `dynamic.py` runs `curl` to find the `ifIndex` corresponding to the `ifName` given in the config file; the `ifIndex` is used in the queries. (A shell sketch of this lookup follows this list.)
- `fill_API.py` curls the API key automatically, and it is included in the `generate.sh` script. Please curl the API keys before changing the Grafana admin username and password.
- `generate.sh` reads the users' configuration files from the `config_flow` folder and generates a dashboard accordingly. It includes an auto-curl step that creates an API authentication key and stores it in the configuration file.
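A rough shell version of the `ifIndex` lookup that `dynamic.py` performs in Python; the `if_mib` module name and the label parsing here are assumptions:

```bash
#!/bin/bash
SWITCH_IP="192.0.2.10"        # placeholder switch address
IFNAME="HundredGigE0/0/0/1"   # the ifName from the config file

# Query the SNMP exporter and extract the ifIndex label that accompanies the
# matching ifName label (assumes the if_mib module attaches both labels).
curl -s "localhost:9116/snmp?target=${SWITCH_IP}&module=if_mib" \
  | grep "ifName=\"${IFNAME}\"" \
  | sed -n 's/.*ifIndex="\([0-9]*\)".*/\1/p' \
  | head -n 1
```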
## Prometheus

- `prometheus.yml` stores the configuration for Prometheus.
- It targets Pushgateway on port 9090, Prometheus (itself) on 9091, and the Script Exporter on 9496, for both the single- and double-switch setups (multi/default).
- `prometheus.yml` is updated from the config file on install and start; the only moving part is the IP address. To add more Script Exporter scripts, follow the current syntax. A sketch of the file's shape follows this list.
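As orientation, a `prometheus.yml` in that shape might look like the sketch below; the job names and the IP address are placeholders, with the ports taken from the list above:

```bash
# Sketch only: a hypothetical prometheus.yml. The IP address is the moving part
# that install/start substitute; the job names are placeholders.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s                   # assumed interval
scrape_configs:
  - job_name: pushgateway
    honor_labels: true                   # keep the instance labels the exporters pushed
    static_configs:
      - targets: ["192.0.2.20:9090"]
  - job_name: prometheus
    static_configs:
      - targets: ["192.0.2.20:9091"]
  - job_name: script-exporter
    static_configs:
      - targets: ["192.0.2.20:9496"]
EOF
```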
## SNMP Exporter

- SNMP accesses the switch and the MIBs through this URL: `localhost:9116/snmp?target=<switch_ip_address>&module=<module_name>` (e.g. `if_mib`).
- Curl stores the result of the query in an intermediate file, then curls the content to Pushgateway. (A sketch of this two-step push follows this list.)
- For downloading MIBs, refer to: https://github.com/esnet/sense-rtmon/issues/17#issue-1330372320
- The container reads `PRIVATE_MIB` from `docker-compose.yml` and includes the private MIBs under that folder.
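A minimal sketch of the two-step push; the intermediate file name and both URLs are placeholders:

```bash
#!/bin/bash
SWITCH_IP="192.0.2.10"                               # placeholder switch address
PUSHGATEWAY_SERVER="http://cloud.example.net:9090"   # placeholder Cloud-stack URL
MYIP="$(hostname -I | awk '{print $1}')"             # assumed IP lookup

# Step 1: query the SNMP exporter and keep the result in an intermediate file.
curl -s "localhost:9116/snmp?target=${SWITCH_IP}&module=if_mib" > /tmp/snmp_out.txt

# Step 2: curl the stored content to the pushgateway.
curl --data-binary @/tmp/snmp_out.txt \
  "${PUSHGATEWAY_SERVER}/metrics/job/snmp-exporter/instance/${MYIP}"
```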
## ARP Exporter

- ARP is more complicated, for it needs to be able to detect changes in the ARP table (`arp -a`).
- The ARP files are located under `/home` in each container.
- Important files:
  - `arp_out.json` stores the output of `arp -a` on the host system in JSON format. The plain output is converted to JSON by `convertARP.py`.
  - `prev.json` stores the previous `arp -a` output.
  - `delete.json` stores all current URLs on Pushgateway, in a format that can be processed to erase the Pushgateway data directly.
- Put together: `arp_out.json` is updated every 15 s. If there is a discrepancy between it and `prev.json`, the ARP container deletes all the current URLs listed in `delete.json` and pushes new URLs from `arp_out.json`. (A sketch of this cycle follows this list.)
- `ping_status` and `prev_ping_status` work in a similar fashion: the host pings the other host, stores the result, and sends it to Pushgateway. If the two files are different, everything on Pushgateway is deleted and the new URLs and ping status are resent.
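Putting the pieces together, the change-detection cycle might look roughly like this; the `convertARP.py` invocation and the one-URL-per-line `delete.json` format are assumptions based on the notes above:

```bash
#!/bin/bash
cd /home || exit 1   # the ARP files live under /home in each container

# Refresh the current ARP table as JSON (convertARP.py does the real conversion;
# reading from stdin here is an assumption).
arp -a | python3 convertARP.py > arp_out.json

# On any discrepancy with the previous snapshot, erase the stale pushgateway
# series, push the fresh ones, and roll the snapshot forward.
if ! cmp -s arp_out.json prev.json; then
    while read -r url; do
        curl -s -X DELETE "${url}"   # delete.json is assumed to hold one URL per line
    done < delete.json
    # ...push the new URLs from arp_out.json here...
    cp arp_out.json prev.json
fi
```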
## TCP Exporter

- Currently not functional and in development. It is similar to ARP and could send data to Pushgateway with easy fixes in `overwrite_json_exporter_tcp.py` (taking around a day). However, the design might need to be changed.
## Diagram

- mermaid.live is used to draw the diagram. The website has a good live drawing board for instant feedback.
- Future: Local/Global Ports Unique Flow IDs
## `fill*.py`

- The `fill*.py` scripts under both `Cloud` and `Site` fill the `dynamic_*.sh` files based on the configuration file (cloud) or on environment variables (site).
- They use the `re.sub` function to replace lines in the Docker Compose files and to assign variable values in the `sh` scripts. (A minimal sketch follows this list.)
- Details on how `re.sub` works, for future reference: https://stackoverflow.com/questions/20462834/python-using-str-replace-with-a-wildcard
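A minimal sketch of the `re.sub` pattern, run here from the shell; the template line and variable name are made up for illustration:

```bash
python3 - <<'EOF'
import re

# A made-up template line of the kind found in a dynamic_*.sh file.
template = "PUSHGATEWAY_SERVER=REPLACE_ME"

# Wildcard-replace everything after the '=' with the real value, as the fill
# scripts do when filling the sh scripts and Docker Compose files.
print(re.sub(r"PUSHGATEWAY_SERVER=\S+", "PUSHGATEWAY_SERVER=192.0.2.20:9090", template))
EOF
```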
## `site_functions`

- It stores the functions that are used more than once in the other `.py` files.
- This practice makes the project more modular and avoids repeated code.
- Usage: `import site_functions`.
## Cron

- `crontab -e` shows all the cron jobs that are currently running.
- `Node`, `SNMP`, and `ARP` are set to `* * * * *`, which runs every minute, but with a loop (`for 0 1 2`, `sleep 15`) each runs every 15 s. (A crontab entry in that shape is sketched below.)
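A crontab entry in that shape might look like this; the script path is a placeholder:

```bash
# Cron itself can only fire once per minute; the inner loop repeats the push
# at 15 s intervals within the minute to approximate a 15 s cadence.
* * * * * for i in 0 1 2; do /home/push_node.sh; sleep 15; done
```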
For more questions, please contact: Zhenbo [email protected] and Pratyush [email protected]