Draft
47 commits
80c83b2
fix: Kubernetes event exporter
rohanarora Nov 11, 2025
ff63de5
bump: enabling ClickHouse to receive Prometheus metrics
rohanarora Nov 11, 2025
9587f6b
feat: adding the ability to create snapshot of K8s objects and pushin…
rohanarora Nov 12, 2025
fa7808e
bump: initial pass
rohanarora Nov 12, 2025
6053bfe
feat: leveraging the existing recorder roles as opposed to creating a…
rohanarora Nov 14, 2025
682ab2f
fix: argument validation failure
rohanarora Nov 14, 2025
5f8c2a2
bump: setting it up as a K8s job
rohanarora Nov 14, 2025
3c7754d
fix: missing ClickHouse credentials
rohanarora Nov 14, 2025
be4216b
fix: missing ClickHouse credentials - 2
rohanarora Nov 14, 2025
f950e36
fix: environment variables for credentials
rohanarora Nov 14, 2025
a8538ba
bump: addressing Could not find a version that satisfies the requirem…
rohanarora Nov 14, 2025
f3a5231
bump: setting ClickHouse endpoint leveraging the environment variable
rohanarora Nov 14, 2025
9602031
fix: username instead of user
rohanarora Nov 14, 2025
d1d86fd
bump: can skip the port number for ClickHouse
rohanarora Nov 14, 2025
55108b6
fix: remove protocol from clickhouse hostname
rohanarora Nov 14, 2025
59abe3c
bump: correcting labels
rohanarora Nov 14, 2025
84f361d
bump: resource bump for job
rohanarora Nov 14, 2025
4a08c28
bump: consolidating install-uninstall for ClickHouse exporter
rohanarora Nov 14, 2025
9de101a
bump: dedicated ClickHouse-based collection node
rohanarora Nov 14, 2025
de25bbc
bump: resource bump for job - 2
rohanarora Nov 15, 2025
c62f222
fix: label
rohanarora Nov 15, 2025
72f0558
bump: further resource bump for job - 2
rohanarora Nov 15, 2025
1a1af29
bump: allowing for more time for the job to complete
rohanarora Nov 15, 2025
7bea403
bump: further resource bump for job - 3
rohanarora Nov 15, 2025
c433b55
fix: directory path / location for data collected from ClickHouse
rohanarora Nov 15, 2025
cf6a994
bump: adding missing recursive argument
rohanarora Nov 15, 2025
b888349
bump: ClickHouse resource limits
rohanarora Nov 15, 2025
57804fb
bump: Prometheus namespace deletion taking longer than expected
rohanarora Nov 15, 2025
22211ee
bump: enabling logs and traces
rohanarora Nov 15, 2025
53670e0
fix: correction from recursive to recurse
rohanarora Nov 15, 2025
0e1b75f
bump: separating directories for service and pod metrics
rohanarora Nov 15, 2025
360e606
bump: adding metrics table; fixing looping over all tables in a database
rohanarora Nov 15, 2025
1143c6a
fix: database name with metrics data
rohanarora Nov 15, 2025
843e475
fix: tackling inner tables for prometheus
rohanarora Nov 15, 2025
a19f177
fix: tackling inner tables for prometheus - 2
rohanarora Nov 15, 2025
be437b5
fix: correct database name with metrics data
rohanarora Nov 15, 2025
5d99296
bump: maintaining directory structure on S2 upload
rohanarora Nov 15, 2025
8f2fb56
fix: correcting Makefile command
rohanarora Nov 18, 2025
6cdb947
fix: correcting phase value
rohanarora Nov 18, 2025
64e5748
bump: elastic IP correction
rohanarora Nov 18, 2025
5768194
bump: wrapping up exporter from the SRE lens
rohanarora Nov 18, 2025
aac337c
bump: bumping up resource limits-requests for flagd and recommendatio…
rohanarora Nov 18, 2025
61cc1bc
bump: adding filters for logs and traces; defaulting to error and war…
rohanarora Nov 18, 2025
43b0be0
bump: chart version bump consistent with main
rohanarora Nov 18, 2025
897a73e
bump: adding pause to allow for the application to stabilize post ins…
rohanarora Nov 19, 2025
7b2c7cd
bump: adding default set of namespaces for k8s objects
rohanarora Nov 19, 2025
48ebd27
bump: accounting service resource bump
rohanarora Nov 19, 2025
2 changes: 1 addition & 1 deletion sre/Makefile.runner
@@ -33,7 +33,7 @@ launch_sync_workflow: ## Launches the scenario sync workflow
 .PHONY: launch_start_workflow
 launch_start_workflow: ## Launches the workflow equivalent of start_incident on AWX
 	ansible-playbook -i inventory.yaml playbooks/manage_awx.yaml --tags "launch_workflows" \
-		--extra-vars "run_phase=init"
+		--extra-vars "run_phase=start"
 
 .PHONY: launch_stop_workflow
 launch_stop_workflow: ## Launches the workflow equivalent of stop_incident on AWX
2 changes: 1 addition & 1 deletion sre/dev/local_cluster/Makefile
@@ -45,4 +45,4 @@ delete_cluster: ## DEPRECATED: Deletes a Kind cluster
 	@echo "This command will be removed in a future version."
 	@echo "Executing 'make destory_cluster'..."
 	@echo ""
-	$(MAKE) destory_cluster
+	$(MAKE) destroy_cluster
@@ -121,7 +121,7 @@
 - --cloud
 - aws
 - --topology
-- "{{ 'private' if kops_stack.runners.aws.elastic_ip_allocation_id is defined else 'public' }}"
+- "{{ 'private' if kops_elastic_ip_available else 'public' }}"
 - --network-id
 - "{{ kops_vpc_info.vpc.id }}"
 - --subnets
27 changes: 20 additions & 7 deletions sre/dev/remote_cluster/roles/kops/tasks/validate_stack.yaml
@@ -7,10 +7,23 @@
     success_msg: Valid number of kOps clusters configured.
 
 - name: Validate Elastic IP allocation ID
-  ansible.builtin.assert:
-    that:
-      - kops_stack.runners.aws.elastic_ip_allocation_id | length > 0
-    fail_msg: Invalid number of kOps clusters set. Must be greater than 0.
-    success_msg: Valid number of kOps clusters configured.
-  when:
-    - kops_stack.runners.aws.elastic_ip_allocation_id is defined
+  block:
+    - name: Check Elastic IP allocation ID
+      ansible.builtin.assert:
+        that:
+          - kops_stack.runners.aws.elastic_ip_allocation_id is defined
+          - kops_stack.runners.aws.elastic_ip_allocation_id | length > 0
+        fail_msg: Invalid or missing Elastic IP allocation ID.
+        success_msg: Valid Elastic IP allocation ID configured.
+
+    - name: Set validation flag for valid Elastic IP
+      ansible.builtin.set_fact:
+        kops_elastic_ip_available: true
+  rescue:
+    - name: Warning about invalid Elastic IP
+      ansible.builtin.debug:
+        msg: "WARNING: {{ ansible_failed_result.msg | default('Invalid or missing Elastic IP allocation ID') }}"
+
+    - name: Set validation flag for invalid Elastic IP
+      ansible.builtin.set_fact:
+        kops_elastic_ip_available: false
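
Review note: the combined effect of the validate_stack.yaml change and the `--topology` expression can be sketched outside Ansible. The Python below is only an illustration of the decision logic as it reads from the diff. The function names `elastic_ip_available` and `kops_topology` and the sample allocation ID are hypothetical and do not appear in the PR.

```python
from typing import Optional


def elastic_ip_available(allocation_id: Optional[str]) -> bool:
    """Mirror the block/rescue pair: the flag is true only when the
    allocation ID is defined (not None) and non-empty."""
    return allocation_id is not None and len(allocation_id) > 0


def kops_topology(allocation_id: Optional[str]) -> str:
    """Mirror the Jinja expression passed to --topology:
    'private' if kops_elastic_ip_available else 'public'."""
    return "private" if elastic_ip_available(allocation_id) else "public"


if __name__ == "__main__":
    # Hypothetical inputs: undefined, empty, and a placeholder ID.
    for candidate in (None, "", "eipalloc-example"):
        print(repr(candidate), "->", kops_topology(candidate))
```

Under this reading, a missing or empty allocation ID no longer aborts validation; the rescue branch logs a warning and the cluster falls back to the public topology.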