Add chart support for EKS Auto Mode (DO NOT MERGE) #1856

Draft
wants to merge 6 commits into
base: main
Changes from 3 commits
12 changes: 12 additions & 0 deletions .chloggen/eksautomodedistro.yaml
@@ -0,0 +1,12 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
# The name of the component, or a single word describing the area of concern, (e.g. agent, clusterReceiver, gateway, operator, chart, other)
component: other
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add support for EKS Auto Mode. (Part 1)
# One or more tracking issues related to the change
issues: [1856]
# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
100 changes: 100 additions & 0 deletions docs/advanced-configuration.md
@@ -55,6 +55,7 @@ scrape additional metadata. The supported options are:

- `aks` - Azure AKS
- `eks` - Amazon EKS
- `eks/auto-mode` - Amazon EKS Auto Mode
- `eks/fargate` - Amazon EKS with Fargate profiles
- `gke` - Google GKE / Standard mode
- `gke/autopilot` - Google GKE / Autopilot mode
@@ -298,6 +299,105 @@ for the Fargate distribution has two primary differences between regular `eks` t
node label. The Collector's ClusterRole for `eks/fargate` will allow the `patch` verb on `nodes` resources for the default API groups to allow the cluster
receiver's init container to add this node label for designated self monitoring.

## EKS Auto Mode

To run the Splunk OpenTelemetry Collector in an [Amazon EKS Auto Mode cluster](https://docs.aws.amazon.com/eks/latest/userguide/automode.html),
set the `distribution` value to `eks/auto-mode`:

```yaml
distribution: eks/auto-mode
```

EKS Auto Mode restricts access to the Instance Metadata Service (IMDS) so that only pods running in the host network namespace can reach it.
As a result, the `ec2` and `eks` detectors in the `resourcedetection` processor cannot collect attributes from the metadata server when the Collector runs outside the host network.
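
To see the restriction from inside a running pod, a probe along the following lines may help. It is only a sketch: it assumes `curl` is available in the container image, `imds_reachable` is a hypothetical helper name, and the base URL is a parameter only so the check can be pointed somewhere else. The request itself is the standard IMDSv2 token call.

```shell
# Hypothetical helper: succeeds if an IMDSv2 session token can be obtained.
# The optional first argument overrides the base URL (an assumption for testing).
imds_reachable() {
  base_url="${1:-http://169.254.169.254}"
  curl -s -f -m 2 -X PUT "$base_url/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60" >/dev/null 2>&1
}

if imds_reachable; then
  echo "IMDS reachable (the pod is probably on the host network)"
else
  echo "IMDS not reachable from this pod"
fi
```

If the probe fails in a pod that is not on the host network, that matches the behavior described above and the `eks` detector needs the API-based method instead.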

In addition to the IMDS lookup, we have added a method to the [eks detector](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor#amazon-eks)
that extracts the required attributes using the Kubernetes API server and the EC2 API.
This alternative method requires IAM authentication that permits the EKS and EC2 API calls, for example through Pod Identity.

This distribution operates like the `eks` distribution, with the following distinctions:

By default, and to reduce friction, the Helm chart attempts to configure the cluster receiver and the agent to run in the host network namespace.
This approach eliminates the need to configure `Pod Identity`. However, if you explicitly set `agent.hostNetwork.enabled`
or `clusterReceiver.hostNetwork` to `false`, the chart installs with a warning, and the `eks` detector in the `resourcedetection`
processor fails unless `Pod Identity` is enabled and configured.

**Note**: If you deploy the OpenTelemetry Collector as a gateway in an EKS Auto Mode cluster, you must enable and configure `Pod Identity`.

### Example: Setting up Pod Identity on EKS Auto Mode

You can follow the [Pod Identity documentation](https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html) or use the following example.

The Amazon EKS Pod Identity Agent is included in EKS Auto Mode compute; you do not need to install the addon.

- Create an IAM policy with the `ec2:DescribeInstances` permission:
```
export POLICY_ARN=$(aws iam create-policy \
--policy-name splunk-opentelemetry-collector-policy \
--policy-document file://splunk-opentelemetry-collector-policy.json \
--query 'Policy.Arn' --output text)
```

`splunk-opentelemetry-collector-policy.json`
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Resource": "*"
    }
  ]
}
```

- Create the IAM role with the Pod Identity trust policy:
```
export POD_ROLE_ARN=$(aws iam create-role --role-name splunk-opentelemetry-collector-pod-identity-role \
--assume-role-policy-document file://eks-pod-identity-trust-policy.json \
--output text --query 'Role.Arn')
```

`eks-pod-identity-trust-policy.json`
```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "pods.eks.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}
```

- Attach the policy to the role:
```
aws iam attach-role-policy --role-name splunk-opentelemetry-collector-pod-identity-role \
--policy-arn $POLICY_ARN
```

- Create the Pod Identity association:

Make sure to set and export `$CLUSTER_NAME`, `$NAMESPACE`, `$SERVICEACCOUNTNAME`, and `$AWS_REGION` with their appropriate values. `$POD_ROLE_ARN` is exported by the role creation step above.
```
aws eks create-pod-identity-association \
  --cluster-name "$CLUSTER_NAME" \
  --namespace "$NAMESPACE" \
  --service-account "$SERVICEACCOUNTNAME" \
  --role-arn "$POD_ROLE_ARN" \
  --region "$AWS_REGION"
```
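
Before the final `create-pod-identity-association` step, it can help to check that both policy files parse as JSON and that every variable the command uses is set. The following is a minimal sketch, not part of the chart: `validate_json_file` and `missing_vars` are hypothetical helper names, and it assumes `python3` is installed on the machine running the AWS CLI.

```shell
# Hypothetical helper: succeeds when the file parses as JSON.
validate_json_file() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints the names of variables that are unset or empty.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable named by $name
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

# Check the two policy documents used in the steps above.
for f in splunk-opentelemetry-collector-policy.json eks-pod-identity-trust-policy.json; do
  if [ ! -f "$f" ]; then
    echo "missing file: $f"
  elif validate_json_file "$f"; then
    echo "valid JSON: $f"
  else
    echo "invalid JSON: $f"
  fi
done

# Check the variables consumed by create-pod-identity-association.
missing=$(missing_vars CLUSTER_NAME NAMESPACE SERVICEACCOUNTNAME POD_ROLE_ARN AWS_REGION)
if [ -n "$missing" ]; then
  echo "unset variables:" $missing
else
  echo "all required variables are set"
fi
```

A malformed policy document, such as one with a trailing comma in a list, is reported as invalid here before it reaches the AWS CLI.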

## Control Plane metrics

By setting `agent.controlPlaneMetrics.{component}.enabled=true` the helm chart will set up the otel-collector agent to
Original file line number Diff line number Diff line change
@@ -32,6 +32,7 @@ data:
  api_url: https://api.CHANGEME.signalfx.com
  correlation: null
  ingest_url: https://ingest.CHANGEME.signalfx.com
+ root_path: /hostfs
  sync_host_metadata: true
  splunk_hec/o11y:
  disable_compression: true
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ data:
  timeout: 10s
  extensions:
  health_check:
- endpoint: 0.0.0.0:13133
+ endpoint: 0.0.0.0:13134
  processors:
  batch:
  send_batch_max_size: 32768
@@ -92,7 +92,7 @@ data:
  scrape_interval: 10s
  static_configs:
  - targets:
- - localhost:8889
+ - localhost:8899
  service:
  extensions:
  - health_check
@@ -126,7 +126,7 @@ data:
  exporter:
  prometheus:
  host: localhost
- port: 8889
+ port: 8899
  without_scope_info: true
  without_type_suffix: true
  without_units: true
Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@ spec:
  component: otel-collector-agent
  release: default
  annotations:
- checksum/config: c6061d9d4b87136825f385d0ec46adf39b85e0e5dbfa5716e5eaf7e631f8931b
+ checksum/config: 46c305b925b879d582b6561a97834124d4ca283eff403743a852b61c6229eac0
  kubectl.kubernetes.io/default-container: otel-collector
  spec:
  hostNetwork: true
@@ -126,12 +126,10 @@ spec:
  key: splunk_observability_access_token

  readinessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
  port: 13133
  livenessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
  port: 13133
@@ -160,6 +158,9 @@ spec:
  - mountPath: /hostfs/var/run/utmp
  name: host-var-run-utmp
  readOnly: true
+ - mountPath: /hostfs/usr/lib/os-release
+ name: host-usr-osrelease
+ readOnly: true
  - name: varlog
  mountPath: /var/log
  readOnly: true
@@ -204,6 +205,9 @@ spec:
  - name: host-var-run-utmp
  hostPath:
  path: /var/run/utmp
+ - name: host-usr-osrelease
+ hostPath:
+ path: /usr/lib/os-release
  - name: otel-configmap
  configMap:
  name: default-splunk-otel-collector-otel-agent
Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@ spec:
  component: otel-k8s-cluster-receiver
  release: default
  annotations:
- checksum/config: 342f48d47a4695f5d28a0e13e1ca32e37a7cbe5c2bb2b852e714de2c8de8ae8e
+ checksum/config: 11fa0330d0d16afc5baca06aa3f2d205c97864dac62af187a38134d71dcc1bad
  spec:
  serviceAccountName: default-splunk-otel-collector
  nodeSelector:
@@ -72,15 +72,13 @@ spec:
  name: default-splunk-otel-collector
  key: splunk_observability_access_token
  readinessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
- port: 13133
+ port: 13134
  livenessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
- port: 13133
+ port: 13134
  resources:
  limits:
  cpu: 200m
Original file line number Diff line number Diff line change
@@ -32,6 +32,7 @@ data:
  api_url: https://api.CHANGEME.signalfx.com
  correlation: null
  ingest_url: https://ingest.CHANGEME.signalfx.com
+ root_path: /hostfs
  sync_host_metadata: true
  splunk_hec/o11y:
  disable_compression: true
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ data:
  timeout: 10s
  extensions:
  health_check:
- endpoint: 0.0.0.0:13133
+ endpoint: 0.0.0.0:13134
  processors:
  batch:
  send_batch_max_size: 32768
@@ -92,7 +92,7 @@ data:
  scrape_interval: 10s
  static_configs:
  - targets:
- - localhost:8889
+ - localhost:8899
  service:
  extensions:
  - health_check
@@ -126,7 +126,7 @@ data:
  exporter:
  prometheus:
  host: localhost
- port: 8889
+ port: 8899
  without_scope_info: true
  without_type_suffix: true
  without_units: true
Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@ spec:
  component: otel-collector-agent
  release: default
  annotations:
- checksum/config: 00f97cb880ac5f559fb0c81df8a73ba25c1e5303f64e381de76641e06bfa280f
+ checksum/config: e7dae0aa1a04901b3884f5e13bc9df7372c9291ba6360785d3f31acc89381f24
  kubectl.kubernetes.io/default-container: otel-collector
  spec:
  hostNetwork: true
@@ -126,12 +126,10 @@ spec:
  key: splunk_observability_access_token

  readinessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
  port: 13133
  livenessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
  port: 13133
@@ -160,6 +158,9 @@ spec:
  - mountPath: /hostfs/var/run/utmp
  name: host-var-run-utmp
  readOnly: true
+ - mountPath: /hostfs/usr/lib/os-release
+ name: host-usr-osrelease
+ readOnly: true
  - name: varlog
  mountPath: /var/log
  readOnly: true
@@ -204,6 +205,9 @@ spec:
  - name: host-var-run-utmp
  hostPath:
  path: /var/run/utmp
+ - name: host-usr-osrelease
+ hostPath:
+ path: /usr/lib/os-release
  - name: otel-configmap
  configMap:
  name: default-splunk-otel-collector-otel-agent
Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@ spec:
  component: otel-k8s-cluster-receiver
  release: default
  annotations:
- checksum/config: 342f48d47a4695f5d28a0e13e1ca32e37a7cbe5c2bb2b852e714de2c8de8ae8e
+ checksum/config: 11fa0330d0d16afc5baca06aa3f2d205c97864dac62af187a38134d71dcc1bad
  spec:
  serviceAccountName: default-splunk-otel-collector
  nodeSelector:
@@ -72,15 +72,13 @@ spec:
  name: default-splunk-otel-collector
  key: splunk_observability_access_token
  readinessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
- port: 13133
+ port: 13134
  livenessProbe:
- initialDelaySeconds: 0
  httpGet:
  path: /
- port: 13133
+ port: 13134
  resources:
  limits:
  cpu: 200m
Original file line number Diff line number Diff line change
@@ -32,6 +32,7 @@ data:
  api_url: https://api.CHANGEME.signalfx.com
  correlation: null
  ingest_url: https://ingest.CHANGEME.signalfx.com
+ root_path: /hostfs
  sync_host_metadata: true
  extensions:
  health_check:
Original file line number Diff line number Diff line change
@@ -26,7 +26,7 @@ data:
  timeout: 10s
  extensions:
  health_check:
- endpoint: 0.0.0.0:13133
+ endpoint: 0.0.0.0:13134
  processors:
  batch:
  send_batch_max_size: 32768
@@ -92,7 +92,7 @@ data:
  scrape_interval: 10s
  static_configs:
  - targets:
- - localhost:8889
+ - localhost:8899
  service:
  extensions:
  - health_check
@@ -126,7 +126,7 @@ data:
  exporter:
  prometheus:
  host: localhost
- port: 8889
+ port: 8899
  without_scope_info: true
  without_type_suffix: true
  without_units: true