[Post 3.1] Update stack versions in recipes/samples and e2e matrix #8776

Open · wants to merge 5 commits into main
Conversation

@barkbay (Contributor) commented Jul 30, 2025

Update stack versions in recipes/samples and e2e matrix.

@barkbay added the >docs (Documentation), exclude-from-release-notes (Exclude this PR from appearing in the release notes), and v3.1.0 labels on Jul 30, 2025
@@ -227,7 +227,7 @@ spec:
   meta:
     package:
       name: system
-      version: 9.0.0
+      version: 9.1.0
barkbay (Contributor Author):

I have to check this one; IIUC, Agent packages have a different lifecycle.

barkbay (Contributor Author):
The latest version of the system package seems to be 2.5.1; I'm not sure why we have the stack version here. I'm wondering if this is a bug in update-stack-version.sh.

[screenshot]

@prodsecmachine (Collaborator) commented Jul 30, 2025

🎉 Snyk checks have passed. No issues have been found so far.

security/snyk check is complete. No issues have been found.

license/snyk check is complete. No issues have been found.

@@ -6,16 +6,16 @@
- E2E_STACK_VERSION: "8.18.0"
# current stack version 9.0.0 is tested in all other tests no need to test it again
- E2E_STACK_VERSION: "8.19.0-SNAPSHOT"
Collaborator:
Suggested change:
- - E2E_STACK_VERSION: "8.19.0-SNAPSHOT"
+ - E2E_STACK_VERSION: "8.19.0"

8.19.0 should have been released in tandem with 9.1.0, I believe.

@pebrc (Collaborator) commented Jul 30, 2025:
I guess the other question is whether we would like to replace 8.18.0 with 8.19.0 on L6, and keep 8.19.0-SNAPSHOT here for future patch releases of the 8.19 branch.

barkbay (Contributor Author):

> I guess the other question is whether we would like to replace 8.18. with 8.19 on L6. And keep 8.19.0-SNAPSHOT for future patch releases of the 8.19 branch here.

That was my original plan; I forgot to update 8.18.0 on L6.

@barkbay commented Jul 31, 2025

buildkite test this -f p=kind -m s=9.1.0,s=9.2.0-SNAPSHOT

@barkbay commented Aug 1, 2025

TestFleetKubernetesNonRootIntegrationRecipe/ES_data_should_pass_validations ~ kind-9-1-0
=== RUN   TestFleetKubernetesNonRootIntegrationRecipe/ES_data_should_pass_validations
Retries (15m0s timeout): ..........................................................................................................................................................................................................................................................................................................
    step.go:51: 
        	Error Trace:	/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/utils.go:94
        	Error:      	Received unexpected error:
        	            	elasticsearch client failed for https://elasticsearch-rss8-es-default-2.elasticsearch-rss8-es-default.e2e-h4gpy-mercury:9200/_data_stream/logs-elastic_agent-default: 404 Not Found: {Status:404 Error:{CausedBy:{Reason: Type:} Reason:no such index [logs-elastic_agent-default] Type:index_not_found_exception StackTrace: RootCause:[{Reason:no such index [logs-elastic_agent-default] Type:index_not_found_exception}]}}
        	Test:       	TestFleetKubernetesNonRootIntegrationRecipe/ES_data_should_pass_validations
TestFleetKubernetesNonRootIntegrationRecipe/ES_data_should_pass_validations ~ kind-9-2-0-snaps
=== RUN   TestFleetKubernetesNonRootIntegrationRecipe/ES_data_should_pass_validations
Retries (15m0s timeout): ..........................................................................................................................................................................................................................................................................................................
    step.go:51: 
        	Error Trace:	/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/utils.go:94
        	Error:      	Received unexpected error:
        	            	elasticsearch client failed for https://elasticsearch-4mbv-es-default-1.elasticsearch-4mbv-es-default.e2e-86oa9-mercury:9200/_data_stream/logs-elastic_agent-default: 404 Not Found: {Status:404 Error:{CausedBy:{Reason: Type:} Reason:no such index [logs-elastic_agent-default] Type:index_not_found_exception StackTrace: RootCause:[{Reason:no such index [logs-elastic_agent-default] Type:index_not_found_exception}]}}
        	Test:       	TestFleetKubernetesNonRootIntegrationRecipe/ES_data_should_pass_validations

@barkbay commented Aug 1, 2025

For 9.1.0 Fleet server and Agents cannot connect to ES:

{
    "log.level": "error",
    "@timestamp": "2025-07-31T07:59:16.798Z",
    "message": "Error dialing x509: certificate signed by unknown authority",
    "component": {
        "binary": "metricbeat",
        "dataset": "elastic_agent.metricbeat",
        "id": "beat/metrics-monitoring",
        "type": "beat/metrics"
    },
    "log": {
        "source": "beat/metrics-monitoring"
    },
    "service.name": "metricbeat",
    "ecs.version": "1.6.0",
    "network.transport": "tcp",
    "server.address": "elasticsearch-rss8-es-http.e2e-h4gpy-mercury.svc:9200",
    "log.logger": "elasticsearch.esclientleg",
    "log.origin": {
        "file.line": 39,
        "file.name": "transport/logging.go",
        "function": "github.com/elastic/elastic-agent-libs/transport/httpcommon.(*HTTPTransportSettings).RoundTripper.LoggingDialer.func2"
    }
}
{
    "log.level": "error",
    "@timestamp": "2025-07-31T07:59:18.667Z",
    "message": "http: TLS handshake error from 10.244.3.1:51158: remote error: tls: bad certificate\n",
    "component": {
        "binary": "fleet-server",
        "dataset": "elastic_agent.fleet_server",
        "id": "fleet-server-default",
        "type": "fleet-server"
    },
    "log": {
        "source": "fleet-server-default"
    },
    "ecs.version": "1.6.0",
    "service.name": "fleet-server",
    "service.type": "fleet-server"
}

I'll check whether it is also the case for 9.2.0, and double-check whether TestFleetKubernetesIntegrationRecipe is failing as well.

Edit:

  • Same issue for 9.2.0
  • TestFleetKubernetesIntegrationRecipe does not seem to be affected

@barkbay commented Aug 1, 2025

I can reproduce with 9.0.3 but not with 9.0.0, so something has changed either in Kibana or in Agent between these two versions. (I'll check the other patch releases.)

@barkbay commented Aug 1, 2025

The problem

  • This config (in version 9.0.3) has 3 Agents + 1 Fleet Server.
  • The exact same configuration seems to work as expected with version 9.0.0.
  • With 9.0.3, all the Agents are reported as healthy in Kibana while they are endlessly logging:
{
    "log.level": "error",
    "@timestamp": "2025-08-01T08:45:48.269Z",
    "message": "Failed to connect to backoff(elasticsearch(https://elasticsearch-nb68-es-http.e2e-mercury.svc:9200)): Get \"https://elasticsearch-nb68-es-http.e2e-mercury.svc:9200\": x509: certificate signed by unknown authority",
    "component": {
        "binary": "metricbeat",
        "dataset": "elastic_agent.metricbeat",
        "id": "kubernetes/metrics-default",
        "type": "kubernetes/metrics"
    },
    "log": {
        "source": "kubernetes/metrics-default"
    },
    "log.logger": "publisher_pipeline_output",
    "log.origin": {
        "file.line": 149,
        "file.name": "pipeline/client_worker.go",
        "function": "github.com/elastic/beats/v7/libbeat/publisher/pipeline.(*netClientWorker).run"
    },
    "service.name": "metricbeat",
    "ecs.version": "1.6.0"
}
[screenshot]

Side note, this warning is interesting:

[screenshot]

Kibana configuration

Kibana is configured with:

xpack.fleet.agentPolicies:
    - id: eck-fleet-server
      is_managed: true
      monitoring_enabled:
      - logs
      - metrics
      name: Fleet Server on ECK policy
      namespace: default
      package_policies:
      - id: fleet_server-1
        name: fleet_server-1
        package:
          name: fleet_server
      unenroll_timeout: 900
    - id: eck-agent
      is_managed: true
      monitoring_enabled:
      - logs
      - metrics
      name: Elastic Agent on ECK policy
      namespace: default
      package_policies:
      - name: system-1
        package:
          name: system
      - name: kubernetes-1
        package:
          name: kubernetes
      unenroll_timeout: 900
xpack.fleet.agents.fleet_server.hosts:
- https://fleet-server-nb68-agent-http.e2e-mercury.svc:8220
xpack.fleet.outputs:
- hosts:
  - https://elasticsearch-nb68-es-http.e2e-mercury.svc:9200
  id: eck-fleet-agent-output-elasticsearch
  is_default: true
  name: eck-elasticsearch
  ssl:
    certificate_authorities:
    - /mnt/elastic-internal/elasticsearch-association/e2e-mercury/elasticsearch-nb68/certs/ca.crt
  type: elasticsearch
xpack.fleet.packages:
- name: system
  version: latest
- name: elastic_agent
  version: latest
- name: fleet_server
  version: latest
- name: kubernetes
  version: latest

This is what the output looks like in Kibana:

[screenshot]

Elasticsearch CA in Kibana is valid

I checked the CA inside Fleet and it is valid:

curl -u elastic:REDACTED  --cacert /mnt/elastic-internal/elasticsearch-association/e2e-mercury/elasticsearch-nb68/certs/ca.crt https://elasticsearch-nb68-es-http.e2e-mercury.svc:9200
{
  "name" : "elasticsearch-nb68-es-default-0",
  "cluster_name" : "elasticsearch-nb68",
  "cluster_uuid" : "rqakRG9hSrSW0AUS6e-7Zg",
  "version" : {
    "number" : "9.0.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "cc7302afc8499e83262ba2ceaa96451681f0609d",
    "build_date" : "2025-06-18T22:09:56.772581489Z",
    "build_snapshot" : false,
    "lucene_version" : "10.1.0",
    "minimum_wire_compatibility_version" : "8.18.0",
    "minimum_index_compatibility_version" : "8.0.0"
  },
  "tagline" : "You Know, for Search"
}

If the configured CA is valid, why do we have these certificate errors?

@barkbay commented Aug 4, 2025

For 9.0.x and 9.1.x I believe this is going to be fixed by elastic/kibana#230370 and elastic/kibana#230371. In the meantime, I'm going to update the code to skip the impacted versions for that test.

@barkbay commented Aug 5, 2025

buildkite test this -f p=kind,t=TestFleetKubernetesNonRootIntegrationRecipe -m s=9.1.0,s=9.2.0-SNAPSHOT

@barkbay commented Aug 6, 2025

My understanding is that the latest Kibana snapshot (docker.elastic.co/kibana/kibana:9.2.0-SNAPSHOT), built yesterday, should include elastic/kibana#230211.

I'll retry TestFleetKubernetesNonRootIntegrationRecipe ....

@barkbay commented Aug 6, 2025

buildkite test this -f p=kind,t=TestFleetKubernetesNonRootIntegrationRecipe -m s=9.1.0,s=9.2.0-SNAPSHOT

@barkbay commented Aug 6, 2025

Looks like this is still not fixed 😞

@barkbay commented Aug 7, 2025

We still have the same error with 9.2.0-SNAPSHOT:

{"log.level":"error","@timestamp":"2025-08-06T12:53:38.105Z","message":"Error dialing x509: certificate signed by unknown authority","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"ecs.version":"1.6.0","log.origin":{"file.line":39,"file.name":"transport/logging.go","function":"github.com/elastic/elastic-agent-libs/transport/httpcommon.(*HTTPTransportSettings).RoundTripper.LoggingDialer.func2"},"log.logger":"elasticsearch.esclientleg","service.name":"filebeat","network.transport":"tcp","server.address":"elasticsearch-b7m5-es-http.e2e-r2fx9-mercury.svc:9200","ecs.version":"1.6.0"}
...
{"log.level":"error","@timestamp":"2025-08-06T13:09:58.868Z","message":"Error dialing x509: certificate signed by unknown authority","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-monitoring","type":"filestream"},"log":{"source":"filestream-monitoring"},"log.logger":"elasticsearch.esclientleg","service.name":"filebeat","network.transport":"tcp","log.origin":{"file.line":39,"file.name":"transport/logging.go","function":"github.com/elastic/elastic-agent-libs/transport/httpcommon.(*HTTPTransportSettings).RoundTripper.LoggingDialer.func2"},"server.address":"elasticsearch-b7m5-es-http.e2e-r2fx9-mercury.svc:9200","ecs.version":"1.6.0","ecs.version":"1.6.0"}

Hi @juliaElastic 👋 , could you please confirm that this should be fixed in 9.2.0-SNAPSHOT? Thanks 🙇

@barkbay commented Aug 8, 2025

buildkite test this -f p=kind,t=TestFleetKubernetesNonRootIntegrationRecipe -m s=9.1.1,s=9.2.0-SNAPSHOT

Labels: >docs (Documentation), exclude-from-release-notes, v3.1.0
3 participants