Adding apache and nginx access log tests #131454


Closed
wants to merge 2 commits into from

eyalkoren
Contributor

Adding tests to try to reproduce failures in the ingest pipeline following #121914.

A benchmark that ingests bulks of messages from different sample sources fails specifically when ingesting some (apparently random) nginx and Apache access logs.

@gbanasiak found the following example log event causing the failure:

{"@timestamp": "2020-09-02T08:21:51.000Z", "type": "beats", "input": {"type": "log"}, "agent": {"version": "7.3.2", "id": "2e5b12d4-8f16-4bfe-8d91-4a98dd9c7214", "type": "filebeat", "name": "infosec-ci-master-green.c.elastic-ci-prod.internal", "ephemeral_id": "3f2b9397-a5e0-4c51-a814-5cce75962b97", "hostname": "infosec-ci-master-green"}, "service": {"type": "nginx"}, "host": {"os": {"family": "debian", "platform": "ubuntu", "kernel": "5.3.0-1032-gcp", "version": "18.04.3 LTS (Bionic Beaver)", "codename": "bionic", "name": "Ubuntu"}, "architecture": "x86_64", "containerized": false, "id": "bfafdfcc69fc1f239af7a05fe266e68c", "mac": ["42:01:0a:e0:01:ec"], "ip": ["10.224.1.236", "fe80::4001:aff:fee0:1ec"], "hostname": "infosec-ci-master-green", "name": "infosec-ci-master-green.c.elastic-ci-prod.internal"}, "event": {"timezone": "+00:00", "module": "nginx", "dataset": "nginx.access"}, "log": {"file": {"path": "/var/log/nginx/infosec-ci.elastic.co.access.log"}, "offset": 1040193}, "ecs": {"version": "1.0.1"}, "cloud": {"provider": "gcp", "instance": {"id": "7729978443851144062", "name": "infosec-ci-master-green"}, "machine": {"type": "n1-standard-4"}, "project": {"id": "elastic-ci-prod"}, "availability_zone": "us-central1-a"}, "@version": "1", "tags": ["jenkins_master", "nginx", "infra-stats", "\ud83c\udd71\ufe0f", "llama", "\ud83e\udd99", "llama-prod"], "fileset": {"name": "access"}, "message": "28.27.251.216 - dustin03 [03/Jan/2020:21:05:52 +0000] \"GET /computer/api/json HTTP/1.1\" 200 602 \"-\" \"Go-http-client/1.1\"", "data_stream": {"type": "logs", "namespace": "default", "dataset": "nginx.access"}, "rally": {"message_size": 120, "doc_size": 1531}}

The error is:

Bulk request failed: [HTTP status: 400, message: [1:2460] failed to parse: data stream timestamp field [@timestamp] is missing]
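For readers who want to inspect the failing event by hand, here is a minimal Python sketch that splits its `message` value into fields. The regex and the `parse_access_log` helper are hypothetical and written only for illustration; they are not the grok pattern of the pipeline under test. The sketch also assumes, without confirmation from this PR, that the timestamp of interest is the bracketed time inside `message`.

```python
import re
from datetime import datetime

# The "message" value of the failing event, copied from the PR description.
MESSAGE = (
    '28.27.251.216 - dustin03 [03/Jan/2020:21:05:52 +0000] '
    '"GET /computer/api/json HTTP/1.1" 200 602 "-" "Go-http-client/1.1"'
)

# Hypothetical regex for this access log layout (illustration only,
# NOT the pattern used by the real ingest pipeline).
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log(message: str) -> dict:
    """Split one access log line into named fields and parse its time."""
    match = ACCESS_LOG.fullmatch(message)
    if match is None:
        raise ValueError(f"not a recognized access log line: {message!r}")
    fields = match.groupdict()
    # Parse the bracketed time into a timezone-aware datetime.
    fields["timestamp"] = datetime.strptime(
        fields["time"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return fields


parsed = parse_access_log(MESSAGE)
print(parsed["user"], parsed["status"], parsed["timestamp"].isoformat())
```

The line parses cleanly on its own, which is consistent with the observation that this event does not fail when processed in isolation.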

This PR contains a test that attempts to reproduce the issue using the same pipeline as the failing benchmark and the same message that triggers the failure.
Passing this test message through the same pipeline in isolation does not cause any issue, so the root cause likely depends on more realistic conditions, such as large bulk requests and high concurrency.
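One way to approximate the "big bulks and high concurrency" conditions outside the benchmark is sketched below in Python. It builds an NDJSON `_bulk` body from repeated copies of the event and submits several bodies from a thread pool. The helper names, the bulk size, and the worker count are all hypothetical, since the benchmark's real settings are not stated here. The `send` callable is supplied by the caller (for example, a function that POSTs the body to the data stream's `_bulk` endpoint), so the sketch itself makes no network calls.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List

# Trimmed subset of the failing event's fields, copied from the PR description.
EVENT = {
    "@timestamp": "2020-09-02T08:21:51.000Z",
    "message": (
        '28.27.251.216 - dustin03 [03/Jan/2020:21:05:52 +0000] '
        '"GET /computer/api/json HTTP/1.1" 200 602 "-" "Go-http-client/1.1"'
    ),
    "data_stream": {"type": "logs", "namespace": "default", "dataset": "nginx.access"},
}


def data_stream_name(event: dict) -> str:
    """Data stream naming scheme: <type>-<dataset>-<namespace>."""
    ds = event["data_stream"]
    return f'{ds["type"]}-{ds["dataset"]}-{ds["namespace"]}'


def build_bulk_body(events: Iterable[dict]) -> str:
    """Build an NDJSON bulk body; data streams accept the "create" action."""
    lines = []
    for event in events:
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(event))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def run_concurrent_bulks(
    event: dict,
    send: Callable[[str], Any],
    bulk_size: int = 1000,  # arbitrary; not the benchmark's value
    bulks: int = 8,         # arbitrary
    workers: int = 4,       # arbitrary
) -> List[Any]:
    """Send `bulks` identical bulk bodies using `workers` threads."""
    body = build_bulk_body([event] * bulk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, [body] * bulks))


# Offline demo: a stub `send` that only counts the documents in each body.
results = run_concurrent_bulks(
    EVENT, send=lambda body: body.count("\n") // 2, bulk_size=5, bulks=3, workers=2
)
print(data_stream_name(EVENT), results)
```

Keeping `send` injectable makes it possible to swap the stub for a real HTTP client, or for an in-process test harness, without changing how the bulks are built.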

eyalkoren added the >test label (Issues or PRs that are addressing/adding tests) on Jul 17, 2025
eyalkoren closed this on Jul 17, 2025
eyalkoren deleted the analysis-complex-pipelines branch on July 17, 2025 15:53
Labels
>test Issues or PRs that are addressing/adding tests v9.2.0
2 participants