Vector crashes on k8s if the sink goes down #22498
Replies: 2 comments
-
On a related note, I've been testing the integration between an HTTP client <> the Vector http server source <> the Vector Elasticsearch sink, and found something rather interesting. According to the Elasticsearch sink buffering docs, events are flushed either when the timeout expires or when the batch size reaches max bytes/events. But what I'm seeing is that Vector sometimes doesn't send a batch until another request comes in with another batch. I enabled HTTP logging on the Elasticsearch side to verify whether requests were coming through, and that's how I discovered this. I also enabled debug logging in Vector to try to understand what's going on.
Events pushed log
Events not pushed log
But I wasn't able to figure out from the logs why the events weren't pushed. All I can see is that when they were pushed, the logs show a connection being established with Elasticsearch and the events being sent. I'm not sure if this is expected behaviour or if there's an issue somewhere (in Vector or elsewhere).
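For reference, this is roughly the shape of the batch settings I'm talking about. The component names and values are placeholders for illustration, not my actual config, and option names may differ slightly between Vector versions (e.g. `endpoint` vs `endpoints`):

```yaml
sinks:
  es_out:                        # placeholder sink name
    type: elasticsearch
    inputs: ["http_in"]          # placeholder http server source
    endpoints: ["http://elasticsearch:9200"]   # `endpoint` on older Vector versions
    batch:
      max_events: 1000           # flush once this many events are batched...
      max_bytes: 10000000        # ...or once the batch reaches this many bytes...
      timeout_secs: 1            # ...or once this timeout expires, whichever comes first
```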
-
Hi @rolandjitsu, sorry for the late response. This discussion fell through the cracks. To better assist, please share your configuration. There have been a few new discussions around memory issues:
-
I'm in a bit of a pickle. I'm running Vector as an aggregator on k8s, configured to consume data from an http server source and push to an Elasticsearch sink.
After running a few tests, it looks like if the sink goes down but data is still flowing in, the Vector pods will eventually crash with the following error:
I've found the following related issues:
But I'm a bit confused, mainly because as soon as the sink goes down, the sources start returning non-200 HTTP status codes, which makes it feel like the buffering mechanism either doesn't work or isn't intended to work in this case. If it did work, I'd expect to keep getting 200 (since the request made it through to Vector) and to only start seeing non-200 once the buffer is full and can no longer accept any more data (see the buffer sketch after this post).
I'm just looking for some clarification on what exactly the expected behaviour is:
I use the Vector helm chart with the following config:
Config
NOTE: On the source side, we have agents pushing to the http endpoint. As soon as a non-200 occurs, they stop sending new events and keep retrying (from the first failure) until they receive a 200, then they continue pushing. There's also a buffering mechanism on the agent side.
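For illustration only (my real config is in the collapsed Config block above, and in the helm chart this would sit under the chart's custom config values), the buffer settings I'm asking about would look roughly like this. Component names, the address, and the values are placeholders, and `when_full: block` is just my assumption of what should produce back-pressure rather than errors or drops:

```yaml
sources:
  http_in:
    type: http_server            # called `http` in older Vector releases
    address: "0.0.0.0:8080"

sinks:
  es_out:
    type: elasticsearch
    inputs: ["http_in"]
    endpoints: ["http://elasticsearch:9200"]
    buffer:
      type: memory
      max_events: 10000          # capacity before the buffer counts as full
      when_full: block           # block upstream (back-pressure) instead of dropping events
```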