Scalability Issue: outages / timeouts / slow responses in the recrawler service may lead to message queue buildups

**Describe the bug**
The [`recrawler`](https://github.com/openculinary/recrawler) service has been switched off since early January due to a lack of query results; that problem will be opened and tracked as a separate issue for that service.

If no `recrawler` pods are available, requests to that service fail with connection errors -- after a considerable timeout -- as visible here in the [`backend-worker`](https://github.com/openculinary/backend/blob/bddbc605dfffc88dd21cc9de18af1a3968f9637c/k8s/worker-deployment.yaml#L12-L23) deployment logs:

```
[2021-01-27 18:28:19,290: WARNING/ForkPoolWorker-2] Recrawling failed due to "ConnectionError" exception
[2021-01-27 18:28:19,291: WARNING/ForkPoolWorker-3] Recrawling failed due to "ConnectionError" exception
[2021-01-27 18:30:30,362: WARNING/ForkPoolWorker-1] Recrawling failed due to "ConnectionError" exception
[2021-01-27 18:30:30,366: WARNING/ForkPoolWorker-3] Recrawling failed due to "ConnectionError" exception
```

This causes the throughput of the `backend-worker` instances to drop dramatically, since most of the task worker time is spent waiting on connection attempts that eventually fail.

It may be useful to consider both a short-term and a longer-term fix here. Since we are not currently receiving results from the `recrawler` service, a short-term patch would involve re-deploying that service to respond with empty results (effectively a no-op). Longer-term, we likely want to isolate the queue workers that handle event logs, and perhaps add circuit breakers and/or adjust the connection timeouts they use.
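
As a rough illustration of the circuit-breaker idea, here is a minimal standard-library sketch. It is not taken from the `backend` codebase: the `CircuitBreaker` and `CircuitOpenError` names, the threshold, and the cool-down period are all hypothetical. After a few consecutive failures it skips calls for a while instead of letting every task wait for a connection timeout.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open (hypothetical name)."""


class CircuitBreaker:
    """Illustrative circuit breaker; the defaults are assumptions, not tuned values."""

    def __init__(self, failure_threshold=3, reset_after=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # cool-down period, in seconds
        self.clock = clock                          # injectable so it can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast rather than waiting on another connection timeout
                raise CircuitOpenError("circuit is open; skipping call")
            # Cool-down has elapsed: fall through and allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

A worker task could wrap its outbound request in `breaker.call(...)` and treat `CircuitOpenError` as "skip recrawling for now". If the request is made with the `requests` library, passing an explicit `timeout=(connect, read)` tuple would additionally bound how long each attempt can block; suitable values would need to be chosen for the deployment.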

**Expected behavior**
Throughput for the majority of the RecipeRadar message queues should not be adversely affected by outages in a minor service.
