
[Bug]: Zombie threads accumulate when using crawl4ai AsyncWebCrawler inside Docker container #1666

@molntamas

Description


crawl4ai version

0.7.8

Expected Behavior

When using crawl4ai in a Dockerized FastAPI application, crawling shouldn't create zombie threads.
Instead, each call to the /crawl endpoint creates approximately four additional threads.
These threads remain in a zombie state and are never cleaned up.
Zombie threads accumulate indefinitely with repeated crawl requests.
The threads are not released even though the AsyncWebCrawler context manager exits.

Additional Context

This behavior appears related to known Playwright issues when running inside Docker containers without an init process. Playwright documentation explicitly recommends running containers with an init system enabled.

Relevant references:

https://playwright.dev/python/docs/docker#recommended-docker-configuration

https://docs.docker.com/reference/cli/docker/container/run/#init

Using docker run --init mitigates similar issues by ensuring child processes are properly reaped. However, this is not documented anywhere in crawl4ai.
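For illustration, with the image built in the reproduction steps below, the mitigated run command would be:

docker run --init -p 7777:7777 fastapi-crawl-app

With --init, Docker starts a minimal init process as PID 1 inside the container, which reaps defunct child processes left behind by the browser.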

Open Questions / Requested Clarification

Should crawl4ai users always run containers with --init when using AsyncWebCrawler?

Is this a known limitation inherited from Playwright, or is this a crawl4ai lifecycle issue?

How should this be handled in Kubernetes deployments, where docker run --init cannot be easily used?

Is there a recommended workaround, such as embedding an init system (e.g., tini) in the Docker image?
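As one possible illustration of the tini approach (my own sketch, not an officially documented crawl4ai setup), the reproduction Dockerfile below could be extended so that tini runs as PID 1:

FROM python:3.11-slim-bookworm

RUN pip install fastapi==0.120.2 crawl4ai==0.7.8 uvicorn
RUN crawl4ai-setup

# tini is packaged in Debian and acts as a minimal init that reaps zombies
RUN apt-get update && apt-get install -y procps tini

WORKDIR /app
COPY . /app

# run tini as PID 1; uvicorn becomes its child, and orphaned browser
# processes that get re-parented to PID 1 are reaped automatically
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7777"]

This variant does not depend on docker run flags, so it also works under runtimes where --init is unavailable.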

Documentation Request

Regardless of whether this is an upstream Playwright limitation or a crawl4ai issue, the required runtime setup should be documented clearly. Users need explicit guidance on:

Whether --init is required

How to configure containers correctly

How to avoid zombie thread accumulation in production environments (especially Kubernetes)
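For the Kubernetes case specifically, one candidate answer (an assumption on my part, not taken from crawl4ai documentation) is the pod-level shareProcessNamespace field: when enabled, the pod's pause container runs as PID 1 for all containers and reaps their zombie descendants.

apiVersion: v1
kind: Pod
metadata:
  name: fastapi-crawl-app
spec:
  # the pause container becomes PID 1 and reaps defunct processes
  shareProcessNamespace: true
  containers:
    - name: app
      image: fastapi-crawl-app
      ports:
        - containerPort: 7777

Embedding tini in the image, as sketched above, achieves the same effect without sharing the process namespace.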

Current Behavior

When using crawl4ai’s AsyncWebCrawler inside a Docker container, each invocation of the crawler leaks OS-level threads. These threads remain in a zombie state and are never cleaned up. Over time, repeated crawl requests cause unbounded growth in zombie threads.

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Build and start a Docker image with the following steps:

Create a file named main.py with the following content:

import subprocess
from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig
from fastapi import FastAPI

app = FastAPI()

@app.get("/crawl")
async def crawl():
    async with AsyncWebCrawler(
        config=BrowserConfig(browser_type="chromium", headless=True)
    ) as crawler:
        result = await crawler.arun(url="https://www.nbcnews.com/business")
    return result

@app.get("/get-threads-info")
def get_threads_info():
    info = subprocess.run(
        ["top", "-H", "-b", "-n", "1"],
        capture_output=True,
        text=True
    ).stdout
    print(info)
    return info
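As a variant of the inspection endpoint (my own sketch, not part of the original report), the zombie count can also be computed by reading /proc directly, without depending on top:

import os

@app.get("/get-zombie-count")
def get_zombie_count():
    # /proc/<pid>/stat has the form "pid (comm) state ...";
    # splitting on the last ')' isolates the state field, 'Z' means zombie
    zombies = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                state = f.read().rsplit(")", 1)[1].split()[0]
        except OSError:
            continue  # process exited between listdir and open
        if state == "Z":
            zombies += 1
    return {"zombies": zombies}

Note that this counts defunct processes, whereas top -H -b -n 1 reports per-thread tasks; both numbers should track the same leak here.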

Create a Dockerfile with the following content:

FROM python:3.11-slim-bookworm

RUN pip install fastapi==0.120.2 crawl4ai==0.7.8 uvicorn

RUN crawl4ai-setup

RUN apt update && apt install -y procps

WORKDIR /app
COPY . /app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7777"]

Build the Docker image:

docker build -t fastapi-crawl-app .

Run the container:

docker run -p 7777:7777 fastapi-crawl-app

Call the crawl endpoint:

http://localhost:7777/crawl

Call the threads inspection endpoint:

http://localhost:7777/get-threads-info

Observe the thread state output from top (in the endpoint response and in the container's command-line output). It will look similar to:

...
Threads: 16 total, 1 running, 3 sleeping, 0 stopped, 12 zombie
...

Repeat calls to the /crawl endpoint and re-run /get-threads-info to see the remaining zombie threads.
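As a quicker check (my own addition, equivalent to reading the top output), the defunct entries can be listed from the host, where <container-id> is a placeholder for the running container:

docker exec <container-id> ps -eo pid,stat,comm | awk '$2 ~ /Z/'

Zombie processes show Z in the STAT column and are typically displayed as <defunct> by ps.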

Code snippets

OS

On Windows, with a Linux-based Docker image

Python version

3.11

Browser

Chrome

Browser version

No response

Error logs & Screenshots (if applicable)

No response
