[Feature Request]: Allow passing crawler_strategy when sending requests to Docker container #1294

GeorgelPreput · 2025-07-11T09:57:41Z

What needs to be done?

When spawning a crawler from the crawler pool via get_crawler() in the Docker API implementation, allow for passing the crawler_strategy for AsyncWebCrawler as well. Currently only the BrowserConfig is passed.

What problem does this solve?

Currently it's impossible to parse PDF files by sending an API request to a deployed Crawl4AI Docker container. This is because by default, the crawler that is being spawned has no crawler_strategy. If we look at the functioning PDF parsing example, it becomes apparent that the AsyncWebCrawler must be initialized with crawler_strategy=PDFCrawlerStrategy().

Target users/beneficiaries

Anyone who wants to parse PDFs while using a Dockerized installation of Crawl4AI.

Current alternatives/workarounds

Only running things from a Python script instead of using the Docker deployment.

Proposed approach

In crawl4ai/deploy/docker/api.py:
1. Add crawler_strategy as a dict parameter to handle_crawl_request().
2. Pass the crawler_strategy along with the browser_config to get_crawler().
3. Figure out something similar for handle_stream_crawl_request().

In crawl4ai/deploy/docker/crawler_pool.py:
1. Add crawler_strategy as a parameter to get_crawler(), possibly with None as the default value.
2. Pass crawler_strategy when initializing crawler = AsyncWebCrawler(...).

mzyfree · 2025-11-18T08:58:53Z

Hi @unclecode @aravindkarnam

Feature Request: PDF URL Support in Docker API `/crawl` Endpoint

Current Situation

The SDK supports PDF processing via PDFCrawlerStrategy and PDFContentScrapingStrategy (introduced in v0.5.0), but the Docker API's /crawl endpoint cannot handle direct PDF URLs. When attempting to crawl a PDF URL (e.g., https://example.com/document.pdf), the request fails with net::ERR_FAILED during the navigation phase.

Problem

The /crawl endpoint uses get_crawler(browser_config) which creates an AsyncWebCrawler with the default AsyncPlaywrightCrawlerStrategy. This strategy attempts to navigate to PDF URLs using Playwright, which fails because PDFs are not renderable HTML pages. Even when specifying PDFContentScrapingStrategy in crawler_config.scraping_strategy, the navigation fails before the scraping strategy can be applied.

Use Case

We need to crawl PDF documents directly from URLs in a production environment using the Docker API. The SDK approach works but requires running Python code, which doesn't fit our containerized architecture.

Proposed Solution

Option A: Support crawler_strategy parameter in the /crawl request body, allowing users to specify PDFCrawlerStrategy for PDF URLs.
Option B: Auto-detect PDF URLs (by .pdf extension or Content-Type: application/pdf) and automatically use PDFCrawlerStrategy when detected.
Option C: Add a dedicated /crawl/pdf endpoint that handles PDF URLs specifically.
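
For Option B, the detection step could be as small as the sketch below. The function name looks_like_pdf() is hypothetical, and the Content-Type value is assumed to come from a prior HEAD or GET request that is not shown here. It only covers the two signals mentioned above (the .pdf extension and Content-Type: application/pdf).

```python
from typing import Optional
from urllib.parse import urlparse


def looks_like_pdf(url: str, content_type: Optional[str] = None) -> bool:
    """Return True if the URL or the response Content-Type points at a PDF."""
    if content_type:
        # Drop parameters such as "; charset=binary" before comparing.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type == "application/pdf":
            return True
    # Check the path only, so query strings like ?download=1 are ignored.
    path = urlparse(url).path.lower()
    return path.endswith(".pdf")
```

The extension check alone will miss PDFs served from URLs without a .pdf suffix, which is why the Content-Type check is worth having as well.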

Technical Details

Current code location: deploy/docker/api.py → handle_crawl_request() → get_crawler()
SDK implementation already exists: crawl4ai/processors/pdf/__init__.py
The issue is that get_crawler() doesn't accept a crawler_strategy parameter

Example Request (if Option A is implemented)

{
  "urls": ["https://example.com/document.pdf"],
  "browser_config": { "type": "BrowserConfig", "params": {} },
  "crawler_config": {
    "type": "CrawlerRunConfig",
    "params": {
      "scraping_strategy": {
        "type": "PDFContentScrapingStrategy",
        "params": { "extract_images": false }
      }
    }
  },
  "crawler_strategy": {
    "type": "PDFCrawlerStrategy",
    "params": {}
  }
}
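
On the server side, the crawler_strategy object from a request like this would need to be turned into a strategy instance. A minimal sketch, assuming an allow-list approach: build_crawler_strategy() and ALLOWED_STRATEGIES are hypothetical names, and PDFCrawlerStrategy is a stand-in class here so the snippet runs without crawl4ai installed. The existing Docker code may already have its own way of loading "type"/"params" objects, in which case that should be reused instead.

```python
from typing import Any, Dict, Optional


# Stand-in (assumption) for crawl4ai's PDFCrawlerStrategy.
class PDFCrawlerStrategy:
    def __init__(self, **params: Any) -> None:
        self.params = params


# Only strategies listed here can be requested over the API.
ALLOWED_STRATEGIES = {"PDFCrawlerStrategy": PDFCrawlerStrategy}


def build_crawler_strategy(spec: Optional[Dict[str, Any]]) -> Optional[Any]:
    """Turn {"type": ..., "params": {...}} into a strategy instance."""
    if spec is None:
        # No crawler_strategy in the request: keep the default behaviour.
        return None
    name = spec.get("type")
    cls = ALLOWED_STRATEGIES.get(name)
    if cls is None:
        raise ValueError(f"Unsupported crawler_strategy type: {name!r}")
    return cls(**spec.get("params", {}))
```

Using an explicit allow-list rather than looking up arbitrary class names keeps the request body from instantiating anything the API did not intend to expose.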

Priority

This is a blocker for our use case. We'd be happy to contribute a PR if you can provide guidance on the preferred approach.

Thank you for your consideration!

1 reply

unclecode Nov 19, 2025
Maintainer

@mzyfree Definitely like the idea. We will add this strategy to the API parameters, so yes, we will do it. @ntohidi, please check this.