[Feature Request]: Allow passing crawler_strategy when sending requests to Docker container #1294
Feature Request: PDF URL Support in Docker API
What needs to be done?
When spawning a crawler from the crawler pool via `get_crawler()` in the Docker API implementation, allow passing the `crawler_strategy` for `AsyncWebCrawler` as well. Currently only the `BrowserConfig` is passed.
What problem does this solve?
Currently it's impossible to parse PDF files by sending an API request to a deployed Crawl4AI Docker container. This is because, by default, the crawler that is spawned has no `crawler_strategy`. The functioning PDF parsing example shows that the `AsyncWebCrawler` must be initialized with `crawler_strategy=PDFCrawlerStrategy()`.
Target users/beneficiaries
Anyone who wants to parse PDFs while using a dockerized installation of Crawl4AI.
Current alternatives/workarounds
Only running things from a Python script instead of using the Docker deployment.
Proposed approach
In `crawl4ai/deploy/docker/api.py`:
- Add `crawler_strategy` as a `dict` parameter to `handle_crawl_request()`
- Pass `crawler_strategy` along with the `browser_config` to `get_crawler()`
- Apply the same changes to `handle_stream_crawl_request()`
In `crawl4ai/deploy/docker/crawler_pool.py`:
- Add `crawler_strategy` as a parameter to `get_crawler()`, possibly with `None` as the default value
- Pass `crawler_strategy` when initializing `crawler = AsyncWebCrawler(...)`
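The proposed changes could look roughly like the sketch below. It is not the actual Crawl4AI code: `BrowserConfig`, `PDFCrawlerStrategy` and `AsyncWebCrawler` are minimal stand-ins so the snippet runs without the library, the function signatures are simplified, and `STRATEGY_REGISTRY`, `load_crawler_strategy()` and the `{"type": ..., "params": ...}` request shape are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


# --- Stand-ins so the sketch runs without crawl4ai installed ---
@dataclass(frozen=True)
class BrowserConfig:  # stand-in for crawl4ai's BrowserConfig
    headless: bool = True


class PDFCrawlerStrategy:  # stand-in for crawl4ai's PDFCrawlerStrategy
    pass


class AsyncWebCrawler:  # stand-in for crawl4ai's AsyncWebCrawler
    def __init__(self, config=None, crawler_strategy=None):
        self.config = config
        self.crawler_strategy = crawler_strategy


# Hypothetical registry: maps the "type" field of the request dict to a class.
STRATEGY_REGISTRY = {"PDFCrawlerStrategy": PDFCrawlerStrategy}


def load_crawler_strategy(spec: Optional[Dict[str, Any]]):
    """Build a strategy from a dict like {"type": "PDFCrawlerStrategy"}.

    None means "no strategy requested", which keeps today's behaviour.
    """
    if spec is None:
        return None
    name = spec.get("type")
    if name not in STRATEGY_REGISTRY:
        raise ValueError(f"unknown crawler_strategy: {name!r}")
    return STRATEGY_REGISTRY[name](**spec.get("params", {}))


# crawler_pool.py (simplified): pooled crawlers, keyed by config AND strategy type.
# A real pool would also need locking and cleanup, which are omitted here.
_POOL: Dict[Tuple[BrowserConfig, str], AsyncWebCrawler] = {}


async def get_crawler(browser_config: BrowserConfig, crawler_strategy=None):
    # Keying on the strategy type is an assumption of this sketch: it keeps a
    # PDF request from reusing a crawler that was created without a strategy.
    key = (browser_config, type(crawler_strategy).__name__)
    if key not in _POOL:
        _POOL[key] = AsyncWebCrawler(
            config=browser_config, crawler_strategy=crawler_strategy
        )
    return _POOL[key]


# api.py (simplified): accept crawler_strategy as a dict and forward it.
async def handle_crawl_request(urls, browser_config: dict, crawler_strategy=None):
    strategy = load_crawler_strategy(crawler_strategy)
    crawler = await get_crawler(BrowserConfig(**browser_config), strategy)
    # The real handler would go on to crawl `urls`; the sketch stops here.
    return crawler
```

With this shape, a request that omits `crawler_strategy` behaves as before, while one that sends `{"type": "PDFCrawlerStrategy"}` gets a crawler initialized with that strategy. Whether the pool should be keyed on the strategy, or the strategy resolved some other way, is a design decision for the maintainers.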