Description
crawl4ai version
0.7.7
Expected Behavior
When using AdaptiveCrawler, I expect to be able to configure the timeout for the internal link preview extraction step. This should be exposed via the AdaptiveConfig object or a parameter on AdaptiveCrawler.__init__, so that the CrawlerRunConfig built inside _crawl_with_preview respects the custom timeout.
This issue prevents effective adaptive crawling on sites with high latency or complex DOMs that take longer than 5s to reach a state where the head metadata is available for relevance scoring.
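One possible shape for this, purely illustrative: `link_preview_timeout` below is a hypothetical parameter that does not exist today, and the `digest` keyword arguments mirror the steps further down in this report.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig

async def main():
    # Hypothetical API: expose the preview timeout on AdaptiveConfig
    # instead of the hardcoded 5 seconds (parameter name is illustrative only).
    config = AdaptiveConfig(
        link_preview_timeout=30,
    )
    async with AsyncWebCrawler() as crawler:
        adaptive = AdaptiveCrawler(crawler, config=config)
        result = await adaptive.digest(url="https://slow-website.example.com", query="test")
        print(result)

asyncio.run(main())
```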
Current Behavior
In AdaptiveCrawler._crawl_with_preview, the CrawlerRunConfig for the preview step uses a hardcoded timeout of 5 seconds. This value cannot be overridden by any public configuration API.
Relevant code in adaptive_crawler.py (lines 1454-1467):
```python
config = CrawlerRunConfig(
    link_preview_config=LinkPreviewConfig(
        # ...
        timeout=5,  # <--- HARDCODED
        # ...
    ),
    # ...
)
```
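For comparison, the preview timeout appears to be user-configurable when link preview is set up directly on a CrawlerRunConfig outside of AdaptiveCrawler. A minimal sketch, assuming LinkPreviewConfig accepts the timeout option shown above and that the import paths match the installed version:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, LinkPreviewConfig

async def main():
    # Direct (non-adaptive) crawl: here the preview timeout can be raised.
    run_config = CrawlerRunConfig(
        link_preview_config=LinkPreviewConfig(
            include_internal=True,
            timeout=30,  # user-chosen, unlike the hardcoded 5s in AdaptiveCrawler
        ),
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun("https://slow-website.example.com", config=run_config)
        print(result.success)

asyncio.run(main())
```

Only the adaptive path hardcodes the value, which is why a pass-through option on AdaptiveConfig (or AdaptiveCrawler.__init__) seems like the natural fix.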
Is this reproducible?
Yes
Inputs Causing the Bug
- **Target URL**: Any slow-loading website (response time > 5s or heavy JS rendering).
- **Config**: Standard `AdaptiveConfig`.
Steps to Reproduce
1. Initialize `AdaptiveCrawler` with standard configuration.
2. Call `await crawler.digest(url="https://slow-website.example.com", query="test")`.
3. Observe that the crawler likely fails to extract link contexts or returns incomplete results, because the preview step times out after 5 seconds regardless of any other timeout settings (a runnable sketch is included under Code snippets below).
Code snippets
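A minimal reproduction sketch of the steps above. The target URL is a placeholder, and the `digest` keyword arguments follow the call in step 2; adjust if the signature differs in your installed version.

```python
import asyncio
from crawl4ai import AsyncWebCrawler, AdaptiveCrawler, AdaptiveConfig

async def main():
    async with AsyncWebCrawler() as crawler:
        # Step 1: AdaptiveCrawler with standard configuration.
        adaptive = AdaptiveCrawler(crawler, config=AdaptiveConfig())
        # Step 2: digest a slow-loading site (>5s to expose head metadata).
        result = await adaptive.digest(url="https://slow-website.example.com", query="test")
        # Step 3: link contexts come back missing or incomplete because the
        # internal preview step times out after the hardcoded 5 seconds.
        print(result)

asyncio.run(main())
```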
OS
Windows 11
Python version
3.11
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response