[Bug]: BFSDeepCrawlStrategy(max_depth=0) prevents delay_before_return_html from working - captures page before JavaScript loads #1665

@nanwio

Description

crawl4ai version

0.6.3

Expected Behavior

When using BFSDeepCrawlStrategy(max_depth=0) with delay_before_return_html=20.0 in CrawlerRunConfig, the crawler should:

  1. Wait for the specified 20 seconds before capturing HTML
  2. Allow JavaScript content to fully load during this delay
  3. Capture the fully rendered page with all dynamic content loaded

For JavaScript-heavy pages that show a loading spinner ("Loading...") while content loads asynchronously, the 20-second delay should be sufficient for content to appear.

Expected result: ~40,000+ characters of fully loaded content with document links extracted.
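The expected delay semantics can be modeled with plain asyncio (a toy sketch, not crawl4ai internals): sleep for the configured delay, then read the page state, so any content that arrives during the delay is included in the capture.

```python
import asyncio


async def capture_with_delay(page_state: dict, delay: float) -> str:
    """Toy model of delay_before_return_html: wait, then read the page."""
    await asyncio.sleep(delay)
    return page_state["html"]


async def demo() -> str:
    page = {"html": "<div>Loading...</div>"}  # initial spinner

    async def js_finishes_loading():
        await asyncio.sleep(0.05)  # dynamic content arrives during the delay
        page["html"] = "<div>Full content</div>"

    asyncio.create_task(js_finishes_loading())
    # With a delay longer than the JS load time, the full content is captured.
    return await capture_with_delay(page, delay=0.1)


print(asyncio.run(demo()))  # prints: <div>Full content</div>
```

This is exactly what happens in Test 2 below (no deep-crawl strategy); the bug is that Test 1 behaves as if the sleep were skipped.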

Current Behavior

When BFSDeepCrawlStrategy(max_depth=0) is present in the configuration, the delay_before_return_html parameter appears to be completely ignored or significantly reduced, resulting in:

  1. The page is captured before JavaScript finishes loading
  2. Only the initial loading screen is captured (loading spinner + error message)
  3. Content length is only 821 characters instead of 40,000+
  4. The page shows "Loading..." and "Cannot connect to server"

Important: The exact same configuration WITHOUT BFSDeepCrawlStrategy works perfectly and captures the full 40,904 characters with all content loaded.

Is this reproducible?

Yes

Inputs Causing the Bug

Test case: Any page with JavaScript-rendered content requiring 15-20 seconds to load (e.g., pages with heavy AJAX, infinite scroll, or dynamic content loading)
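A third-party JS-heavy URL makes the repro flaky, so here is a self-contained substitute (my own helper, not part of crawl4ai): a locally served page that shows "Loading..." and injects its real content via setTimeout after a configurable delay.

```python
import http.server
import threading


def make_delayed_page(delay_ms: int = 15000) -> str:
    """HTML that shows 'Loading...' and injects ~40k chars after delay_ms."""
    return f"""<!doctype html>
<html><body>
<div id="app">Loading...</div>
<script>
setTimeout(function () {{
  document.getElementById("app").textContent = "x".repeat(40000);
}}, {delay_ms});
</script>
</body></html>"""


def serve_page(html: str, port: int = 8765) -> http.server.HTTPServer:
    """Serve `html` on localhost in a daemon thread; caller calls shutdown()."""
    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            body = html.encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # silence per-request logging
            pass

    server = http.server.HTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing both test configs at `http://127.0.0.1:8765/` isolates the BFSDeepCrawlStrategy effect from network variance.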

Settings:

# Browser Config
BrowserConfig(
  headless=True,
  java_script_enabled=True,
  viewport_width=1920,
  viewport_height=1080,
)

# Crawler Config (WITH bug)
CrawlerRunConfig(
  deep_crawl_strategy=BFSDeepCrawlStrategy(max_depth=0),  # Bug trigger
  delay_before_return_html=20.0,  # Ignored when above is present
  magic=True,
  simulate_user=True,
  override_navigator=True,
)

Steps to Reproduce

1. Install crawl4ai 0.6.3
  pip install crawl4ai==0.6.3
2. Create a test script with the reproduction code (see Code Snippets section below)
3. Run Test 1 - WITH BFSDeepCrawlStrategy(max_depth=0):
  - Observe: Only ~800 characters captured
  - Observe: Content shows loading screen ("Loading..." text/spinner)
  - Note: Timer shows ~20 seconds but content is incomplete
4. Run Test 2 - WITHOUT BFSDeepCrawlStrategy:
  - Observe: ~40,000 characters captured
  - Observe: Full page content with all dynamic elements loaded
  - Note: Same ~20 second timing but content is complete
5. Compare results:
  - WITH strategy: Premature capture (bug)
  - WITHOUT strategy: Correct behavior

Code snippets

Minimal reproduction script

#!/usr/bin/env python3
"""Minimal script to reproduce the bug."""
import asyncio
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator


async def reproduce_bug():
    """Compare behavior with and without BFSDeepCrawlStrategy."""

    browser_config = BrowserConfig(
        headless=True,
        java_script_enabled=True,
        viewport_width=1920,
        viewport_height=1080,
    )

    md_gen = DefaultMarkdownGenerator(options={"ignore_links": True})
    url = "https://example.com"  # Replace with JS-heavy test URL

    # Test 1: WITH BFSDeepCrawlStrategy (bug)
    config_bug = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(max_depth=0),  # Bug trigger
        markdown_generator=md_gen,
        delay_before_return_html=20.0,  # Ignored!
        magic=True,
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url, config=config_bug)
        r = result[0] if isinstance(result, list) else result
        print(f"WITH strategy: {len(r.markdown.raw_markdown):,} chars")

    # Test 2: WITHOUT BFSDeepCrawlStrategy (works)
    config_work = CrawlerRunConfig(
        # No deep_crawl_strategy
        markdown_generator=md_gen,
        delay_before_return_html=20.0,  # Works!
        magic=True,
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(url=url, config=config_work)
        r = result[0] if isinstance(result, list) else result
        print(f"WITHOUT strategy: {len(r.markdown.raw_markdown):,} chars")


asyncio.run(reproduce_bug())

---
Workaround: Config builder for single-page crawls

from crawl4ai import CrawlerRunConfig
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator


def build_single_page_config(delay_seconds: float = 20.0) -> CrawlerRunConfig:
    """
    Correct configuration for single-page crawling without link following.

    NOTE: Do NOT use BFSDeepCrawlStrategy(max_depth=0) - it breaks delay_before_return_html.
    Simply omit deep_crawl_strategy entirely.
    """
    return CrawlerRunConfig(
        # ❌ DON'T: deep_crawl_strategy=BFSDeepCrawlStrategy(max_depth=0)
        # ✅ DO: Omit it completely for single-page crawls
        markdown_generator=DefaultMarkdownGenerator(options={"ignore_links": True}),
        delay_before_return_html=delay_seconds,
        wait_for="css:body",
        magic=True,
        simulate_user=True,
        override_navigator=True,
    )


---
Helper: Quick comparison function

import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy


async def compare_delay_behavior(url: str, delay: float = 20.0):
    """Quick test to verify if bug exists on a given URL."""
    browser = BrowserConfig(headless=True, java_script_enabled=True)

    # With bug
    cfg_bug = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(max_depth=0),
        delay_before_return_html=delay,
    )

    # Without bug
    cfg_ok = CrawlerRunConfig(
        delay_before_return_html=delay,
    )

    async with AsyncWebCrawler(config=browser) as crawler:
        r1 = await crawler.arun(url, config=cfg_bug)
        res1 = r1[0] if isinstance(r1, list) else r1
        chars_bug = len(res1.markdown.raw_markdown)

        r2 = await crawler.arun(url, config=cfg_ok)
        res2 = r2[0] if isinstance(r2, list) else r2
        chars_ok = len(res2.markdown.raw_markdown)

        print(f"WITH BFSDeepCrawlStrategy: {chars_bug:,} chars")
        print(f"WITHOUT: {chars_ok:,} chars")
        print(f"Bug present: {chars_bug < chars_ok * 0.5}")  # >50% content loss


# Usage
asyncio.run(compare_delay_behavior("https://example.com"))

OS

macOS (Darwin 24.6.0), also reproducible on Linux

Python version

3.11+

Browser

Chromium (via Playwright)

Browser version

Chrome 131.0.0.0 (via crawl4ai's Playwright integration)

Error logs & Screenshots (if applicable)

WITH BFSDeepCrawlStrategy (bug):

[FETCH]... | ✓ | ⏱: 20.92s
Content length: 821 chars
Preview: [Loading spinner and "Loading..." text visible]

WITHOUT BFSDeepCrawlStrategy (works):

[FETCH]... | ✓ | ⏱: 21.63s
Content length: 40,904 chars
Preview: [Full page content visible]

Note: Timer shows ~20s in both cases, but first test captures prematurely.
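
The symptom is machine-checkable from these logs. The sketch below (my own helper, matched only to the log format shown above) flags a run as premature when the full delay elapsed but little content came back:

```python
import re


def looks_premature(fetch_line: str, content_length: int,
                    expected_delay: float = 20.0,
                    min_chars: int = 5000) -> bool:
    """True if the crawl ran for the full delay yet captured little content,
    which matches the premature-capture symptom described above."""
    m = re.search(r"⏱:\s*([\d.]+)s", fetch_line)
    if not m:
        raise ValueError("no elapsed time found in line")
    elapsed = float(m.group(1))
    return elapsed >= expected_delay and content_length < min_chars


print(looks_premature("[FETCH]... | ✓ | ⏱: 20.92s", 821))     # bug case: True
print(looks_premature("[FETCH]... | ✓ | ⏱: 21.63s", 40904))   # good case: False
```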

    Labels

    🐞 Bug: Something isn't working
    ⁇ Needs Clarification: The issue requires additional details or a rewrite to be actionable.
