Description
crawl4ai version
0.6.1
Expected Behavior
When `AsyncWebCrawler` is initialized with `headless=False` (either via `BrowserConfig` or directly as a constructor parameter), a visible browser window (e.g., Chromium) is expected to launch and appear on the user's display during the crawler's initialization phase or shortly after, before navigation (`arun`) begins. The user should be able to watch the browser UI operate.
Current Behavior
Despite explicitly setting `headless=False`, no browser window becomes visible on the screen when initializing or running `AsyncWebCrawler`. The process executes entirely in the background, behaving as if it were running in headless mode.
While the crawl operation itself may succeed technically (e.g., fetching HTML from example.com), the browser UI is never displayed.
Notably:
- Running Playwright directly on the same system with `launch(headless=False)` successfully launches a visible browser window.
- Setting the `PWDEBUG=1` environment variable while using `AsyncWebCrawler` successfully forces a visible Playwright Inspector window to appear.
This strongly indicates the issue lies within Crawl4AI's handling of the standard `headless=False` configuration, not with the underlying Playwright installation or the OS environment's capability to run visible browsers.
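For anyone triaging this, the `PWDEBUG=1` observation can be repeated without touching the shell configuration. The following is only a minimal sketch: `env_with_pwdebug` and `run_with_pwdebug` are hypothetical helper names introduced here, and the script path is whatever file one of the snippets below is saved to.

```python
import os
import subprocess
import sys


def env_with_pwdebug(base_env=None):
    """Return a copy of the environment with PWDEBUG=1 set.

    The input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PWDEBUG"] = "1"
    return env


def run_with_pwdebug(script_path):
    """Run a repro script in a child process with PWDEBUG=1 (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, script_path],
        env=env_with_pwdebug(),
        check=False,  # a non-zero exit code is returned, not raised
    )
```

Calling `run_with_pwdebug("snippet1.py")` (an assumed filename) runs that script with the variable set for the child process only, so the parent shell keeps its normal environment for the `headless=False` comparison run.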
Is this reproducible?
Yes
Inputs Causing the Bug
- **URL(s):** `https://example.com` (Reproducible with simple URLs; likely URL-independent).
- **Settings used:** The core setting causing the issue (or rather, being ignored) is `headless=False`. This was tested in two ways:
1. Via `BrowserConfig`: `BrowserConfig(browser_type="chromium", headless=False, verbose=True)` passed to `AsyncWebCrawler(config=...)`.
2. Via direct parameter: `AsyncWebCrawler(browser_type="chromium", headless=False, verbose=True)`.
- **Input data:** Not applicable.
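One quick check that may help narrow this down is whether the configuration object still carries `headless=False` before the browser is launched. This is a rough sketch, not Crawl4AI's own diagnostic: `effective_headless`, `report`, and `report_crawl4ai_config` are hypothetical helpers, and it assumes `BrowserConfig` exposes the value as a plain `headless` attribute.

```python
from types import SimpleNamespace


def effective_headless(config_like):
    """Return the `headless` attribute of a config-like object, or None if absent."""
    return getattr(config_like, "headless", None)


def report(label, config_like):
    """Print and return a one-line summary of the headless setting."""
    line = f"{label}: headless={effective_headless(config_like)!r}"
    print(line)
    return line


def report_crawl4ai_config():
    """Report the value held by BrowserConfig, if crawl4ai is installed.

    Assumption: BrowserConfig stores the setting as a `headless` attribute.
    """
    try:
        from crawl4ai import BrowserConfig
    except ImportError:
        return None  # crawl4ai not available in this environment
    cfg = BrowserConfig(browser_type="chromium", headless=False, verbose=True)
    return report("BrowserConfig", cfg)


# Stand-in object so the sketch can be tried without crawl4ai installed.
report("stand-in", SimpleNamespace(headless=False))
```

If `report_crawl4ai_config()` shows `headless=False`, the setting survives configuration and the problem would more likely sit in the launch path; if it shows anything else, the value is being lost earlier.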
Steps to Reproduce
1. Set up the environment: Windows 11, Python 3.13.3, Crawl4AI 0.6.1, Playwright 1.51.0 (with browsers installed via `playwright install`).
2. Run either of the minimal Python code snippets provided below.
3. During the `asyncio.sleep(5)` pause included in the snippets (immediately after `AsyncWebCrawler` initialization), carefully observe the screen.
4. **Observe:** Note that no browser window appears, contrary to the expected behavior for `headless=False`. The script continues execution silently in the background.
Code snippets
# Snippet 1: Using BrowserConfig (Minimal Test)
import asyncio
import re

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode

try:
    from crawl4ai.config import BrowserConfig, CrawlerRunConfig, CacheMode
except ImportError:
    pass  # Ignore if already imported


async def test_crawl4ai_visible_minimal_config():
    print("--- Test minimal Crawl4AI with headless=False via BrowserConfig ---")
    browser_cfg = BrowserConfig(
        browser_type="chromium",
        headless=False,  # Explicitly set to False
        verbose=True,
    )
    run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    print("[!] Initializing AsyncWebCrawler with BrowserConfig(headless=False)...")
    try:
        async with AsyncWebCrawler(config=browser_cfg) as crawler:
            print("[!] Crawler initialized.")
            print("[!] >>> WATCH SCREEN CAREFULLY FOR 5 SECONDS <<<")
            print("[!] >>> A browser window SHOULD appear now <<<")
            # Pause after initialization so a launched window can be observed
            await asyncio.sleep(5)
            print("[!] Attempting crawl...")
            result = await crawler.arun("https://example.com", config=run_cfg)
            print(f"[!] Crawl finished. Success: {result.success}")
            if result.success and result.html:
                title_match = re.search(
                    r"<title>(.*?)</title>", result.html, re.IGNORECASE | re.DOTALL
                )
                title = title_match.group(1).strip() if title_match else "Not Found"
                print(f"[+] Title from HTML: {title}")
            elif not result.success:
                print(f"[-] Crawl failed: {result.error_message}")
            print("\n[?] CRITICAL QUESTION: Did you see a browser window open during the pause?")
    except Exception as e:
        print(f"[!!!] Error: {e}")
    finally:
        print("[!] Exiting async with block.")


if __name__ == "__main__":
    asyncio.run(test_crawl4ai_visible_minimal_config())
    print("[!] Test finished.")

# Snippet 2: Using Direct Parameter (Minimal Test)
import asyncio
import re

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode

try:
    from crawl4ai.config import CrawlerRunConfig, CacheMode
except ImportError:
    pass  # Ignore if already imported


async def test_crawl4ai_visible_minimal_direct():
    print("--- Test minimal Crawl4AI with headless=False via direct parameter ---")
    run_cfg = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    print("[!] Initializing AsyncWebCrawler(headless=False)...")
    try:
        async with AsyncWebCrawler(
            browser_type="chromium", headless=False, verbose=True
        ) as crawler:
            print("[!] Crawler initialized.")
            print("[!] >>> WATCH SCREEN CAREFULLY FOR 5 SECONDS <<<")
            print("[!] >>> A browser window SHOULD appear now <<<")
            # Pause after initialization so a launched window can be observed
            await asyncio.sleep(5)
            print("[!] Attempting crawl...")
            result = await crawler.arun("https://example.com", config=run_cfg)
            print(f"[!] Crawl finished. Success: {result.success}")
            if result.success and result.html:
                title_match = re.search(
                    r"<title>(.*?)</title>", result.html, re.IGNORECASE | re.DOTALL
                )
                title = title_match.group(1).strip() if title_match else "Not Found"
                print(f"[+] Title from HTML: {title}")
            elif not result.success:
                print(f"[-] Crawl failed: {result.error_message}")
            print("\n[?] CRITICAL QUESTION: Did you see a browser window open during the pause?")
    except Exception as e:
        print(f"[!!!] Error: {e}")
    finally:
        print("[!] Exiting async with block.")


if __name__ == "__main__":
    asyncio.run(test_crawl4ai_visible_minimal_direct())
    print("[!] Test finished.")

# Snippet 3: Direct Playwright (Works for Comparison)
import asyncio

from playwright.async_api import async_playwright


async def test_browser_direct_visible():
    print("--- Direct Playwright Test with headless=False (THIS WORKS) ---")
    async with async_playwright() as p:
        print("[!] Launching browser directly...")
        # This launch correctly shows a window:
        browser = await p.chromium.launch(headless=False)
        print("[+] Browser window should be visible now!")
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(f"[+] Title: {await page.title()}")
        await browser.close()
        print("[+] Browser closed.")


if __name__ == "__main__":
    asyncio.run(test_browser_direct_visible())
    print("[!] Direct test finished.")

OS
Windows 11
Python version
3.13.3
Browser
Chromium and Firefox
Browser version
Browser binaries installed via `playwright install` with Playwright 1.51.0
Error logs & Screenshots (if applicable)
No specific error logs related to failing to launch visibly are generated by Crawl4AI. The script often runs to completion successfully according to the logs (fetching HTML etc.), just without displaying the expected browser UI.
The core evidence is the visual observation during execution:
- Running Snippet 1 or Snippet 2 (using Crawl4AI with `headless=False`): No browser window appears.
- Running Snippet 3 (using Playwright directly with `headless=False`): A browser window correctly appears.
- Running Crawl4AI with the `PWDEBUG=1` environment variable set: The Playwright Inspector window correctly appears.
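
Since the evidence above is purely visual, a process-level check might make it easier to confirm. The sketch below looks for Chromium's `--headless` switch on the command lines of running browser processes while a snippet is paused. It is only an illustration under assumptions: it relies on the third-party `psutil` package, the helper names are hypothetical, the process-name fragments (`chrom`, `headless_shell`) are guesses at what the Playwright-installed binaries are called, and it covers Chromium only.

```python
def is_headless_cmdline(args):
    """Return True if a command line contains Chromium's --headless switch."""
    return any(a == "--headless" or a.startswith("--headless=") for a in args)


def find_browser_processes(name_fragments=("chrom", "headless_shell")):
    """List (pid, name, headless?) for running processes whose name matches.

    Requires psutil (third-party), imported lazily so the pure helper above
    stays usable without it. The default name fragments are assumptions.
    """
    import psutil

    results = []
    for proc in psutil.process_iter(["pid", "name", "cmdline"]):
        name = (proc.info["name"] or "").lower()
        cmdline = proc.info["cmdline"] or []  # None when access is denied
        if any(fragment in name for fragment in name_fragments):
            results.append((proc.info["pid"], name, is_headless_cmdline(cmdline)))
    return results
```

Running `find_browser_processes()` from a second terminal during the 5-second pause should list the browser processes started by the snippet. If any of them reports `True` despite `headless=False`, that would support the reading that the setting is not reaching the launch call.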