[Bug]: Error: Page.content: Target page, context or browser has been closed #842

@eliaweiss

Description

crawl4ai version

0.5.0.post4

Expected Behavior

Crawler should crawl

Current Behavior

I get the following error:

[ERROR]... × https://out-door.co.il/product/%d7%a4%d7%90%d7%a0%... | Error:
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ × Unexpected error in _crawl_web at line 528 in wrap_api_call (venv/lib/python3.12/site- │
│ packages/playwright/_impl/_connection.py): │
│ Error: Page.content: Target page, context or browser has been closed │
│ │
│ Code context: │
│ 523 parsed_st = _extract_stack_trace_information_from_stack(st, is_internal) │
│ 524 self._api_zone.set(parsed_st) │
│ 525 try: │
│ 526 return await cb() │
│ 527 except Exception as error: │
│ 528 → raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None │
│ 529 finally: │
│ 530 self._api_zone.set(None) │
│ 531 │
│ 532 def wrap_api_call_sync( │
│ 533 self, cb: Callable[[], Any], is_internal: bool = False │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

This happens after about 50 to 100 pages.

I use an EC2 t2.large instance, and this is my code:

@app.post("/crawl", response_model=CrawlResponse)
async def crawl(request: CrawlRequest):
    """
    Run the crawler on the specified URL
    """
    print(request)

    try:
        # Convert UUID to string for the query
        crawler_config = execute_select_query(f"SELECT * FROM crawls WHERE id = '{request.crawler_id}'")
        if not crawler_config:
            raise HTTPException(
                status_code=404,
                detail=f"Crawler config not found for id: {request.crawler_id}"
            )

        crawler_config = crawler_config[0]
        root_url = crawler_config['root_url']
        logger.info(f"🔍 Starting crawl for URL: {root_url}")

        depth = crawler_config.get('depth', 1)
        include_external = crawler_config.get('include_external', False)
        max_pages = crawler_config.get('max_pages', 5)

        # Step 1: Create a pruning filter
        prune_filter = PruningContentFilter(
            # Lower → more content retained, higher → more content pruned
            threshold=0.45,
            # "fixed" or "dynamic"
            threshold_type="dynamic",
            # Ignore nodes with <5 words
            min_word_threshold=5
        )

        # Step 2: Insert it into a Markdown Generator
        md_generator = DefaultMarkdownGenerator(content_filter=prune_filter)  # , options={"ignore_links": True}

        # Step 3: Pass it to CrawlerRunConfig
        # Configure the crawler
        config = CrawlerRunConfig(
            deep_crawl_strategy=BFSDeepCrawlStrategy(
                max_depth=depth,
                include_external=include_external,
                max_pages=max_pages
            ),
            scraping_strategy=LXMLWebScrapingStrategy(),
            stream=True,
            verbose=True,
            markdown_generator=md_generator
        )

        crawled_pages = []
        page_count = 0

        # Run the crawler
        async with AsyncWebCrawler() as crawler:
            try:
                async for result in await crawler.arun(crawler_config['root_url'], config=config):
                    processed_result = await process_crawl_result(crawler_config, result)
                    crawled_pages.append(processed_result)
                    page_count += 1
                    logger.info(f"Processed page {page_count}: {result.url}")
            except Exception as crawl_error:
                logger.error(f"Error during crawling: {str(crawl_error)}")
                raise HTTPException(
                    status_code=500,
                    detail=f"Crawling process failed: {str(crawl_error)}"
                )

        result = {
            "url": root_url,
            "depth": depth,
            "pages_crawled": page_count,
            "crawled_pages": crawled_pages
        }

        return CrawlResponse(
            status="success",
            data=result
        )

    except Exception as e:
        logger.error(f"Crawling error: {str(e)}")
        raise HTTPException(
            status_code=500,
            detail=f"Crawling failed: {str(e)}"
        )

Any idea on how to debug this? What does this error mean?

My guess is that the headless browser is crashing, but I'm not sure how to confirm that, or why it would happen.
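
To narrow it down, I'm thinking of something like the sketch below: pass an explicit BrowserConfig (headless, verbose, with Chromium flags that sometimes help on memory-constrained machines) and log per-page failures instead of aborting the stream. This is only a sketch under my assumptions, I haven't verified that extra_args or result.error_message are the right knobs in 0.5.0.post4:

from crawl4ai import AsyncWebCrawler, BrowserConfig

# Assumption: Chromium on small EC2 instances can crash when /dev/shm fills up,
# so these flags are the first thing I want to rule out.
browser_config = BrowserConfig(
    headless=True,
    verbose=True,
    extra_args=["--disable-dev-shm-usage", "--no-sandbox"],
)

async with AsyncWebCrawler(config=browser_config) as crawler:
    async for result in await crawler.arun(root_url, config=config):
        # Log per-page failures instead of raising, to see whether a single
        # page fails or the whole browser is gone after ~50-100 pages.
        if not result.success:
            logger.error(f"Failed {result.url}: {result.error_message}")
            continue
        await process_crawl_result(crawler_config, result)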

When I run a crawler with a simple fetch I can crawl all 483 pages on the website, but with crawl4ai it crashes after about 50 to 100 pages and just prints a list of these errors.
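
For reference, the fetch-based crawler I'm comparing against is roughly the sketch below (assumptions: aiohttp, and a pre-built URL list; the real version also extracts links from each page before following them):

import asyncio
import aiohttp

async def simple_fetch(urls: list[str]) -> dict[str, str]:
    # Fetch each page over plain HTTP, no browser involved.
    pages = {}
    async with aiohttp.ClientSession() as session:
        for url in urls:
            async with session.get(url) as resp:
                pages[url] = await resp.text()
    return pages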

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Code snippets

OS

Ubuntu (EC2 t2.large)

Python version

3.12.3

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

Labels

⚙ Done (bug fix, enhancement, FR that's completed, pending release)
🐞 Bug (something isn't working)
📌 Root caused (identified the root cause of the bug)
