[Bug]: Crawler won't extract table content #1278

Open
@tropxy

Description

crawl4ai version

0.6.3

Expected Behavior

The crawler should extract the table content requested in the LLM instruction.

Current Behavior

The crawler extracts the high-level Markdown of the page, but not the table content specified in the instruction.

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Despite many attempts at changing the extraction approach, including using schemas, I was not able to get the crawler to return the right result.
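For reference, one of the schema-based attempts could be expressed as a plain JSON-schema dict (no Pydantic model needed) passed as `schema=` with `extraction_type="schema"`. This is a hypothetical sketch: the field names are guesses taken from the extraction instruction in the snippet below, not from the actual `ChargerStatusDetails` model.

```python
# Hypothetical JSON schema for extraction_type="schema".
# Field names are inferred from the instruction text (Status, errorCode,
# vendorErrorCode, overall station status) and may not match the real model.
charger_status_schema = {
    "type": "object",
    "properties": {
        "command": {"type": "string"},          # e.g. "StatusNotification"
        "status": {"type": "string"},
        "errorCode": {"type": "string"},
        "vendorErrorCode": {"type": "string"},
        "stationStatus": {"type": "string"},
    },
    "required": ["command", "status", "errorCode"],
}

print(sorted(charger_status_schema["properties"]))
```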

Code snippets

I used this code:


import json
import logging
import os

# Assumed to be the standard crawl4ai top-level exports.
from crawl4ai import (
    AsyncWebCrawler,
    BrowserConfig,
    CacheMode,
    CrawlerRunConfig,
    LLMConfig,
    LLMExtractionStrategy,
)

logger = logging.getLogger(__name__)
# `settings` comes from the surrounding application's config module.


async def get_charger_status_llm(charger_url: str) -> str:
    os.environ["OPENAI_API_KEY"] = settings.LLM_API_KEY

    # 1. Define the LLM extraction strategy
    llm_strategy = LLMExtractionStrategy(
        llm_config=LLMConfig(provider="openai/gpt-4o", api_token=os.getenv("OPENAI_API_KEY")),
        # schema=ChargerStatusDetails.model_json_schema(),
        # extraction_type="schema",
        extraction_type="block",
        instruction=(
            "Look for the OCPP Log contained in a table that has this HTML "
            "'<div class='tab ocpp-log relationship-tab' label='OCPP log'>' "
            "and extract the most recent (the one on the top) Status Notification "
            "COMMAND, which is under the OCPP Log section, and its details, "
            "including Status, errorCode, vendorErrorCode, and the overall "
            "station status."
        ),
        chunk_token_threshold=4096,
        apply_chunking=True,
        input_format="markdown",  # or "html", "fit_markdown"
    )

    # 2. Build the crawler config
    crawl_config = CrawlerRunConfig(
        extraction_strategy=llm_strategy,
        cache_mode=CacheMode.BYPASS,
    )

    # 3. Create a browser config if needed
    browser_config = BrowserConfig(
        headless=settings.HEADLESS_BROWSER,
        verbose=True,
        extra_args=[
            "--disable-gpu",
            "--no-sandbox",
            "--disable-dev-shm-usage",
            "--disable-setuid-sandbox",
            "--disable-images",
            "--disable-fonts",
        ],
        storage_state=settings.COOKIES_FILE_PATH,
        java_script_enabled=True,
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        # 4. Crawl a single page
        result = await crawler.arun(url=charger_url, config=crawl_config)

        if result.success:
            # 5. The extracted content is a JSON string
            data = json.loads(result.extracted_content)
            logger.debug(f"Extracted items: {data}")

            # 6. Show usage stats (prints token usage)
            llm_strategy.show_usage()
            return data
        else:
            # logging takes one message string, not print-style varargs
            logger.error(f"Error: {result.error_message}")
            return ""
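Since the OCPP log sits inside a tab (`<div class="tab ocpp-log relationship-tab" ...>`), it may simply be absent from the Markdown the LLM receives. A quick way to check whether the table content is recoverable at all is to parse the saved HTML attachment directly with the standard library, bypassing the LLM entirely. The sample HTML below is a hypothetical, simplified stand-in for the attached ampeco_ocpp_log_table.html.txt; the real markup may differ.

```python
from html.parser import HTMLParser


class OcppTableParser(HTMLParser):
    """Collects the cell text of every <table> row as a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row currently open, or None
        self._cell = None   # text fragments of the cell currently open, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._row is not None:
            self._row.append(" ".join(c for c in self._cell if c))
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None


# Hypothetical, simplified stand-in for the attached OCPP log HTML.
sample_html = """
<div class="tab ocpp-log relationship-tab" label="OCPP log">
  <table>
    <tr><th>Command</th><th>Status</th><th>errorCode</th></tr>
    <tr><td>StatusNotification</td><td>Faulted</td><td>OtherError</td></tr>
  </table>
</div>
"""

parser = OcppTableParser()
parser.feed(sample_html)

# Pair the header row with the most recent entry (the row on top).
latest = dict(zip(parser.rows[0], parser.rows[1]))
print(latest)  # → {'Command': 'StatusNotification', 'Status': 'Faulted', 'errorCode': 'OtherError'}
```

If this works on the real attachment but the LLM path still fails, that would point at the tab content never reaching the Markdown conversion, rather than at the extraction instruction itself.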

OS

macOS

Python version

3.11

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

html_ampeco.html.txt

ampeco_ocpp_log_table.html.txt

Metadata

    Labels

    🐞 Bug (Something isn't working) · 🩺 Needs Triage (Needs attention of maintainers)
