Open
Labels
✨ Enhancement: Improvement on an existing feature
📌 Root caused: identified the root cause of bug
Description
crawl4ai version
v0.7.8
Expected Behavior
generate_schema should propose stable selectors that keep working across multiple pages where the target element's DOM position varies (e.g., table rows shifting up or down). Given several representative HTML samples, it should prefer attribute- or text-anchored selectors over fragile nth-child positions.
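One way to picture the expected selection step is the rough sketch below. It is not crawl4ai code: pick_stable is a hypothetical helper, and the per-sample match results (including the "Acme" value) are made-up illustration data. It assumes each candidate selector has already been evaluated against every sample, with None meaning "no match".

```python
from typing import Optional


def pick_stable(matches: dict[str, list[Optional[str]]]) -> str:
    """Hypothetical helper: return the selector that matches the most samples.

    On a tie, a selector without :nth-child( is preferred, since positional
    selectors are the fragile ones described in this issue.
    """
    def score(selector: str) -> tuple[int, int]:
        hits = sum(value is not None for value in matches[selector])
        positional = ":nth-child(" in selector
        return (hits, 0 if positional else 1)

    return max(matches, key=score)


# Made-up results of running two candidates against three sample pages.
matches = {
    "table.pdp-sku-info tbody tr:nth-child(6) td:nth-child(2) a": [None, "Acme", None],
    'a[href*="/product?Manufacturer="]': ["Acme", "Acme", "Acme"],
}
best = pick_stable(matches)
print(best)
```

Scoring by coverage first and by selector style second keeps the rule simple: a positional selector can still win if it is the only one that matches every sample.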
Current Behavior
generate_schema inspects a single HTML sample and may emit brittle positional selectors such as table.pdp-sku-info tbody tr:nth-child(6) td:nth-child(2) a. On other product pages the same field sits in tr:nth-child(5) or tr:nth-child(7), so extraction fails until a human finds a more stable selector (e.g., a[href*="/product?Manufacturer="]).
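The failure mode can be illustrated without crawl4ai, using only the standard library. This is a minimal sketch under stated assumptions: the table markup, the make_page helper, and the "Acme" value are invented for illustration, and the CSS selectors are approximated with ElementTree's limited XPath (tr[6] standing in for tr:nth-child(6)) plus a plain substring check for href*=.

```python
import xml.etree.ElementTree as ET


def make_page(rows_before: int) -> str:
    """Build a tiny, well-formed product table (hypothetical markup)."""
    filler = "".join(
        f"<tr><td>Spec {i}</td><td>value {i}</td></tr>" for i in range(rows_before)
    )
    target = (
        "<tr><td>Manufacturer</td>"
        '<td><a href="/product?Manufacturer=Acme">Acme</a></td></tr>'
    )
    return (
        '<html><body><table class="pdp-sku-info"><tbody>'
        f"{filler}{target}<tr><td>Weight</td><td>1 kg</td></tr>"
        "</tbody></table></body></html>"
    )


def by_position(root: ET.Element):
    # Rough analogue of: table.pdp-sku-info tbody tr:nth-child(6) td:nth-child(2) a
    node = root.find(".//table[@class='pdp-sku-info']/tbody/tr[6]/td[2]/a")
    return node.text if node is not None else None


def by_href(root: ET.Element):
    # Rough analogue of: a[href*="/product?Manufacturer="]
    for a in root.iter("a"):
        if "/product?Manufacturer=" in a.get("href", ""):
            return a.text
    return None


# Three pages where the manufacturer row lands in row 5, 6, and 7.
samples = {n: ET.fromstring(make_page(n)) for n in (4, 5, 6)}
results = {n: (by_position(r), by_href(r)) for n, r in samples.items()}
for n, (positional, anchored) in results.items():
    print(f"row {n + 1}: positional={positional!r} anchored={anchored!r}")
```

In this sketch the positional lookup succeeds only on the page it was derived from, while the href-anchored lookup returns the same value on all three pages.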
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Any
Python version
3.10+
Browser
No response
Browser version
No response
Error logs & Screenshots (if applicable)
No response