[Feature Request]: Remove GDPR Banners #1005
Replies: 6 comments
-
|
Yes, this please... Those damn cookie banners prevent 99% of websites from being crawled, and it is really unusable like this. |
-
|
I hope they release an update with this feature. |
-
|
Up! |
-
|
up! |
-
|
This! The banners add a lot of clutter to the markdown. |
-
|
Hi everyone! Have you tried using the […]? To enable this functionality, you can set the […]:

```python
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, BrowserConfig

async def crawl_with_overlay_removal(url: str):
    config = CrawlerRunConfig(
        remove_overlay_elements=True,  # ask the crawler to drop popups/overlays
        # Other configurations...
    )
    browser_config = BrowserConfig(headless=True)
    # Browser settings are passed to AsyncWebCrawler as `config`
    async with AsyncWebCrawler(config=browser_config) as crawler:
        # arun() is the crawl entry point on AsyncWebCrawler
        result = await crawler.arun(url, config=config)
        return result.html
```

You can also set the […] |
-
What needs to be done?
Create a `CrawlerRunConfig` option/feature to reliably remove GDPR/cookie-consent popups and banners from crawled sites.
What problem does this solve?
With these banners/popups present, the resulting markdown will often contain the content of the cookie popup instead of the actual site content.
Target users/beneficiaries
EU users
Current alternatives/workarounds
I've tried every trick under the sun to remove these obnoxious popups: injecting JS code, loading a Chrome/Firefox extension, `remove_overlay_elements` ... nothing has worked so far. Until this fix is implemented, I might have to switch to Playwright and use some other parser for the markdown.
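For anyone needing a stopgap in the meantime, one possible post-processing workaround is to strip consent-looking elements from the fetched HTML before converting it to markdown. The sketch below uses only the Python standard library and is not part of crawl4ai; `CONSENT_HINTS`, `ConsentStripper`, and `strip_consent_markup` are hypothetical names, and the keyword list is an assumption that will miss many real banners. It also assumes reasonably well-formed HTML and drops comments and the doctype.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; real consent banners vary widely.
CONSENT_HINTS = ("cookie", "consent", "gdpr")

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ConsentStripper(HTMLParser):
    """Re-emits HTML while skipping elements whose id/class hints at consent UI."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _looks_like_consent(self, attrs):
        for name, value in attrs:
            if name in ("id", "class") and value:
                if any(hint in value.lower() for hint in CONSENT_HINTS):
                    return True
        return False

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # track nesting inside the dropped subtree
            return
        if self._looks_like_consent(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1  # start dropping this subtree
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> never change the depth.
        if not self.skip_depth and not self._looks_like_consent(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_consent_markup(html: str) -> str:
    """Return html with consent-looking elements (and their children) removed."""
    parser = ConsentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The cleaned string could then be handed to whatever HTML-to-markdown converter is in use. Matching on `id`/`class` substrings is crude and can produce false positives (for example, a legitimate article section about cookies), so the keyword list would need tuning per site.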
Proposed approach
I don't know. You guys are the wizards.