[Bug]: Trying to run crawl4ai on https://platform.openai.com/docs outputs nothing #972

aliak00 · 2025-04-10T23:01:41Z

crawl4ai version

0.5.0.post8

Expected Behavior

Hi! I was running crawl4ai on a few things, and I happened to try https://platform.openai.com/docs and I would expect to be able to get the markdown.

Current Behavior

Empty response.

❯ .venv/bin/crwl "https://platform.openai.com/docs/overview" -o markdown



~/dev/quillia/apps/backend/products/web-crawler-py main* 
❯

Is this reproducible?

Yes

Inputs Causing the Bug

Just defaults, I don't think I have any configuration for anything:


.venv/bin/crwl "https://platform.openai.com/docs/overview" -o markdown

Steps to Reproduce

Run:

.venv/bin/crwl "https://platform.openai.com/docs/overview" -o markdown

Code snippets

OS

macOS

Python version

3.13

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

The text was updated successfully, but these errors were encountered:

ntohidi · 2025-04-11T09:30:22Z

Hi @aliak00
OpenAI has implemented very strong bot detection, so we need some other approaches to bypass it. One idea is to use an identity-based browser setup, essentially, create a browser instance, log in to the page, and save the user data profile directory. Then, launch a new browser instance that uses that same profile to preserve your identity.

The CLI doesn’t support that kind of setup at the moment.

ntohidi · 2025-04-11T09:31:12Z

@aliak00 I will close the issue, but feel free to continue asking questions if you have any.

aliak00 · 2025-04-11T10:04:42Z

Aha! Ok yeah that makes sense. Thank you :)

Follow up question - is there a way to just be told that. ... "yeah this website is not crawl able because of their own security measures or something" ?

ntohidi · 2025-04-11T13:53:18Z

Aha! Ok yeah that makes sense. Thank you :)

Follow up question - is there a way to just be told that. ... "yeah this website is not crawl able because of their own security measures or something" ?

Nice idea! You can now check result.success, and if it’s false, that means something went wrong, though it won’t explicitly say it’s due to bot detection. Still, it’s a good starting point.

@unclecode @aravindkarnam , we can add this to the backlog.

aliak00 added 🐞 Bug Something isn't working 🩺 Needs Triage Needs attention of maintainers labels Apr 10, 2025

ntohidi closed this as completed Apr 11, 2025

ntohidi added ❓ Question Q&A and removed 🐞 Bug Something isn't working 🩺 Needs Triage Needs attention of maintainers labels Apr 11, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Trying to run crawl4ai on https://platform.openai.com/docs outputs nothing #972

[Bug]: Trying to run crawl4ai on https://platform.openai.com/docs outputs nothing #972

aliak00 commented Apr 10, 2025

ntohidi commented Apr 11, 2025

ntohidi commented Apr 11, 2025

aliak00 commented Apr 11, 2025

ntohidi commented Apr 11, 2025

[Bug]: Trying to run crawl4ai on https://platform.openai.com/docs outputs nothing #972

[Bug]: Trying to run crawl4ai on https://platform.openai.com/docs outputs nothing #972

Comments

aliak00 commented Apr 10, 2025

crawl4ai version

Expected Behavior

Current Behavior

Is this reproducible?

Inputs Causing the Bug

Steps to Reproduce

Code snippets

OS

Python version

Browser

Browser version

Error logs & Screenshots (if applicable)

ntohidi commented Apr 11, 2025

ntohidi commented Apr 11, 2025

aliak00 commented Apr 11, 2025

ntohidi commented Apr 11, 2025