URL query parameters are lowercased in result.links, breaking case-sensitive URLs #1684
Unanswered
nanwio asked this question in Forums - Q&A
Description
When crawl4ai extracts links from a page, the query parameters in URLs are lowercased in result.links. This breaks websites where query parameters are case-sensitive.
Steps to Reproduce
Expected Behavior
The original HTML contains:
<a href="/page?documentId=123&viewMode=Full">Link</a>
result.links should preserve the original case:
https://www.example.com/page?documentId=123&viewMode=Full
Actual Behavior
result.links returns lowercased query parameters:
https://www.example.com/page?documentid=123&viewmode=full
Impact
Many websites treat query parameter names and values as case-sensitive. When child URLs taken from result.links are crawled with lowercased parameters, the server may return different content or an empty page.
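To make the expected behavior concrete, here is a minimal sketch that extracts links from a raw HTML string using only the Python standard library and keeps the query string exactly as written. It is not crawl4ai's implementation; `HrefCollector` and `extract_links` are hypothetical names, and it assumes the page's raw HTML and base URL are already available as plain strings.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # but attribute values keep their original case.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin resolves relative links without altering case.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every anchor href in `html`, resolved against `base_url`."""
    collector = HrefCollector(base_url)
    collector.feed(html)
    return collector.links


html = '<a href="/page?documentId=123&viewMode=Full">Link</a>'
print(extract_links(html, "https://www.example.com/"))
```

Run against the anchor tag from this report, the sketch yields the URL with documentId and viewMode intact, which is the output result.links is expected to contain.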
Environment
Suggested Fix
Preserve the original case of query parameters when extracting links, or add a configuration option to disable URL normalization.
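As an illustration of the first option, the sketch below normalizes a URL by lowercasing only the scheme and host, which are case-insensitive under RFC 3986, and leaves the path, query, and fragment untouched. The function name `normalize_url_preserving_case` is hypothetical, and this is an assumption about what a fix could look like, not crawl4ai's actual normalization code.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url_preserving_case(url):
    """Lowercase the scheme and host only; keep path, query, and fragment as-is."""
    parts = urlsplit(url)
    # Userinfo (user:password@) is case-sensitive, so lowercase only host:port.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


print(normalize_url_preserving_case(
    "HTTPS://WWW.Example.com/page?documentId=123&viewMode=Full"
))
```

With this approach, deduplication of links that differ only in host or scheme casing still works, while case-sensitive query parameters reach the server unchanged.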