Base_urls with trailing slash that redirect to same URL without trailing slash not getting scanned #177

@a11ya11y

In an attempt to address issue #176, I tried replacing the base_url https://info.health.nz/careers with https://info.health.nz/careers/ (adding a trailing slash). That URL returns a 301 redirect to https://info.health.nz/careers (without the slash), which CWAC should handle, but the run ends with zero pages scanned.

[{2025-10-14T15:40:49+1300} INFO    crawler.py : 69  ] iterate_through_base_urls Starting test https://info.health.nz/careers/ ThreadPoolExecutor-0_4
[{2025-10-14T15:40:49+1300} INFO    crawler.py : 482 ] crawl Starting crawl of https://info.health.nz/careers/ ThreadPoolExecutor-0_4
[{2025-10-14T15:40:59+1300} INFO    filters.py : 236 ] process_url_headers https://info.health.nz/careers/ has status code 200 ThreadPoolExecutor-0_4
[{2025-10-14T15:40:59+1300} INFO    crawler.py : 188 ] url_filter_prevent_intersections URL filtered out due to not starting with base_url https://info.health.nz/careers/ https://info.health.nz/ ThreadPoolExecutor-0_4
[{2025-10-14T15:40:59+1300} INFO    crawler.py : 597 ] crawl Crawl exhausted all links https://info.health.nz/careers/ ThreadPoolExecutor-0_4
[{2025-10-14T15:40:59+1300} WARNING  verify.py : 21  ] verify_axe_results VERIFY: https://info.health.nz/careers/ had 0 pages scanned, not 10 MainThread
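
The "not starting with base_url" message suggests a string prefix check. The sketch below is not CWAC's code; it only illustrates how such a check could reject the redirect target when the base_url keeps its trailing slash, and one possible way to make the comparison slash-tolerant. The function names `naive_in_scope`, `normalise` and `tolerant_in_scope` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def naive_in_scope(url: str, base_url: str) -> bool:
    """Plain prefix check: a hypothetical stand-in for a base_url filter."""
    return url.startswith(base_url)


def normalise(url: str) -> str:
    """Lower-case scheme and host, drop the fragment and any trailing slash on the path."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def tolerant_in_scope(url: str, base_url: str) -> bool:
    """Compare normalised URLs, matching only on whole path segments."""
    u, b = normalise(url), normalise(base_url)
    return u == b or u.startswith(b + "/") or u.startswith(b + "?")


base = "https://info.health.nz/careers/"      # base_url as configured in this issue
redirected = "https://info.health.nz/careers"  # target of the 301

print(naive_in_scope(redirected, base))     # False: the target lacks the trailing slash
print(tolerant_in_scope(redirected, base))  # True: both normalise to the same URL
```

Matching on `b + "/"` rather than on `b` alone keeps a sibling path such as `/careers-old` out of scope, which a bare prefix comparison on the slash-less form would let through.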

The same thing happens with these base_urls:

https://www.mch.govt.nz/our-work/memorials-and-commemorations/oi-manawa-canterbury-earthquake-national-memorial/ with the slash which 301 redirects to
https://www.mch.govt.nz/our-work/memorials-and-commemorations/oi-manawa-canterbury-earthquake-national-memorial

and

https://www.mbie.govt.nz/business-and-employment/economic-growth/going-for-growth/
which 301 redirects to
https://www.mbie.govt.nz/business-and-employment/economic-growth/going-for-growth

And if we use https://register.charities.govt.nz/, which 302 redirects to https://register.charities.govt.nz/CharitiesRegister/Search, neither the latter URL nor the 3 other pages on that site (/AdvancedSearch, /Account/LogOn, /PowerBI) are found.
