How to audit 404 (not-found) html page with lighthouse #10493

Closed
ondrejsevcik opened this issue Mar 22, 2020 · 21 comments · Fixed by #15494

@ondrejsevcik

ondrejsevcik commented Mar 22, 2020

Hi,

when I try to run an audit on a 404 page, Lighthouse complains that the request was faulty and that it can't continue.

runtimeError:
      { code: 'ERRORED_DOCUMENT_REQUEST',
        message:
         'Lighthouse was unable to reliably load the page you requested. Make sure you are testing the correct URL and that the server is properly responding to all requests. (Status code: 404)' },

I wonder why there is this restriction. Is there a way to bypass it and actually test a 404 HTML page?

Thank you

@devtools-bot

Thanks! Appreciate you filing this bug. 👏

This is a known issue, most well described in #2784. So, we'll automatically close this as a duplicate.

However, if you believe your bug is different than the cases described there, please comment here with "necessarily-wide-alpaca" and I'll reopen this bug. 🤖 Beep beep boop.

@patrickhulce
Collaborator

patrickhulce commented Mar 22, 2020

Thanks for filing @ondrejsevcik!

> I wonder why there is this restriction.

We received a very high number of complaints that Lighthouse was incorrect when the actual problem was that users were unwittingly auditing a 403/404/500 page instead of the page they wanted to audit. For most users, an error status code almost always means that whatever they were trying to audit isn't working correctly.

> Is there a way to bypass it and actually test a 404 HTML page?

You can't bypass this directly within Lighthouse. To audit a 404 HTML page you'd need to serve the page with a 200 status code, either by creating such a route on your server or through request interception before Lighthouse sees the response, similar to #4376 (comment).
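
For the first option, a dedicated route that returns the same markup with a 200, here is a minimal sketch using a plain Node request handler. The route name `/404-preview.html`, the `KNOWN_PAGES` map, and the inline markup are all hypothetical placeholders; substitute your own server and template.

```javascript
// Hypothetical markup for the not-found page; in practice, load your real 404 template.
const NOT_FOUND_HTML = '<!doctype html><title>Not found</title><h1>Page not found</h1>';

// Hypothetical set of real pages, only here to make the sketch complete.
const KNOWN_PAGES = new Map([
  ['/', '<!doctype html><title>Home</title><h1>Home</h1>'],
]);

// Hypothetical audit-only route that serves the 404 markup with a 200 status.
const AUDIT_ROUTE = '/404-preview.html';

// Decide the status code and body for a request path.
function resolveResponse(pathname) {
  if (KNOWN_PAGES.has(pathname)) {
    return {status: 200, body: KNOWN_PAGES.get(pathname)};
  }
  if (pathname === AUDIT_ROUTE) {
    return {status: 200, body: NOT_FOUND_HTML};
  }
  // Real misses keep returning a genuine 404.
  return {status: 404, body: NOT_FOUND_HTML};
}

// Standard Node request handler; pass it to http.createServer().
function handler(req, res) {
  const {pathname} = new URL(req.url, 'http://localhost');
  const {status, body} = resolveResponse(pathname);
  res.writeHead(status, {'Content-Type': 'text/html; charset=utf-8'});
  res.end(body);
}
```

Passing `handler` to `http.createServer()` and pointing Lighthouse at `/404-preview.html` would audit the error page's markup, while real misses keep returning 404.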

@ondrejsevcik
Author

Thanks for the tip.

It would be useful to bypass this without writing workaround code. There could be an argument like --ignore-not-found or similar.

@BennyAlex
Contributor

@patrickhulce
Hey, any update on this? Having a flag for Lighthouse to audit 404 pages would be really nice!

@patrickhulce
Collaborator

In general closed issues don't get progress updates ;)

I'll reopen though. A CLI flag or config option is worth exploring.

@connorjclark
Collaborator

It would be awkward to propagate this setting across PSI, the extension, and DevTools, but this is likely what is required. @ondrejsevcik @BennyAlex are you running LH using one of those channels, or are you using the CLI directly? Would only adding this option for the CLI suffice?

@patrickhulce
Collaborator

> It would be awkward to propagate this setting across PSI, the extension, and DevTools, but this is likely what is required.

I agree it would be awkward and not worth doing. Why would this be required though? We already have CLI/Node-only support for many special-case scenarios, like network quiet detection failures and loadFailureMode, and it shouldn't be blocked on introducing new controls in our other locked-down channels. The config would be extremely barren if that were true :)

@connorjclark
Collaborator

Required is the wrong word. I meant it's needed to solve the problem for the people in this issue; I'm assuming (and have asked whether) they might not be using the CLI.

@ondrejsevcik
Author

In my use case, I'm using Node. That said, I could also use the CLI if 404 support were implemented there.

@paulirish
Member

(cc'ing in @Nooshu and @stewartmedia-nyoung from #11319)

While we understand folks want to test these sorts of pages, we are also thinking about cases where you're monitoring a given URL that now starts 404ing. Obviously the performance would change quite a bit, and there's a high chance this would otherwise go unnoticed.

For users who are interested in this, how about taking the 404 page and delivering it as a fake-404.html (which is served with a 200)? A little more work, but it seems to satisfy all the constraints.

WDYT?

@tomasdev

@paulirish I'd prefer a flag in DevTools to override the "is 404" check. I don't want to configure the back-end to respond with 200 for every temporary page I'm actively developing in Angular.

@Minishlink

If you host your SPA website through Google Cloud Storage, you will have a 404 response for every page that is not the main page, thus making lighthouse audit impossible for every page other than the main page.

@patrickhulce
Collaborator

> If you host your SPA website through Google Cloud Storage, you will have a 404 response for every page that is not the main page, thus making lighthouse audit impossible for every page other than the main page.

FWIW, it's also impossible then for your users to ever come back to any page other than the main page, so this is not advisable.

@Minishlink

> > If you host your SPA website through Google Cloud Storage, you will have a 404 response for every page that is not the main page, thus making lighthouse audit impossible for every page other than the main page.
>
> FWIW, it's also impossible then for your users to ever come back to any page other than the main page, so this is not advisable.

I don't follow, can you explain what you mean please? I don't observe what you are describing, so maybe we aren't talking about the same thing?

@patrickhulce
Collaborator

> can you explain what you mean please?

Sure. You mentioned that if you host your site in such a way that navigating directly to a URL doesn't result in the page loading (and instead gives a 404) then Lighthouse can't measure it. I was saying if this is true, then a user can't visit it either.

> I don't observe what you are describing, so maybe we aren't talking about the same thing?

Are you saying that your 404 error page is a client-side redirect to the underlying page (or a copy of your SPA)? If so, I see how we were talking past each other :) This still isn't advisable from the SEO side of things (or from the performance side of things if a redirect), but your point is taken that it's better served by an audit failure rather than a fatal error.

I'm inclined to agree this should be a top-level warning rather than a fatal error, or at a minimum that there should be an optional flag to opt out. It's just a question of who has the bandwidth to work on it.

@Pagan-Idel

We are auditing over 200 pages automatically with Node. Some of them are 404 pages that return a 404, and for those Lighthouse simply returns null for the audits. We would still like to monitor these. Any bandwidth/decision update on an "ignore-404" flag for Node/CLI, @patrickhulce?
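
For batch runs like this, one way to keep the failed navigations visible is to inspect each result's `runtimeError`, which has the `code` and `message` shape shown at the top of this issue. A rough sketch follows; `summarizeRun` is a hypothetical helper, and it assumes `runtimeError` is absent when a run succeeds and that the status code appears in the message in the `(Status code: NNN)` form quoted above.

```javascript
// Summarize one Lighthouse result (LHR) object for a batch report.
// Assumption: lhr.runtimeError is missing when the run succeeded.
function summarizeRun(lhr) {
  const err = lhr.runtimeError;
  if (!err || !err.code) {
    return {ok: true};
  }
  // Pull the HTTP status out of the message, if it is present.
  const match = /\(Status code: (\d+)\)/.exec(err.message || '');
  return {
    ok: false,
    code: err.code,
    statusCode: match ? Number(match[1]) : null,
  };
}

// Example input modeled on the error reported in this issue.
const failedRun = {
  runtimeError: {
    code: 'ERRORED_DOCUMENT_REQUEST',
    message: 'Lighthouse was unable to reliably load the page you requested. (Status code: 404)',
  },
};
const summary = summarizeRun(failedRun);
console.log(summary.code, summary.statusCode);
```

Grouping results by `statusCode` would at least separate "this URL is a 404 page" from other failures in the report.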

@connorjclark
Collaborator

I think we should add a flag to disable checking the status code for Node and the CLI.

@connorjclark
Collaborator

  • we could have Node default to throwing an error on a bad status code, but introduce a flag to suppress that behavior. CDT would just ignore by default b/c it isn't automated.
  • OR: just make this a warning :)
  • Seems like LHCI should be the place for such a check/fatal error, if we had to have it

So, let's remove the fatal error and just have this be a warning.
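
If the status check moves out of Lighthouse and into the automation around it, as the last bullet suggests, it could be as small as the sketch below. `checkStatus` and `precheck` are hypothetical helpers, not part of Lighthouse or LHCI, and `precheck` assumes Node 18+ for the global `fetch`.

```javascript
// Compare the status a URL returned with the status the monitor expects.
function checkStatus(url, actual, expected) {
  if (actual === expected) {
    return {url, ok: true};
  }
  return {url, ok: false, reason: `expected ${expected}, got ${actual}`};
}

// Fetch the URL and check its status before handing it to Lighthouse.
// A deliberate 404 page would be registered with expected = 404.
async function precheck(url, expected = 200) {
  const response = await fetch(url, {redirect: 'follow'});
  return checkStatus(url, response.status, expected);
}
```

A monitored URL that suddenly starts 404ing then fails the precheck loudly, while a page that is meant to be a 404 passes it and can still be audited.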

@himanshuara

An SPA hosted in an S3 bucket behind CloudFront always gives a 404 response for every route. Lighthouse should have a provision for auditing 404 responses, because these are not actual 404s: the frontend app is governing the routes.

@paulirish
Member

Late last year we added the ignoreStatusCode option that allows this. #15494

As of today, that option is enabled in PageSpeed Insights.

Example report of a 404 page: https://pagespeed.web.dev/analysis/http-www-example-com-dir/fqxhjfrjz3?form_factor=mobile
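
From Node, using the option might look like the sketch below. It assumes `ignoreStatusCode` is accepted as a boolean in the flags object passed to `lighthouse()`, and that the `lighthouse` and `chrome-launcher` packages are installed; `buildFlags` and `audit404` are hypothetical helper names. Check the Lighthouse docs for the version you run.

```javascript
// Build Lighthouse flags that opt out of the status-code check.
// Assumption: ignoreStatusCode is a boolean setting, per #15494.
function buildFlags(port) {
  return {
    port,
    output: 'json',
    onlyCategories: ['performance'],
    ignoreStatusCode: true,
  };
}

// Launch Chrome, audit the URL even if it answers with an error status,
// and return the Lighthouse result (LHR).
async function audit404(url) {
  const {default: lighthouse} = await import('lighthouse');
  const chromeLauncher = await import('chrome-launcher');
  const chrome = await chromeLauncher.launch({chromeFlags: ['--headless']});
  try {
    const result = await lighthouse(url, buildFlags(chrome.port));
    return result.lhr;
  } finally {
    await chrome.kill();
  }
}
```

Calling `audit404('https://example.com/does-not-exist')` would then return a report for the error page instead of stopping with `ERRORED_DOCUMENT_REQUEST`, provided the assumptions above hold.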

@Nooshu

Nooshu commented Jun 6, 2024

Fantastic work everyone! Thanks for the CC @paulirish. I'll make sure I update my blogpost to mention this issue and that the functionality is now available! 🎉
