html-validate #26
I fear the package is something to be run with Apache: https://askubuntu.com/questions/471523/install-wc3-markup-validator-locally
Based on that, it seems that using the package in a workflow would require changing some configuration files. One would then need to serve both the website under scrutiny and the validator, send the link to the website under scrutiny to the validator, then parse the results, which would be an HTML file. Or maybe, if one serves the validator, there's an API. I hope to find better docs somewhere.
Apparently any instance would have the API: https://github.com/validator/validator/wiki/Service-%C2%BB-Input-%C2%BB-GET
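For reference, a minimal sketch of what a GET call against that API might look like from R, based on the wiki page above: `doc` is the URL of the page to check and `out=json` requests machine-readable output. (The blog URL here is just an example of a deployed page; this is untested against a live instance.)

```r
# Hedged sketch: ask the Nu checker to fetch and validate a public URL.
resp <- httr2::request("https://validator.w3.org/nu/") |>
  httr2::req_url_query(
    doc = "https://blog.cynkra.com/", # page under scrutiny (example URL)
    out = "json"                      # JSON instead of the HTML results page
  ) |>
  httr2::req_perform() |>
  httr2::resp_body_json()

# resp$messages is a list of error/warning/info messages, empty if the page is valid.
```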
I was hoping to find a ready-made action but didn't find one.
Found https://www.npmjs.com/package/html-validator by chance (I was working on some other invalid HTML 😂).
But it would use the API.
The Quarto team doesn't treat this validator as an authority, but they do follow the W3C one. If the W3C validator is difficult to operate, we could also validate once with the W3C validator, then come up with exclusions for our validator that lead to a green build. To recap why I think validation is important: I've heard that search engines treat well-formed websites better than broken ones. Happy to revisit this stance if it's irrelevant or wrong.
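If we go the exclusion route, one option would be to filter the validator's JSON messages against an allow-list before failing the build. A sketch, assuming the `messages` structure shown in the reprexes later in this thread; the patterns in `ignored_patterns` are hypothetical examples:

```r
# Hypothetical allow-list: regular expressions matching messages we accept.
ignored_patterns <- c(
  "Unicode Private Use Area",
  "Consider adding a .lang. attribute"
)

# Keep only messages that match none of the ignored patterns;
# a green build would mean this returns an empty list.
blocking_messages <- function(messages) {
  Filter(
    function(msg) {
      !any(vapply(ignored_patterns, grepl, logical(1), x = msg$message))
    },
    messages
  )
}
```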
A first step would be to identify which pages were modified, so as not to send the whole site to the API. 🤔 It's probably not just a Git thing, because a page's metadata might have changed (so it differs for Git) without it being worth sending to the API. Maybe a sitemap thing: download the current sitemap, get the new one, and send the new pages to the API.
To me, detecting changes is an independent problem, and could also be postponed?
We need to know which pages to send to the API.
Current script; something is wrong with how I send the document, as it's not properly detected.

```r
current_sitemap <- xml2::read_xml("https://blog.cynkra.com/sitemap.xml")
current_links <- xml2::xml_find_all(current_sitemap, ".//d1:loc") |>
  xml2::xml_text()

# quarto::quarto_render()
new_sitemap <- xml2::read_xml(file.path("docs", "sitemap.xml"))
new_links <- xml2::xml_find_all(new_sitemap, ".//d1:loc") |>
  xml2::xml_text()

added_links <- setdiff(new_links, current_links)

validate_page <- function(url) {
  file <- file.path("docs", urltools::path(url))
  httr2::request("http://validator.w3.org/nu/?out=json") |>
    httr2::req_method("POST") |>
    httr2::req_headers(
      `Content-Type` = "text/html",
      charset = "utf-8"
    ) |>
    httr2::req_body_file(file) |>
    httr2::req_perform() |>
    httr2::resp_body_json()
}
```
ah, using
Still not there yet.

```r
current_sitemap <- xml2::read_xml("https://blog.cynkra.com/sitemap.xml")
current_links <- xml2::xml_find_all(current_sitemap, ".//d1:loc") |>
  xml2::xml_text()

# quarto::quarto_render()
new_sitemap <- xml2::read_xml(file.path("docs", "sitemap.xml"))
new_links <- xml2::xml_find_all(new_sitemap, ".//d1:loc") |>
  xml2::xml_text()

added_links <- setdiff(new_links, current_links)

validate_page <- function(url) {
  file <- file.path("docs", urltools::path(url))
  httr2::request("http://validator.w3.org/nu/") |>
    httr2::req_url_query(out = "json") |>
    httr2::req_method("POST") |>
    httr2::req_headers(
      `Content-Type` = "text/html",
      charset = "utf-8"
    ) |>
    httr2::req_body_raw(paste(brio::read_lines(file), collapse = "\n")) |>
    httr2::req_perform() |>
    httr2::resp_body_json()
}

validate_page(added_links[1])
#> $messages
#> $messages[[1]]
#> $messages[[1]]$type
#> [1] "error"
#>
#> $messages[[1]]$message
#> [1] "The character encoding was not declared. Proceeding using “windows-1252”."
#>
#>
#> $messages[[2]]
#> $messages[[2]]$type
#> [1] "error"
#>
#> $messages[[2]]$message
#> [1] "End of file seen without seeing a doctype first. Expected “<!DOCTYPE html>”."
#>
#>
#> $messages[[3]]
#> $messages[[3]]$type
#> [1] "error"
#>
#> $messages[[3]]$message
#> [1] "Element “head” is missing a required instance of child element “title”."
#>
#>
#> $messages[[4]]
#> $messages[[4]]$type
#> [1] "info"
#>
#> $messages[[4]]$subType
#> [1] "warning"
#>
#> $messages[[4]]$message
#> [1] "Consider adding a “lang” attribute to the “html” start tag to declare the language of this document."
```

Created on 2024-02-19 with reprex v2.1.0
The errors make no sense given the actual content of index.html, which means I am sending it in the wrong way.
Indeed, if I use showsource, it shows I sent nothing.
But the dry run of httr2 shows a content length.
I'm putting this aside for now. 😞
The wiki page https://github.com/validator/validator/wiki/Service-%C2%BB-Input-%C2%BB-POST-body was last updated in 2016, so maybe it's no longer valid?
I tried a bit more, without success.

```r
current_sitemap <- xml2::read_xml("https://blog.cynkra.com/sitemap.xml")
current_links <- xml2::xml_find_all(current_sitemap, ".//d1:loc") |>
  xml2::xml_text()

# quarto::quarto_render()
new_sitemap <- xml2::read_xml(file.path("docs", "sitemap.xml"))
new_links <- xml2::xml_find_all(new_sitemap, ".//d1:loc") |>
  xml2::xml_text()

added_links <- setdiff(new_links, current_links)

validate_page <- function(url) {
  file <- file.path("docs", urltools::path(url))
  httr2::request("http://validator.w3.org/nu/") |>
    httr2::req_url_query(out = "json", showsource = "yes", parser = "html5") |>
    httr2::req_method("POST") |>
    httr2::req_headers(
      `Content-Type` = "text/html",
      charset = "utf-8"
    ) |>
    httr2::req_body_raw(paste(brio::read_lines(file), collapse = "\n"), "text/html; charset=utf-8") |>
    httr2::req_perform() |>
    httr2::resp_body_json()
}

validate_page(added_links[1])
#> $messages
#> $messages[[1]]
#> $messages[[1]]$type
#> [1] "error"
#>
#> $messages[[1]]$message
#> [1] "The character encoding was not declared. Proceeding using “windows-1252”."
#>
#>
#> $messages[[2]]
#> $messages[[2]]$type
#> [1] "error"
#>
#> $messages[[2]]$message
#> [1] "End of file seen without seeing a doctype first. Expected “<!DOCTYPE html>”."
#>
#>
#> $messages[[3]]
#> $messages[[3]]$type
#> [1] "error"
#>
#> $messages[[3]]$message
#> [1] "Element “head” is missing a required instance of child element “title”."
#>
#>
#> $messages[[4]]
#> $messages[[4]]$type
#> [1] "info"
#>
#> $messages[[4]]$subType
#> [1] "warning"
#>
#> $messages[[4]]$message
#> [1] "Consider adding a “lang” attribute to the “html” start tag to declare the language of this document."
#>
#>
#>
#> $source
#> $source$type
#> [1] "text/html"
#>
#> $source$code
#> [1] ""
```

Created on 2024-02-26 with reprex v2.1.0
What text are you sending to the API?
A whole HTML file.
The file has
the API sees
Can you upload a file manually to https://validator.w3.org/nu/about.html? I'm forgetting again why this is so complicated.
What am I missing?
I wanted to use the API instead of trying to deploy the thing on GHA, but it's not working. I had been able to use the web interface.
pfff it was actually easy, what was I thinking.
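For the record, a sketch of what presumably fixed it: letting `req_body_file()` declare the `Content-Type` itself via its `type` argument, instead of setting headers manually, so the body and its declared type stay consistent. This is my reconstruction of the working call, not the exact final script:

```r
# Hedged reconstruction of the working validator call (untested here).
validate_page <- function(url) {
  file <- file.path("docs", urltools::path(url))
  httr2::request("https://validator.w3.org/nu/") |>
    httr2::req_url_query(out = "json") |>
    # Setting a body makes the request a POST; the type argument sets the
    # Content-Type header for us, matching the file we actually send.
    httr2::req_body_file(file, type = "text/html; charset=utf-8") |>
    httr2::req_perform() |>
    httr2::resp_body_json()
}
```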
This is due to how Quarto handles dark mode. Both files are present in the source.
This is about
quarto-dev/quarto-cli#6987
This is for lines such as
This refers to "file:/home/maelle/Documents/cynkra/cynkrablog/docs/index.html":1131.54-1151.17: info warning: Document uses the Unicode Private Use Area(s), which should not be used in publicly exchanged documents. (Charmod C073)
Maybe
@DivadNojnarg do the very last two lines of the comment above make sense to you? How could we tweak the script you created to avoid the validator error?
I'll come back to this issue next week, now that I can run the validator. 😸
quarto-dev/quarto-cli#7489