`read_html()` doesn't report parsing failure on very long lines (#440)

``` r
library(xml2)

path <- tempfile()

long <- paste0("start", strrep("x", 12e6), "end")
nchar(long)
#> [1] 12000008

cat(
  "<html><body>\n<script type=\"application/json\">",
  long,
  "</script>\n</body></html>\n",
  file = path,
  sep = ""
)

html <- read_html(path)
xml <- read_xml(path)
#> Warning in read_xml.character(path): xmlSAX2Characters: huge text nod [2]
#> Error in read_xml.character(path): Extra content at the end of the document [5]
```

<sup>Created on 2024-02-27 with [reprex v2.1.0](https://reprex.tidyverse.org)</sup>

From https://github.com/tidyverse/rvest/issues/399
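
A possible workaround, sketched under assumptions rather than confirmed in this issue: xml2 lists a `"HUGE"` parser option in `xml_parse_options()`, described as relaxing the parser's hard-coded limits. The snippet below rebuilds the file from the reprex, passes `"HUGE"` alongside the default `read_html()` options, and compares the parsed text length with the input length to check for silent truncation. The names `html_huge` and `script_text` are illustrative, and whether `"HUGE"` takes effect for HTML input may depend on the installed libxml2 version.

``` r
library(xml2)

# Rebuild the test file from the reprex: one <script> element with a ~12 MB text node.
path <- tempfile(fileext = ".html")
long <- paste0("start", strrep("x", 12e6), "end")
cat(
  "<html><body>\n<script type=\"application/json\">",
  long,
  "</script>\n</body></html>\n",
  file = path,
  sep = ""
)

# Assumption: adding "HUGE" to the default options lifts libxml2's
# hard-coded size limits for this call.
html_huge <- read_html(
  path,
  options = c("RECOVER", "NOERROR", "NOBLANKS", "HUGE")
)

# Compare the parsed text length with the input length to detect truncation.
script_text <- xml_text(xml_find_first(html_huge, "//script"))
nchar(script_text) == nchar(long)
```

If the final comparison is `FALSE`, the document was still truncated or mangled without a warning, which is the behaviour this issue reports.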
