xml2::read_html(x) returns the HTML within a linked data JSON object as expected:
library(xml2)
library(magrittr)
library(rvest)
test_ld <- '<script type="application/ld+json">{"@context":"http://schema.org","@type":"ReproducibleExample", "description":"<p><strong>text within tags</strong>text after closing tag</p>"'
# tags preserved
test_ld %>%
  read_html() %>%
  html_node('script[type="application/ld+json"]') %>%
  as.character()
[1] "<script type=\"application/ld+json\">{\"@context\":\"http://schema.org\",\"@type\":\"ReproducibleExample\", \"description\":\"<p><strong>text within tags</strong>text after closing tag</p>\"</script>"
Where description contains the HTML <p><strong>text within tags</strong>text after closing tag</p>
But when I call xml2::read_html(x, options = 'HUGE'), or pass any other single option (I've tested 5 or 6), the closing tags are removed from the HTML text in the JSON-LD object.
description now becomes <p><strong>text within tagstext after closing tag
Setting options is necessary for some of the HTML I'm parsing. Is it possible to use options and preserve properly formatted HTML from a linked data object?
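One possible explanation, not confirmed anywhere in this thread: the documented default for read_html() is options = c("RECOVER", "NOERROR", "NOBLANKS"), so passing options = 'HUGE' replaces those defaults rather than adding to them. The sketch below reuses the reprex string and compares the two calls. The names default_opts, with_default_options and extract_script are hypothetical helpers introduced here for illustration, and whether keeping RECOVER actually preserves the closing tags is an assumption to check on your own installation.

```r
library(xml2)

# Defaults of read_html() as listed in the xml2 documentation.
default_opts <- c("RECOVER", "NOERROR", "NOBLANKS")

# Hypothetical helper: append extra options to the defaults
# instead of replacing them.
with_default_options <- function(extra) union(default_opts, extra)

# Same string as in the reprex above.
test_ld <- '<script type="application/ld+json">{"@context":"http://schema.org","@type":"ReproducibleExample", "description":"<p><strong>text within tags</strong>text after closing tag</p>"'

# Hypothetical helper: parse with the given options and return the
# JSON-LD script node as a string.
extract_script <- function(opts) {
  doc <- read_html(test_ld, options = opts)
  as.character(xml_find_first(doc, "//script[@type='application/ld+json']"))
}

# Only HUGE: the defaults, including RECOVER, are no longer set.
print(extract_script("HUGE"))

# Defaults plus HUGE: compare this output with the one above.
print(extract_script(with_default_options("HUGE")))
```

If the second call keeps the closing tags, the difference comes from which options are set rather than from HUGE itself.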
description is as it should be <p><strong>text within tags</strong>text after closing tag</p>
sbha changed the title from "xml2 read_html removes closing tags from JSON-LD when using options" to "xml2 read_html removes closing tags from JSON-LD when using a single option" on Oct 3, 2022
I'm not sure there's much we can do here, but leaving open because I have some suspicions that something is going wrong with the way we pass the options from R to C.