read_html_live needs some time after returning its result to allow html_elements to work properly #428

Open
@Feat-FeAR

First of all, thank you for developing this useful package.
Today I ran into strange behavior with read_html_live(): if I run my script line by line from RStudio, slowly, html_elements() retrieves the elements from the HTML page correctly. But if I source the script (or even run the lines individually, but quickly!), html_elements() just returns NAs, as if the contents of the object returned by read_html_live() were not yet available (even though the variable is already stored in the global environment!).

Here is my minimal reproducible example, which retrieves the best percentile of 'F1000Research' from the Scopus website. (I need scraping because this information is not provided by the API.)

This just returns NAs:

library(rvest)

journal_url <- "https://www.scopus.com/sourceid/21100258853"
page <- read_html_live(journal_url)
page |> html_elements("td:nth-child(1) div") |> html_text() -> category
best_category <- category[2]
page |> html_elements("td:nth-child(3) div div") |> html_text() -> percent
best_percentile <- percent[3]
cat("Category:", best_category, "\nPercentile:", best_percentile)

However this works (even when sourcing the entire script):

library(rvest)

journal_url <- "https://www.scopus.com/sourceid/21100258853"
page <- read_html_live(journal_url)

Sys.sleep(1) # <----- just give it some time

page |> html_elements("td:nth-child(1) div") |> html_text() -> category
best_category <- category[2]
page |> html_elements("td:nth-child(3) div div") |> html_text() -> percent
best_percentile <- percent[3]
cat("Category:", best_category, "\nPercentile:", best_percentile)
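
A fixed Sys.sleep(1) works on my machine, but it is only a guess at how long the page needs. A possibly more robust variant is to poll until the selector actually matches enough elements, and give up after a timeout. The sketch below is only an illustration of that idea: wait_for_elements() and scrape_best_percentile() are hypothetical helpers, not part of rvest, and the default timeout and polling interval are arbitrary assumptions.

```r
library(rvest)

# Hypothetical helper (NOT part of rvest): poll `page` until the CSS selector
# matches at least `min_n` elements, or stop with an error after `timeout`
# seconds. `interval` is the pause between attempts, in seconds.
wait_for_elements <- function(page, css, min_n = 1, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    nodes <- html_elements(page, css)
    if (length(nodes) >= min_n) {
      return(nodes)
    }
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for at least ", min_n,
           " element(s) matching: ", css)
    }
    Sys.sleep(interval)
  }
}

# Hypothetical wrapper around the example from this issue. The selectors and
# the indices [2] and [3] are taken from the snippets above.
scrape_best_percentile <- function(journal_url) {
  page <- read_html_live(journal_url)

  # Indices 2 and 3 are used below, so wait for at least that many matches.
  category <- wait_for_elements(page, "td:nth-child(1) div", min_n = 2) |>
    html_text()
  percent <- wait_for_elements(page, "td:nth-child(3) div div", min_n = 3) |>
    html_text()

  list(category = category[2], percentile = percent[3])
}

# Example call (needs Chrome and network access, so it is left commented out):
# res <- scrape_best_percentile("https://www.scopus.com/sourceid/21100258853")
# cat("Category:", res$category, "\nPercentile:", res$percentile)
```

The timeout matters here: if the page never renders those table cells (for instance because the site serves a different page), the loop stops with an error instead of hanging or silently returning NAs.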

¯(°_o)/¯

My sessionInfo:

> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default
