Description
First of all, thank you for developing this useful package.
Today I ran into strange behavior with read_html_live(): if I run my script line by line from RStudio, and slowly, html_elements() retrieves the elements from the HTML page correctly. But if I source the script (or even run the lines individually, but quickly!), html_elements() just returns NAs, as if the contents of the object returned by read_html_live() were not yet available (even though the variable is already stored in the global environment!).
Here is my minimal reproducible example, in which I retrieve the best percentile for 'F1000Research' from the Scopus website. (I need to scrape because this information is not provided by the API.)
This just returns NAs:
library(rvest)
journal_url <- "https://www.scopus.com/sourceid/21100258853"
page <- read_html_live(journal_url)
page |> html_elements("td:nth-child(1) div") |> html_text() -> category
best_category <- category[2]
page |> html_elements("td:nth-child(3) div div") |> html_text() -> percent
best_percentile <- percent[3]
cat("Category:", best_category, "\nPercentile:", best_percentile)
However this works (even when sourcing the entire script):
journal_url <- "https://www.scopus.com/sourceid/21100258853"
page <- read_html_live(journal_url)
Sys.sleep(1) # <----- just give the page some time to load
page |> html_elements("td:nth-child(1) div") |> html_text() -> category
best_category <- category[2]
page |> html_elements("td:nth-child(3) div div") |> html_text() -> percent
best_percentile <- percent[3]
cat("Category:", best_category, "\nPercentile:", best_percentile)
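A fixed Sys.sleep(1) depends on how fast the page happens to load, so a polling loop may be a more robust workaround. The sketch below is only an illustration: wait_until_nonempty() is a hypothetical helper, not part of rvest, and it assumes that calling html_elements() on the live page queries the current state of the browser each time.

```r
# Hypothetical helper (not part of rvest): call `extract()` repeatedly until it
# returns a non-empty result, or stop with an error after `timeout` seconds.
wait_until_nonempty <- function(extract, timeout = 10, interval = 0.25) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- extract()
    if (length(result) > 0) {
      return(result)
    }
    if (Sys.time() >= deadline) {
      stop("Timed out after ", timeout, " seconds waiting for elements")
    }
    Sys.sleep(interval)  # pause briefly before querying again
  }
}

# Possible usage with the example above (assumes `page` comes from
# read_html_live() and that each call re-queries the live page):
# category <- wait_until_nonempty(function() {
#   page |> html_elements("td:nth-child(1) div") |> html_text()
# })
# best_category <- category[2]
```

The helper only checks that at least one element matched; if the page fills in the table rows gradually, a stricter condition (for example, a minimum number of matches) might be needed.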
¯(°_o)/¯
My sessionInfo:
> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default