[FR] How to create an S3 vector (using {vctrs}) whose prototype is an xml_node object? #370

Closed
ramiromagno opened this issue Jul 31, 2022 · 2 comments

ramiromagno commented Jul 31, 2022

I'd like to use an xml_nodeset object as a column in a data frame.

Here are my naive attempts:

library(xml2)
library(tibble)
library(dplyr)

x <- read_xml("<parent><child>1</child><child>2<child>3</child></child><manzz>1</manzz></parent>")
xml_nodeset_obj <- xml_children(x)
xml_nodeset_obj_unclassed <- unclass(xml_nodeset_obj)

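# Won't work: tibble() rejects the classed xml_nodeset
tibble(xml_nodeset_obj)

# The unclassed nodeset is a plain list, so it becomes a list-column
tbl <- tibble(child = xml_nodeset_obj_unclassed)

# Won't work: xml_find_all() is applied to the whole list-column, not to each node
tbl |>
  mutate(child = xml_find_all(child, xpath = './child'))

# Post tibble creation cheating won't work either :)
class(tbl$child) <- "xml_nodeset"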

One of the errors produced by the code above is:

Error in `vec_size()`:
! `x` must be a vector, not a <xml_nodeset> object.
Run `rlang::last_error()` to see where the error occurred.
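The error follows from a vctrs rule: an S3 object built on a list only counts as a vector if its class vector explicitly includes "list"; otherwise vctrs treats it as a scalar. The xml_nodeset class in this issue does not, which is presumably why vec_size() fails. A minimal demonstration, using a hypothetical stand-in class `my_nodeset` so that it does not depend on xml2:

```r
library(vctrs)

# Hypothetical stand-ins: identical list data, different class vectors.
scalar_like <- structure(list(1, 2, 3), class = "my_nodeset")
vector_like <- structure(list(1, 2, 3), class = c("my_nodeset", "list"))

# Return the vctrs size, or NA if vctrs refuses the object.
size_or_na <- function(x) tryCatch(vec_size(x), error = function(e) NA_integer_)

size_or_na(scalar_like)          # NA: treated as a scalar, vec_size() errors
size_or_na(vector_like)          # 3: explicitly inherits from "list"
size_or_na(unclass(scalar_like)) # 3: a bare list is always a vector
```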

I now understand that it might be possible to make xml_nodeset a proper vector type by following the instructions in the vctrs "S3 vectors" vignette. Is that right?

Should I try to implement this myself in a package of my own, or is this functionality desirable in {xml2}?

My objective is to provide similar functionality to tidyjson but for XML data.
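If the goal is only to keep nodes in a data frame, a plain list-column may already be enough, as the unclassed attempt above suggests: each cell holds one xml_node, and xml2 functions are mapped over the cells instead of being applied to the whole column. A minimal sketch using base lapply()/vapply() (purrr::map() would work the same way); the column names `node`, `name` and `children` are illustrative:

```r
library(xml2)
library(tibble)
library(dplyr)

x <- read_xml("<parent><child>1</child><child>2<child>3</child></child><manzz>1</manzz></parent>")

# Ordinary list-column: each cell is a single xml_node.
tbl <- tibble(node = unclass(xml_children(x)))

# Apply xml2 functions cell by cell.
tbl <- tbl |>
  mutate(
    name     = vapply(node, xml_name, character(1)),
    children = lapply(node, xml_find_all, xpath = "./child")
  )

tbl$name               # "child" "child" "manzz"
lengths(tbl$children)  # 0 1 0
```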

ramiromagno changed the title from "[Feature] Allow an xml_nodeset object to become a column in a data frame" to "[FR] Allow an xml_nodeset object to become a column in a data frame" on Jul 31, 2022
ramiromagno (Author) commented Aug 3, 2022

Would you be so kind as to give me feedback on my approach below for making an S3 vector out of the xml_node type? Please bear with me, as I have only just read the S3 vectors vignette, specifically the part on list-of types.

I am specifying the prototype as structure(logical(), class = 'xml_node'), using logical() as a dummy, because I am not sure how else to do it: as far as I can tell, {xml2} does not provide a function to instantiate an xml_node object.

For some reason, the tibble does not show the elements of column x (see below).

The prefix XXX is a placeholder for a hypothetical R package in which the class XXX_xml_node would be registered.

library(xml2)
library(tibble)
library(vctrs)
#> 
#> Attaching package: 'vctrs'
#> The following object is masked from 'package:tibble':
#> 
#>     data_frame

new_XXX_xml_node <- function(x) {
  vctrs::new_list_of(x,
                     ptype = structure(logical(), class = 'xml_node'),
                     class = "XXX_xml_node")
}

XXX_xml_node <- function(x) {
  new_XXX_xml_node(x)
}

vec_ptype_full.XXX_xml_node <- function(x, ...) "XXX_xml_node"
vec_ptype_abbr.XXX_xml_node <- function(x, ...) "xml_node"

as_XXX_xml_node <- function(x, ...) UseMethod("as_XXX_xml_node")

as_XXX_xml_node.xml_node <- function(x, ...) {
  # Wrap the single node in a list: new_list_of() expects a list of elements
  XXX_xml_node(list(x))
}

as_XXX_xml_node.xml_nodeset <- function(x, ...) {
  XXX_xml_node(unclass(x))
}

as_XXX_xml_node.xml_document <- function(x, ...) {
  # Convert xml_document to a list of xml_node objects
  xx <- unclass(xml2::xml_children(x))
  XXX_xml_node(xx)
}

format.XXX_xml_node <- function(x, ...) {
  desc <-
    encodeString(vapply(x, as.character, FUN.VALUE = character(1)))
  paste0(substr(desc, 1, 20 - 3), "...")
}

obj_print_data.XXX_xml_node <- function(x, ...) {
  if (length(x) == 0)
    return()
  print(format(x), quote = FALSE)
}

# Example application
x <- read_xml("
                <parent>
                  <child>1</child>
                  <child>2</child>
                  <child>3</child>
                  <child>4</child>
                  <child>
                    <grandchildren>5.1</grandchildren>
                    <grandchildren>5.2</grandchildren>
                    <grandchildren>5.3</grandchildren>
                  </child>
                  <child>
                    <grandchildren>6.1</grandchildren>
                    <grandchildren>6.2</grandchildren>
                    <child>6.2</child>
                  </child>
                </parent>")

as_XXX_xml_node(x)
#> <XXX_xml_node[6]>
#> [1] <child>1</child>...   <child>2</child>...   <child>3</child>...  
#> [4] <child>4</child>...   <child>\\n  <grand... <child>\\n  <grand...

(tbl <- tibble::tibble(x = as_XXX_xml_node(x), i = seq_along(x)))
#> # A tibble: 6 × 2
#>            x     i
#>   <xml_node> <int>
#> 1                1
#> 2                2
#> 3                3
#> 4                4
#> 5                5
#> 6                6
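One alternative that sidesteps the need for a dummy xml_node prototype is to build the class with vctrs::new_vctr() on top of a bare list of nodes, since new_vctr() does not ask for a ptype and keeps "list" in the class vector for list data. This is a sketch under assumptions, not what {xml2} does: the class name `xxx_nodes`, the 20-character width and the pillar_shaft() method are hypothetical choices. It also makes the "..." suffix conditional, because format.XXX_xml_node above appends it even to short strings.

```r
library(xml2)
library(vctrs)
library(tibble)

# Hypothetical constructor; "xxx_nodes" stands in for a class in the XXX package.
# x must be a bare list whose elements are xml_node objects.
new_xxx_nodes <- function(x = list()) {
  stopifnot(is.list(x), all(vapply(x, inherits, logical(1), what = "xml_node")))
  # For list data, new_vctr() keeps "list" in the class vector, so vctrs
  # (and therefore tibble) accepts the result as a vector.
  new_vctr(x, class = "xxx_nodes")
}

# One string per node; truncate only strings longer than `width`.
format.xxx_nodes <- function(x, ..., width = 20) {
  desc <- vapply(vec_data(x), function(node) encodeString(as.character(node)),
                 character(1))
  too_long <- nchar(desc) > width
  desc[too_long] <- paste0(substr(desc[too_long], 1, width - 3), "...")
  desc
}

# Column-type label shown in the tibble header.
vec_ptype_abbr.xxx_nodes <- function(x, ...) "xml_node"

# Tell pillar explicitly how to draw the cells of a tibble column.
pillar_shaft.xxx_nodes <- function(x, ...) {
  pillar::new_pillar_shaft_simple(format(x), align = "left")
}

doc <- read_xml("<parent><child>1</child><child><grandchild>5.1</grandchild></child></parent>")
nodes <- new_xxx_nodes(unclass(xml_children(doc)))

format(nodes)
tbl <- tibble(x = nodes, i = seq_along(nodes))
tbl
```

The pillar_shaft() method follows the pattern from pillar's documentation on custom column formatting; whether it is needed at all depends on how pillar falls back for list-based classes.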

ramiromagno changed the title from "[FR] Allow an xml_nodeset object to become a column in a data frame" to "[FR] How to create an S3 vector (using {vctrs}) whose prototype is a xml_node object?" on Aug 3, 2022
ramiromagno changed the title from "[FR] How to create an S3 vector (using {vctrs}) whose prototype is a xml_node object?" to "[FR] How to create an S3 vector (using {vctrs}) whose prototype is an xml_node object?" on Aug 3, 2022
hadley (Member) commented Oct 30, 2023

Closing in favour of #377. I don't have vctrs loaded in my brain at the moment, so I can't offer any concrete feedback on what you tried, but I think we should just do this right in the package so that you and others don't need to worry about it.

@hadley hadley closed this as completed Oct 30, 2023