function to create meta information from scratch #22

Open
mbannert opened this issue Apr 10, 2020 · 3 comments

Comments

@mbannert

Practical swissdata experience has shown that defining meta information in strings, whether .json or .yaml, is not very intuitive. For an R person, the natural way to define a hierarchical structure is a list, not least because indentation and code highlighting work so much better for R lists than for JSON in RStudio.

@christophsax
Member

christophsax commented Apr 10, 2020

We could do dataset_create(data, meta), reusing the code from dataset_read():

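  dataset_create <- function(data, meta, set_id, test = TRUE) {
    # dots_to_underscore(), empty_list_to_null() and dataset_validate() are
    # the swissdata helpers that dataset_read() already uses
    z <- list(
      meta = dots_to_underscore(empty_list_to_null(meta)),
      data = data,
      # set ids use underscores instead of dots
      set_id = gsub(".", "_", set_id, fixed = TRUE)
    )
    # rename utc_updated (e.g. coming from utc.updated) to updated_utc
    names(z$meta) <- gsub("utc_updated", "updated_utc", names(z$meta), fixed = TRUE)

    class(z) <- "swissdata"

    if (test) dataset_validate(z)
    z
  }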

And then perhaps another function, meta(), where each element gets its own argument? This would create the list that is supplied to dataset_create as the meta argument.
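A minimal sketch of the per-argument meta() idea, assuming the element names used in the SIX example further down (title, source.name, source.url, dim.order, hierarchy, labels, details, utc.updated). The argument names and the choice of which ones are optional are hypothetical, not part of swissdata.

```r
# Hypothetical constructor: one argument per meta element, so RStudio can
# autocomplete them and R raises an error when a required element is missing.
meta <- function(title, source_name, source_url, dim_order, hierarchy, labels,
                 details = NULL, utc_updated = Sys.time()) {
  # translated fields are named lists, e.g. list(en = "...", de = "...")
  stopifnot(is.list(title), "en" %in% names(title))
  z <- list(
    title = title,
    source.name = source_name,
    source.url = source_url,
    dim.order = dim_order,
    hierarchy = hierarchy,
    labels = labels,
    details = details,
    utc.updated = utc_updated
  )
  # drop optional elements that were left as NULL
  Filter(Negate(is.null), z)
}

m <- meta(
  title = list(en = "SIX Debit and Credit Card Use"),
  source_name = list(en = "SIX"),
  source_url = "https://github.com/statistikZH/covid19monitoring_economy_SIX",
  dim_order = "variable",
  hierarchy = list(variable = list(stat_einkauf = NA, bezug_bargeld = NA)),
  labels = list(stat_einkauf = list(en = "Volume debit card use in retail (w/o online)"))
)
names(m)
```

The result is a plain list, so it could be passed straight to dataset_create() as its meta argument.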

@mbannert
Author

I think that's definitely heading in the right direction. I like the idea of creating swissdata objects that way, and validating them is perfect; it would also make a good test.
That and a good vignette might already do the job.

I've been thinking about a potential meta() function, and I am not sure whether it should follow a skeleton approach, as in the original swissdata package, or be a function that takes ... (dots) arguments.
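The two options can be contrasted in a short sketch. The names meta_skeleton() and meta_dots() and the set of required elements are hypothetical, borrowed from the element names in the SIX example below: the skeleton hands the user an empty structure to fill in, while the dots version accepts any named elements and only checks that the required ones are there.

```r
# Meta elements every dataset should have (an assumption for this sketch)
required_meta <- c("title", "source.name", "dim.order", "hierarchy", "labels")

# Skeleton approach: return a list with empty placeholders to fill in
meta_skeleton <- function() {
  list(
    title = list(en = ""),
    source.name = list(en = ""),
    source.url = "",
    dim.order = character(0),
    hierarchy = list(),
    labels = list(),
    details = list(en = "")
  )
}

# Dots approach: accept any named elements, but insist on the required ones
meta_dots <- function(...) {
  z <- list(...)
  absent <- setdiff(required_meta, names(z))
  if (length(absent) > 0) {
    stop("missing meta elements: ", paste(absent, collapse = ", "))
  }
  z
}
```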

Consider swissdatifying one of the new daily datasets that have become popular lately.

library(data.table)  # fread() comes from data.table
six <- fread("https://raw.githubusercontent.com/KOF-ch/economic-monitoring/master/data/ch.six.csv")

metadata_six <- list(
  "title" = list(en = "SIX Debit and Credit Card Use"),
  "source.name"= list(en = "SIX"),
  "source.url" = "https://github.com/statistikZH/covid19monitoring_economy_SIX",
  dim.order = c("variable"),
  hierarchy = list(
    variable = list(
      "stat_einkauf" = NA,
      "bezug_bargeld" = NA,
      "debiteinsatz_ausland" = NA
    )
  ),
  labels = list(
    dim.names = list(
      variable = list(
        en = "variable"
      )
    ),
    debiteinsatz_ausland = list(
      en = "Volume debit card use abroad",
      de = "Finanzvolumen Debitkarteneinsatz im Ausland"
    ),
    bezug_bargeld = list(
      en = "Volume Cash Withdrawal Switzerland",
      de = "Finanzvolumen Bargeldbezug Debitkarten in der Schweiz"
    ),
    stat_einkauf = list(
      en = "Volume debit card use in retail (w/o online)",
      de = "Finanzvolumen Debitkarteneinsatz stationärer Einkauf in der Schweiz (kein Online-Handel)"
    )
  ),
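  details = list(
    en = "The data from SIX Payment Services cover cashless transactions and cash withdrawals in Switzerland and abroad made with debit cards issued by Swiss banks under the following brands: Debit Mastercard, Maestro CH, V PAY or Visa Debit.",
    de = "Die Daten von SIX Payment Services umfassen bargeldlose Transaktionen und Bargeldbezüge im In- und Ausland, für welche von Schweizer Banken ausgehändigte Debitkarten der folgenden Marken verwendet wurden: Debit Mastercard, Maestro CH, V PAY oder Visa Debit."
  ),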
  utc.updated = Sys.time()
)

Created on 2020-04-11 by the reprex package (v0.3.0)

How could we make a function out of this? Maybe we make it a multi-step process:

  1. create a long-format dataset using a process like the one you suggested above.
  2. pass the new sd_data object to a meta() function that returns a list with the standard elements (title and other must-haves) plus elements derived from the data columns, perhaps with a dim order parameter.
  3. modify the list; the existing functions are likely good enough already.
  4. combine the sd_meta and sd_data objects into an object of class swissdata.
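Steps 1 and 4 could look roughly like this in base R. This is a sketch under assumptions: the wide input has a date column plus one numeric column per series, the long format uses the columns date, variable and value, and to_long() and as_swissdata() are hypothetical helpers, not swissdata functions. The toy values are made up.

```r
# Step 1 (hypothetical helper): wide table with one column per series ->
# long table with one row per date and series
to_long <- function(wide, time_col = "date") {
  vars <- setdiff(names(wide), time_col)
  long <- data.frame(
    time = rep(wide[[time_col]], times = length(vars)),
    variable = rep(vars, each = nrow(wide)),
    value = unlist(wide[vars], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(long)[1] <- time_col
  long
}

# Step 4 (hypothetical helper): check that every series id has a label in
# meta, then bundle data and meta into an object of class swissdata
as_swissdata <- function(data, meta, set_id) {
  unlabelled <- setdiff(unique(as.character(data$variable)), names(meta$labels))
  if (length(unlabelled) > 0) {
    stop("no labels for: ", paste(unlabelled, collapse = ", "))
  }
  structure(list(meta = meta, data = data, set_id = set_id), class = "swissdata")
}

wide <- data.frame(
  date = as.Date(c("2020-04-01", "2020-04-02")),
  stat_einkauf = c(1, 2),
  bezug_bargeld = c(3, 4)
)
long <- to_long(wide)
```

Checking labels at combine time catches the kind of mismatch where a series appears in the data but is missing from meta.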

Besides, I like the idea of also thinking about I/O here. How about a function or option that writes a swissdata object to a .zip file?

@christophsax
Member

(I am using data and meta where you are using the prefixed version)

Yes, I like your second step: data defines the minimal structure for meta and fills it with placeholders, or ids instead of labels. It then needs to be filled in. E.g., meta <- meta_minimal(data).
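A possible meta_minimal(), sketched under the assumption that data is in long format with a date column, a value column and one or more dimension columns. It mirrors the structure of the SIX example: the hierarchy is flat with NA leaves, and every id doubles as its own placeholder label. Only the name meta_minimal() comes from the discussion; the implementation is hypothetical.

```r
# Hypothetical meta_minimal(): derive a meta skeleton from long-format data
meta_minimal <- function(data, time_col = "date", value_col = "value") {
  dims <- setdiff(names(data), c(time_col, value_col))
  ids <- lapply(data[dims], function(col) unique(as.character(col)))

  # hierarchy: one flat level per dimension, leaves set to NA
  hierarchy <- lapply(ids, function(v) stats::setNames(as.list(rep(NA, length(v))), v))

  # labels: dimension names plus one entry per id, each using the id as placeholder
  dim_names <- lapply(stats::setNames(dims, dims), function(d) list(en = d))
  id_labels <- do.call(c, unname(lapply(ids, function(v) {
    stats::setNames(lapply(v, function(id) list(en = id)), v)
  })))

  list(
    title = list(en = ""),
    source.name = list(en = ""),
    source.url = "",
    dim.order = dims,
    hierarchy = hierarchy,
    labels = c(list(dim.names = dim_names), id_labels),
    details = list(en = ""),
    utc.updated = Sys.time()
  )
}
```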

How to edit meta is a separate question. It could be done in R by manipulating the list, by editing YAML or JSON, or with a supercharged version of dput() for lists that prints meta like your R code above.

Which one is best may depend on the user's needs, so it is OK to leave that open.
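The supercharged dput() idea could start from a small recursive printer. meta_to_code() is a hypothetical name; it relies only on base R's deparse() for the leaves and writes each list element on its own indented line, so the output can be edited and pasted back into R.

```r
# Hypothetical meta_to_code(): print a nested list as indented R code, one
# element per line, similar to the hand-written SIX example
meta_to_code <- function(x, indent = 0) {
  # leaves (vectors, NA, dates, ...) are deparsed as single-line R code
  if (!is.list(x)) {
    return(paste(deparse(x, width.cutoff = 500L), collapse = " "))
  }
  if (length(x) == 0) return("list()")
  nms <- names(x)
  if (is.null(nms)) nms <- rep("", length(x))
  inner <- strrep("  ", indent + 1)
  items <- vapply(seq_along(x), function(i) {
    key <- nms[i]
    # backtick names that are not syntactic, e.g. containing spaces
    if (nzchar(key) && make.names(key) != key) key <- paste0("`", key, "`")
    prefix <- if (nzchar(key)) paste0(key, " = ") else ""
    paste0(inner, prefix, meta_to_code(x[[i]], indent + 1))
  }, character(1))
  paste0("list(\n", paste(items, collapse = ",\n"), "\n", strrep("  ", indent), ")")
}

writeLines(meta_to_code(list(title = list(en = "SIX"), dim.order = "variable")))
```

Because the output is ordinary R code, parsing and evaluating it gives back the original list, which makes a round-trip test straightforward.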
