Commit 3af4539

New vignette
Author: Markus Kainu
1 parent 0ae774b commit 3af4539

5 files changed: +192 -2 lines changed

DESCRIPTION

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -18,7 +18,8 @@ Imports:
1818
ckanr,
1919
DBI,
2020
jsonlite,
21-
glue
21+
glue,
22+
readr
2223
Encoding: UTF-8
2324
LazyData: true
2425
RoxygenNote: 7.3.2

R/read_data.R

Lines changed: 3 additions & 0 deletions
@@ -36,6 +36,9 @@ get_data <- function(data_id, sql = NA){
3636

3737
#' Download the whole data set using csv-files
3838
#'
39+
#' This is useful when your system has issues with either Arrow or DuckDB,
40+
#' or when you only have internet access to the original source, avoindata.fi.
41+
#'
3942
#' @param data_id data id
4043
#'
4144
#' @return tibble

_pkgdown.yml

Lines changed: 9 additions & 0 deletions
@@ -4,6 +4,15 @@ navbar:
44
href: https://ropengov.org
55
- icon: fab fa-github
66
href: https://github.com/rOpenGov/kelaopendata
7+
left:
8+
- text: Reference
9+
href: reference/index.html
10+
- text: Articles
11+
menu:
12+
- text: Fetching data using kelaopendata
13+
href: articles/read_data.html
14+
- text: Read dataset from original csv files using kelaopendata
15+
href: articles/read_data_csv.html
716
url: https://ropengov.github.io/kelaopendata/
817
template:
918
package: rogtemplate

man/get_data_csv.Rd

Lines changed: 2 additions & 1 deletion
Some generated files are not rendered by default.

vignettes/read_data_csv.Rmd

Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
1+
---
2+
title: "Read dataset from original csv files using kelaopendata"
3+
author: "Markus Kainu"
4+
date: "`r Sys.Date()`"
5+
output: rmarkdown::html_vignette
6+
vignette: >
7+
%\VignetteIndexEntry{Read dataset from original csv files using kelaopendata}
8+
%\VignetteEncoding{UTF-8}
9+
%\VignetteEngine{knitr::rmarkdown}
10+
editor_options:
11+
chunk_output_type: console
12+
---
13+
14+
```{r setup, include = FALSE}
15+
knitr::opts_chunk$set(
16+
collapse = TRUE,
17+
comment = "#>",
18+
message = FALSE,
19+
warning = FALSE,
20+
fig.height = 7,
21+
fig.width = 7,
22+
dpi = 75
23+
)
24+
```
25+
26+
27+
**Installation**
28+
29+
`kelaopendata` can be installed from GitHub using:
30+
31+
```{r, eval = FALSE}
32+
# Install development version from GitHub
33+
remotes::install_github("ropengov/kelaopendata")
34+
```
35+
36+
```{r}
37+
# Let's first create a function that checks if the suggested
38+
# packages are available
39+
check_namespaces <- function(pkgs){
40+
all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
41+
}
42+
```
43+
44+
In this vignette we show an alternative way of downloading a dataset from the original csv files available at <https://www.avoindata.fi/data/fi/organization/kela> using the `get_data_csv()` function. Unlike `get_data()`, this method does not support SQL operations on the fly, so you have to download the whole dataset. However, if you have issues with Apache Arrow or DuckDB, or you only have internet access to avoindata.fi, this is an alternative, though it is slower and requires more memory.
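
The trade-off described above can be illustrated with a minimal sketch in base R. The inline csv text, its column names and its values are made up for illustration and are not taken from the Kela data. The only point is that the whole file is parsed into memory first and the rows are filtered afterwards.

```r
# Hypothetical stand-in for a downloaded csv file (not real Kela data)
csv_text <- "vuosi,kunta_nimi,saaja_lkm
2022,Turku,10
2022,Helsinki,25
2023,Turku,12"

# Step 1: the whole file is read into memory ...
d_all <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Step 2: ... and only then reduced to the rows of interest
d_turku <- d_all[d_all$kunta_nimi == "Turku", ]

nrow(d_all)    # number of rows parsed
nrow(d_turku)  # number of rows left after local filtering
```

With a SQL-capable backend the filter would instead be applied before the data reaches R, which is why the csv route needs more memory.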
45+
46+
We will implement the same task as in the [Fetching data using kelaopendata](https://ropengov.github.io/kelaopendata/articles/read_data.html) vignette.
47+
48+
49+
## List available datasets
50+
51+
```{r}
52+
library(kelaopendata)
53+
library(dplyr)
54+
55+
dsets <- list_datasets()
56+
print(dsets, n = 50)
57+
```
58+
59+
## Obtaining data on Financial aid for students (opintotuki)
60+
61+
### Metadata
62+
63+
For this example we choose "Financial aid for students" as our benefit of interest. First we download the metadata and print the description field.
64+
65+
```{r}
66+
d_id <- dsets[dsets$name == "opintotuen-saajat-ja-maksetut-tuet", ]$id
67+
meta <- get_metadata(data_id = d_id)
68+
meta$description
69+
```
70+
71+
Then we print a more technical overview of the contents of the dataset, containing the names of the csv files, the csv dialect, and the values and types of each indicator in the data.
72+
73+
```{r}
74+
jsonlite::toJSON(meta$resources, pretty = TRUE)
75+
```
76+
77+
A denser view of the variables with their types and descriptions can be printed with:
78+
79+
```{r}
80+
meta$resources$schema$fields[[1]] |>
81+
select(-values) |>
82+
as_tibble()
83+
```
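
Field types like these could in principle be used to control how a csv file is parsed. The following is a minimal sketch in base R, not what `get_data_csv()` does internally. The `fields` table, the type names (`integer`, `number`, `string`) and the `type_map` are assumptions made for illustration.

```r
# Hypothetical field table; in practice it would be derived from
# meta$resources$schema$fields[[1]]
fields <- data.frame(
  name = c("vuosi", "kunta_nimi", "saaja_lkm"),
  type = c("integer", "string", "integer"),
  stringsAsFactors = FALSE
)

# Assumed mapping from schema types to R column classes
type_map <- c(integer = "integer", number = "numeric", string = "character")
col_classes <- setNames(unname(type_map[fields$type]), fields$name)

# Made-up one-row csv parsed with the declared column classes
csv_text <- "vuosi,kunta_nimi,saaja_lkm\n2023,Turku,12"
d <- read.csv(text = csv_text, colClasses = col_classes)
str(d)
```

Declaring the classes up front avoids type guessing, which matters for columns such as codes that should stay as text.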
84+
85+
86+
87+
88+
### Querying, downloading and plotting the data
89+
90+
Let's download the whole dataset with `kelaopendata::get_data_csv()` and then filter it locally to recipients of the government guarantee for a student loan in the city of Turku.
91+
92+
```{r}
93+
d_raw <- kelaopendata::get_data_csv(data_id = d_id)
94+
d_opintotuki <- d_raw %>%
95+
filter(etuus == 'Opintolainan valtiontakaus',
96+
aikatyyppi == 'Vuosi',
97+
kunta_nimi == 'Turku',
98+
etuus == 'Opintolainan valtiontakaus',
99+
oppilaitos_peruste == 'Viimeisin oppilaitos')
100+
```
101+
102+
Next, let's filter the data locally in R a bit more.
103+
104+
```{r}
105+
d_plot <- d_opintotuki %>%
106+
# Exclude unknown gender and the "information missing" and "total" institution levels
107+
filter(sukupuoli != "Tuntematon", !oppilaitosaste %in% c("Tieto puuttuu", "Yhteensä")) %>%
108+
mutate(oppilaitosaste = factor(
109+
oppilaitosaste,
110+
levels = c(
111+
"Yliopistot",
112+
"Ammattikorkeakoulut",
113+
"Ammatilliset oppilaitokset",
114+
"Lukiot",
115+
"Muut oppilaitokset",
116+
"Ulkomaiset oppilaitokset"
117+
)
118+
))
119+
```
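
The reason for dropping levels before calling `factor()` can be seen in a small base R sketch. The vector below is made up for illustration. Values that are not listed in `levels` turn into `NA`, and the order of `levels` (not the alphabet) decides the order in which facets are drawn.

```r
# Made-up values; "Tieto puuttuu" is deliberately left out of the levels
x <- c("Lukiot", "Yliopistot", "Tieto puuttuu", "Lukiot")
f <- factor(x, levels = c("Yliopistot", "Lukiot"))

levels(f)                  # order follows `levels`
is.na(f)                   # the unlisted value became NA
table(f, useNA = "ifany")  # counts per level, including NA
```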
120+
121+
122+
Finally, let's draw a plot of recipients by gender and type of institution.
123+
124+
125+
```{r, fig.width=8, fig.height=12}
126+
library(ggplot2)
127+
ggplot(d_plot, aes(x = vuosi, y = saaja_lkm, fill = ikaryhma)) +
128+
geom_col(position = position_stack()) +
129+
facet_grid(oppilaitosaste ~ sukupuoli) +
130+
labs(title = "Recipients of government guarantee for a student loan in\nthe city of Turku in 2004 to 2024 by gender and type of institution") +
131+
theme_light()
132+
```
133+
134+
135+
136+
137+
```{r}
138+
d_opintotuki <- d_raw %>%
139+
filter(etuus == 'Opintolainan valtiontakaus',
140+
aikatyyppi == 'Vuosi',
141+
vuosi == 2023,
142+
etuus == 'Opintolainan valtiontakaus',
143+
oppilaitos_peruste == 'Viimeisin oppilaitos')
144+
d_opintotuki
145+
146+
library(geofi)
147+
muni <- get_municipalities()
148+
```
149+
150+
Next, let's filter the data to universities and sum the recipients by municipality and gender.
151+
152+
```{r}
153+
d_plot <- d_opintotuki %>%
154+
# Exclude unknown gender and the "information missing" and "total" institution levels
155+
filter(sukupuoli != "Tuntematon", !oppilaitosaste %in% c("Tieto puuttuu", "Yhteensä")) %>%
156+
filter(oppilaitosaste == "Yliopistot",
157+
!is.na(sukupuoli)) %>%
158+
group_by(kunta_nro,sukupuoli) %>%
159+
summarise(saaja_lkm = sum(saaja_lkm)) %>%
160+
ungroup() %>%
161+
mutate(municipality_code = as.integer(kunta_nro))
162+
d_plot_sf <- left_join(muni, d_plot, by = "municipality_code")
163+
```
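
The join step relies on both tables sharing a key of the same type. Below is a minimal base R sketch of that idea, using `merge()` with `all.x = TRUE` as a stand-in for `left_join()`. Both data frames are hypothetical, and the assumption that `kunta_nro` is stored as zero-padded text is made only for illustration.

```r
# Hypothetical stand-ins for the output of get_municipalities() and d_plot
muni <- data.frame(
  municipality_code = c(91L, 564L, 853L),
  name = c("Helsinki", "Oulu", "Turku"),
  stringsAsFactors = FALSE
)
stats <- data.frame(
  kunta_nro = c("091", "853"),   # assumed zero-padded text codes
  saaja_lkm = c(25, 12),
  stringsAsFactors = FALSE
)

# Convert the text code to an integer so that it matches the key in `muni`
stats$municipality_code <- as.integer(stats$kunta_nro)

# Keep every municipality; those without statistics get NA
joined <- merge(muni, stats, by = "municipality_code", all.x = TRUE)
joined[, c("municipality_code", "name", "saaja_lkm")]
```

Keeping all rows of the municipality table means that areas without data stay on the map, just without a value.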
164+
165+
166+
Finally, let's draw a map of university-level recipients by municipality and gender.
167+
168+
169+
```{r, fig.width=8, fig.height=12}
170+
library(ggplot2)
171+
ggplot(d_plot_sf, aes(fill = saaja_lkm)) +
172+
geom_sf() +
173+
facet_wrap(~ sukupuoli) +
174+
theme_light()
175+
```
176+
