could summary() report *which* columns are of a certain type as well? #356

Open
dan-reznik opened this issue Sep 11, 2018 · 3 comments
Labels
add example to docs This discussion for this issue contains a useful example enhancement

Comments

dan-reznik commented Sep 11, 2018

it would be great if skimr's summary() reported counts and the names of columns under each type category, for example:

skim(iris) %>% summary()

## A skim object    
## 
## Name: iris   
## Number of Rows: 150   
## Number of Columns: 5    
##     
## Column types    
## factor (N): fact_col_1, fact_col_2, ..., fact_col_N   
## dbl (M): dbl_col_1, dbl_col_2, ..., dbl_col_M
## int (...)
## lgl (...)
## chr (...)

this way we could quickly assess type distributions. idea: identify column types with the 3-letter symbols in the purrr::map_* family

@dan-reznik dan-reznik changed the title please make the summary() function report _which_ columns are categorical and numerical could summary() report _which_ columns are categorical and numerical Sep 11, 2018
@dan-reznik dan-reznik changed the title could summary() report _which_ columns are categorical and numerical could summary() report _which_ columns are of a certain type as well? Sep 11, 2018
@dan-reznik dan-reznik changed the title could summary() report _which_ columns are of a certain type as well? could summary() report *which* columns are of a certain type as well? Sep 11, 2018
Collaborator

elinw commented Sep 15, 2018

That's an interesting possibility. Do you want to make a PR for it? I think I might want the names in separate rows from the numbers, but I'm not sure. Keeping in mind that there could be thousands of columns, I might want to display only the first 5 (or so) names.

Collaborator

elinw commented Apr 1, 2019

In version 2 there is a single column that contains the variable names (skim_variable) and one with the type (skim_type), so it's easy to get the names of the variables for a given type as a vector, or to make a named list, e.g. my_list$numeric, if you use to_list (which is like skim_to_list in v2). I think we should document this in a vignette. Going forward, it might also be part of thinking about managing skim with very large data sets.
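
A minimal sketch of the idea described above, assuming skimr v2 is installed and that `skim()` returns the `skim_variable` and `skim_type` columns mentioned in the comment. It uses only base R subsetting and `split()`, not any dedicated skimr helper, and the object names (`skimmed`, `numeric_cols`, `cols_by_type`) are made up for illustration:

```r
library(skimr)

skimmed <- skim(iris)

# Names of the columns of one type, as a character vector
numeric_cols <- skimmed$skim_variable[skimmed$skim_type == "numeric"]

# Named list with one element per type, e.g. cols_by_type$numeric
# (base R split(); this is an illustration, not a skimr function)
cols_by_type <- split(skimmed$skim_variable, skimmed$skim_type)
```

With this, `cols_by_type$factor` and `cols_by_type$numeric` hold the column names for each type found in the data.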

@elinw elinw added the add example to docs This discussion for this issue contains a useful example label Apr 1, 2019
Collaborator

elinw commented Nov 11, 2019

An example in V2

skim(iris) %>% dplyr::filter(skim_type == "numeric") %>% dplyr::select(skim_variable)

# A tibble: 4 x 1
  skim_variable
  <chr>        
1 Sepal.Length 
2 Sepal.Width  
3 Petal.Length 
4 Petal.Width  
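
Building on the same two columns, the per-type listing requested at the top of this issue could be approximated outside of `summary()`. This is only a sketch under the assumption that skimr v2 is installed; it is not an existing skimr feature, and `max_names` is a hypothetical cap reflecting the suggestion to show only the first few names when there are many columns:

```r
library(skimr)

skimmed <- skim(iris)
cols_by_type <- split(skimmed$skim_variable, skimmed$skim_type)

max_names <- 5  # hypothetical cap on how many names to print per type

# Build one line per type: "<type> (<count>): <name>, <name>, ..."
type_lines <- vapply(names(cols_by_type), function(type) {
  cols <- cols_by_type[[type]]
  shown <- head(cols, max_names)
  suffix <- if (length(cols) > max_names) ", ..." else ""
  sprintf("%s (%d): %s%s", type, length(cols),
          paste(shown, collapse = ", "), suffix)
}, character(1))

writeLines(type_lines)
```

For `iris` this prints one line for the single factor column and one for the four numeric columns; note that skimr reports the type as `numeric` rather than the three-letter `dbl` abbreviation proposed in the request.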
