Commit ce253e9

worked on package website
1 parent 6eaf956 commit ce253e9

6 files changed: +87 -836 lines changed

NEWS.md

Lines changed: 15 additions & 17 deletions
@@ -1,45 +1,43 @@
-# What is New in *bulkreadr*?
-
-## bulkreadr 1.1.0 (2023-11-13)
+# bulkreadr 1.1.0 (2023-11-13)
 
 This update includes the following new features:
 
-- `generate_dictionary()`: This function is designed to automatically create a comprehensive data dictionary from labelled datasets. The generated dictionary provides detailed insights into each variable, aiding in better data understanding and management.
+* `generate_dictionary()`: This function is designed to automatically create a comprehensive data dictionary from labelled datasets. The generated dictionary provides detailed insights into each variable, aiding in better data understanding and management.
 
-- `look_for()`: This enhances the capability to efficiently search within labelled datasets. It allows users to quickly find variable names and their descriptions by searching for specific keywords. This feature streamlines data exploration and analysis, particularly in large datasets with extensive variables.
+* `look_for()`: This enhances the capability to efficiently search within labelled datasets. It allows users to quickly find variable names and their descriptions by searching for specific keywords. This feature streamlines data exploration and analysis, particularly in large datasets with extensive variables.
 
 These enhancements aim to improve the user experience in data management and exploration within `bulkreadr`. We hope these new features will assist our users in more effectively navigating and understanding their labelled datasets.
 
-## bulkreadr 1.0.0 (2023-09-20)
+# bulkreadr 1.0.0 (2023-09-20)
 
 This update includes the following new features and improvements:
 
-- Developed `read_stata_data()` to import Stata data file (`.dta`) into an R data frame, converting labeled variables into factors.
+* Developed `read_stata_data()` to import Stata data file (`.dta`) into an R data frame, converting labeled variables into factors.
 
-- Reduced dependency packages to optimize efficiency.
+* Reduced dependency packages to optimize efficiency.
 
 
-## 0.2.0 (2023-09-11)
+# 0.2.0 (2023-09-11)
 
 This update includes the following new features and improvements:
 
-- Developed bulkreadr vignette
+* Developed bulkreadr vignette
 
-- Developed `read_spss_data()` to seamlessly import data from an SPSS data (`.sav` or `.zsav`) files and converting labelled variables into factors, a crucial step that enhances the ease of data manipulation and analysis within the R programming environment.
+* Developed `read_spss_data()` to seamlessly import data from an SPSS data (`.sav` or `.zsav`) files and converting labelled variables into factors, a crucial step that enhances the ease of data manipulation and analysis within the R programming environment.
 
-- Added more unit tests
+* Added more unit tests
 
-## 0.1.0 (2023-07-24)
+# 0.1.0 (2023-07-24)
 
 This update includes the following new features and improvements:
 
-- Improved error handling by adding meaningful error messages for all functions within `bulkreadr` package. This will make it easier for users to identify and troubleshoot issues that may arise during their use of the package.
+* Improved error handling by adding meaningful error messages for all functions within `bulkreadr` package. This will make it easier for users to identify and troubleshoot issues that may arise during their use of the package.
 
-- Added package-level documentation. The user can now use `?bulkreadr::bulkreadr` for basic package-level documentation.
+* Added package-level documentation. The user can now use `?bulkreadr::bulkreadr` for basic package-level documentation.
 
-- Added `inspect_na()` to summarize missingness in data frame columns and `fill_missing_values()` to impute missing values in a dataframe.
+* Added `inspect_na()` to summarize missingness in data frame columns and `fill_missing_values()` to impute missing values in a dataframe.
 
-## 0.0.0.9 (2023-07-03)
+# 0.0.0.9 (2023-07-03)
 
 The development version of bulkreadr is now on Githhub.
 

README.Rmd

Lines changed: 1 addition & 300 deletions
@@ -25,6 +25,7 @@ options(tibble.print_min = 5, tibble.print_max = 5)
 <!-- badges: start -->
 [![R-CMD-check](https://github.com/gbganalyst/bulkreadr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/gbganalyst/bulkreadr/actions/workflows/R-CMD-check.yaml)
 [![CRAN_Status_Badge](https://www.r-pkg.org/badges/version/bulkreadr)](https://cran.r-project.org/package=bulkreadr)
+[![metacran downloads](https://cranlogs.r-pkg.org/badges/bulkreadr)](https://cran.r-project.org/package=bulkreadr)
 [![metacran downloads](https://cranlogs.r-pkg.org/badges/grand-total/bulkreadr)](https://cran.r-project.org/package=bulkreadr)
 [![Codecov test coverage](https://codecov.io/gh/gbganalyst/bulkreadr/branch/main/graph/badge.svg)](https://app.codecov.io/gh/gbganalyst/bulkreadr?branch=main)
 <!-- badges: end -->
@@ -75,306 +76,6 @@ library(dplyr)
 ```
 
 
-## Functions in bulkreadr package
-
-This section provides a concise overview of the different functions available in the `bulkreadr` package. These functions serve various purposes and are designed to handle importing of data in bulk.
-
-| Functions to Import Data | Other Functions |
-|--------------------------------------|-----------------------------------------|
-| [`read_excel_workbook()`](#read_excel_workbook) | [`generate_dictionary()`](#generate_dictionary) |
-| [`read_excel_files_from_dir()`](#read_csv_files_from_dir) | [`look_for()`](#look_for) |
-| [`read_csv_files_from_dir()`](#read_csv_files_from_dir) | [`pull_out()`](#pull_out) |
-| [`read_gsheets()`](#read_gsheets) | [`convert_to_date()`](#convert_to_date) |
-| [`read_spss_data()`](#read_spss_data) | [`inspect_na()`](#inspect_na) |
-| [`read_stata_data()`](#read_stata_data) | [`fill_missing_values()`](#fill_missing_values) |
-
-
-**Note:**
-
-> For the majority of functions within this package, we will utilize data stored in the system file by the `bulkreadr`, which can be accessed using the `system.file()` function. If you wish to utilize your own data stored in your local directory, please ensure that you have set the appropriate file path prior to using any functions provided by the bulkreadr package.
-
-
-## `read_excel_workbook()`
-
-`read_excel_workbook()` reads all the data from the sheets of an Excel workbook and return an appended dataframe.
-
-```{r example1}
-
-# path to the xls/xlsx file.
-
-path <- system.file("extdata", "Diamonds.xlsx", package = "bulkreadr", mustWork = TRUE)
-
-# read the sheets
-
-read_excel_workbook(path = path)
-
-```
-
-## `read_excel_files_from_dir()`
-
-`read_excel_files_from_dir()` reads all Excel workbooks in the `"~/data"` directory and returns an appended dataframe.
-
-```{r example1a}
-
-# path to the directory containing the xls/xlsx files.
-
-directory <- system.file("xlsxfolder", package = "bulkreadr")
-
-# import the workbooks
-
-read_excel_files_from_dir(dir_path = directory)
-
-```
-
-## `read_csv_files_from_dir()`
-
-`read_csv_files_from_dir()` reads all csv files from the `"~/data"` directory and returns an appended dataframe. The resulting dataframe will be in the same order as the CSV files in the directory.
-
-```{r example2}
-# path to the directory containing the CSV files.
-
-directory <- system.file("csvfolder", package = "bulkreadr")
-
-# import the csv files
-
-read_csv_files_from_dir(dir_path = directory)
-
-```
-
-## `read_gsheets()`
-
-The `read_gsheets()` function imports data from multiple sheets in a Google Sheets spreadsheet and appends the resulting dataframes from each sheet together to create a single dataframe. This function is a powerful tool for data analysis, as it allows you to easily combine data from multiple sheets into a single dataset.
-
-```{r, include=FALSE}
-googlesheets4::gs4_deauth()
-```
-
-```{r example3}
-
-# Google Sheet ID or the link to the sheet
-
-sheet_id <- "1izO0mHu3L9AMySQUXGDn9GPs1n-VwGFSEoAKGhqVQh0"
-
-# read all the sheets
-
-read_gsheets(ss = sheet_id)
-```
-
-## `read_spss_data()`
-
-`read_spss_data()` is designed to seamlessly import data from an SPSS data (`.sav` or `.zsav`) files. It converts labelled variables into factors, a crucial step that enhances the ease of data manipulation and analysis within the R programming environment.
-
-```{r spssdata1}
-
-# Read an SPSS data file without converting variable labels as column names
-
-file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
-
-data <- read_spss_data(file = file_path)
-
-data
-
-```
-
-```{r spssdata2}
-
-# Read an SPSS data file and convert variable labels as column names
-
-data <- read_spss_data(file = file_path, label = TRUE)
-
-data
-
-```
-
-## read_stata_data()
-
-`read_stata_data()` reads Stata data file (`.dta`) into an R data frame, converting labeled variables into factors.
-
-**Read the Stata data file without converting variable labels as column names**
-
-```{r statadata1}
-
-file_path <- system.file("extdata", "Wages.dta", package = "bulkreadr")
-
-data <- read_stata_data(file = file_path)
-
-data
-
-```
-
-**Read the Stata data file and convert variable labels as column names**
-
-```{r statadata2}
-
-data <- read_stata_data(file = file_path, label = TRUE)
-
-data
-
-```
-
-
-## `generate_dictionary()`
-
-`generate_dictionary()` creates a data dictionary from a specified data frame. This function is particularly useful for understanding and documenting the structure of your dataset, similar to data dictionaries in Stata or SPSS.
-
-```{r}
-
-# Creating a data dictionary from an SPSS file
-
-file_path <- system.file("extdata", "Wages.sav", package = "bulkreadr")
-
-wage_data <- read_spss_data(file = file_path)
-
-generate_dictionary(wage_data)
-```
-
-
-## `look_for()`
-
-The `look_for()` function is designed to emulate the functionality of the Stata `lookfor` command in R. It provides a powerful tool for searching through large datasets, specifically targeting variable names, variable label descriptions, factor levels, and value labels. This function is handy for users working with extensive and complex datasets, enabling them to quickly and efficiently locate the variables of interest.
-
-
-```{r}
-
-# Look for a single keyword.
-
-look_for(wage_data, "south")
-```
-
-
-## `pull_out()`
-
-`pull_out()` is similar to `[`. It acts on vectors, matrices, arrays and lists to extract or replace parts. It is pleasant to use with the magrittr (`%>%`) and base(`|>`) operators.
-
-```{r example4}
-
-top_10_richest_nig <- c("Aliko Dangote", "Mike Adenuga", "Femi Otedola", "Arthur Eze", "Abdulsamad Rabiu", "Cletus Ibeto", "Orji Uzor Kalu", "ABC Orjiakor", "Jimoh Ibrahim", "Tony Elumelu")
-
-top_10_richest_nig %>%
-pull_out(c(1, 5, 2))
-```
-
-```{r}
-top_10_richest_nig %>%
-pull_out(-c(1, 5, 2))
-```
-
-
-## `convert_to_date()`
-
-`convert_to_date()` parses an input vector into POSIXct date-time object. It is also powerful to convert from excel date number like `42370` into date value like `2016-01-01`.
-
-```{r example 5}
-
-## ** heterogeneous dates **
-
-dates <- c(
-44869, "22.09.2022", NA, "02/27/92", "01-19-2022",
-"13-01- 2022", "2023", "2023-2", 41750.2, 41751.99,
-"11 07 2023", "2023-4"
-)
-
-# Convert to POSIXct or Date object
-
-convert_to_date(dates)
-
-# It can also convert date time object to date object
-
-convert_to_date(lubridate::now())
-
-```
-
-
-```{r example5}
-# With dataframe
-
-file_path <- system.file("extdata", "OGD.xlsx", package = "bulkreadr")
-
-ogd_data <- read_excel_workbook(path = file_path)
-
-
-ogd_data %>% head()
-
-# Convert to POSIXct or Date object
-
-modified_ogd_data <- ogd_data %>%
-mutate(Date_format = convert_to_date(Date))
-
-modified_ogd_data %>% head()
-
-```
-
-
-## `inspect_na()`
-
-`inspect_na()` summarizes the rate of missingness in each column of a data frame. For a grouped data frame, the rate of missingness is summarized separately for each group.
-
-```{r example 6a}
-
-# dataframe summary
-
-inspect_na(airquality)
-
-# grouped dataframe summary
-
-airquality %>%
-group_by(Month) %>%
-inspect_na()
-
-```
-
-## `fill_missing_values()`
-
-`fill_missing_values()` in an efficient function that addresses missing values in a dataframe. It uses imputation by function, meaning it replaces missing data in numeric variables with either the mean or the median, and in non-numeric variables with the mode. The function takes a column-based imputation approach, ensuring that replacement values are derived from the respective columns, resulting in accurate and consistent data. This method enhances the integrity of the dataset and promotes sound decision-making and analysis in data processing workflows.
-
-```{r example 6}
-
-df <- tibble::tibble(
-Sepal_Length = c(5.2, 5, 5.7, NA, 6.2, 6.7, 5.5),
-Sepal.Width = c(4.1, 3.6, 3, 3, 2.9, 2.5, 2.4),
-Petal_Length = c(1.5, 1.4, 4.2, 1.4, NA, 5.8, 3.7),
-Petal_Width = c(NA, 0.2, 1.2, 0.2, 1.3, 1.8, NA),
-Species = c("setosa", NA, "versicolor", "setosa",
-NA, "virginica", "setosa"
-)
-)
-
-df
-
-# Using mean to fill missing values for numeric variables
-
-result_df_mean <- fill_missing_values(df, use_mean = TRUE)
-
-result_df_mean
-
-# Using median to fill missing values for numeric variables
-
-result_df_median <- fill_missing_values(df, use_mean = FALSE)
-
-result_df_median
-```
-
-### Impute missing values (NAs) in a grouped data frame
-
-You can use the `fill_missing_values()` in a grouped data frame by using other grouping and map functions. Here is an example of how to do this:
-
-```{r}
-sample_iris <- tibble::tibble(
-Sepal_Length = c(5.2, 5, 5.7, NA, 6.2, 6.7, 5.5),
-Petal_Length = c(1.5, 1.4, 4.2, 1.4, NA, 5.8, 3.7),
-Petal_Width = c(0.3, 0.2, 1.2, 0.2, 1.3, 1.8, NA),
-Species = c("setosa", "setosa", "versicolor", "setosa",
-"virginica", "virginica", "setosa")
-)
-
-sample_iris
-
-sample_iris %>%
-group_by(Species) %>%
-group_split() %>%
-map_df(fill_missing_values)
-```
-
 ## Context
 
 bulkreadr draws on and complements / emulates other packages such as readxl, readr, and googlesheets4 to read bulk data in R.
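
Reviewer note: the Context line above describes reading data "in bulk", meaning many files are read and appended into one data frame. As a rough illustration of that pattern, here is a minimal base-R sketch. It is not bulkreadr's implementation; the helper name `read_csv_dir()`, the demo directory, and the file names are all hypothetical, and it assumes every CSV file shares the same columns.

```r
# Hypothetical helper (not part of bulkreadr): read every CSV file in a
# directory and append the rows into a single data frame.
read_csv_dir <- function(dir_path) {
  # Sort the paths so rows are appended in a predictable file order.
  files <- sort(list.files(dir_path, pattern = "\\.csv$", full.names = TRUE))
  if (length(files) == 0) {
    stop("No CSV files found in: ", dir_path)
  }
  tables <- lapply(files, utils::read.csv, stringsAsFactors = FALSE)
  # rbind() assumes all files have identical column names.
  do.call(rbind, tables)
}

# Build two small demo files in a temporary directory.
demo_dir <- file.path(tempdir(), "csvdemo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c("a", "b")),
          file.path(demo_dir, "part1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, value = c("c", "d")),
          file.path(demo_dir, "part2.csv"), row.names = FALSE)

combined <- read_csv_dir(demo_dir)
print(combined)
```

With the package installed, the equivalent call shown in the removed README text is `read_csv_files_from_dir(dir_path = directory)`.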
