-
Notifications
You must be signed in to change notification settings - Fork 15
/
proj.qmd
137 lines (85 loc) · 8.71 KB
/
proj.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "Projects"
date-modified: 'today'
date-format: long
license: CC BY-NC
bibliography: references.bib
---
## RStudio Projects[^1]
[^1]: <https://support.posit.co/hc/en-us/articles/200526207-Using-Projects>
::: callout-note
## R and RStudio are not the same thing --- they go together
**R** is a coding language interpreter. **RStudio** is an *Integrated Development Environment* (IDE) used to make it easier to interact with the coding language.
:::
The projects feature of R helps keep projects and ideas discrete and establish a [reproducible](proj.html#reproducibility) workflow. It is easier to share code across installations with projects, such as using relative file paths or other coding hygiene practices.
### How to...
::: column-margin
[![RStudio Projects](images/projects.jpg){fig-alt="RStudio Projects"}](images/projects.jpg)
:::
1. Look in the upper-right corner of your RStudio IDE. Or, `File > New Projects`
2. Choose between **New**, Existing, or Version Control.
3. If you choose a New project, there are several options for project types. Personally, I like starting with **Quarto Website**. However **Quarto** **Project**[^2] is a more generic setup. You can also choose *Quarto Blog* or *Quarto Book*.
[^2]: Read more about each of the special subtypes of RStudio projects. [https://quarto.org/docs/projects/quarto-projects.html](https://quarto.org/docs/projects/quarto-projects.htmlhttps://quarto.org/docs/projects/quarto-projects.html)
::: {layout-ncol="2"}
[![New Quarto project](images/quarto_project.jpg){fig-alt="New Quarto project" width="250"}](images/quarto_project.jpg)
[![New Quarto Document](images/new_quarto_document.jpg){fig-alt="New Quarto Document" width="225"}](images/new_quarto_document.jpg)
:::
::: column-margin
```{=html}
<iframe width="300" height="200" src="https://www.youtube.com/embed/w_xCkbf7iYw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
```
:::
::: callout-tip
## Switch between projects
By clicking the upper-right hand corner of your project name, you'll be able to easily switch between projects.
:::
## Settings in RStudio preferences
Further recommendations on reproducibility suggest setting the following global preferences.
::: {layout-ncol="2"}
[![Global options \> General](images/prefers_global_options_general.jpg){fig-alt="Global options > General" width="200"}](images/prefers_global_options_general.jpg)
[![Global options \> Code](images/prefers_global_options_pipe.jpg){fig-alt="Global options > Code" width="300"}](images/prefers_global_options_pipe.jpg)
:::
::: column-margin
[![Tools \> Global options](images/prefers_global_options.jpg){fig-alt="Tools > Global options" width="200"}](images/prefers_global_options.jpg)
:::
- **Uncheck**
- Restore most recently opened project (uncheck)
- Restore previously open source document (uncheck)
- Restore .Rdata (uncheck)
- Save workspace to .RData to **NEVER**
- Always save history (uncheck)
Read more about the [blank slate approach to reproducibility](https://docs.posit.co/ide/user/ide/get-started/#blank-slate) using the RStudio IDE.
## Reproducibility
A reproducible approach to computing ensures that all documents are readily available and facilitates the execution and re-execution of computations to achieve identical results.[^3]
[^3]: <https://en.wikipedia.org/wiki/Reproducibility>
::: column-margin
[![Reproducibility pyramid (Little and Lafferty-Hess, 2023)](images/Little_reproducibility_project_pyramid.svg){fig-alt="Reproducibility pyramid"}](images/Little_reproducibility_project_pyramid.svg)
:::
> Two of the most basic principles in reproducibility are...
>
> 1. **Do everything with code**
> 2. **Render documents from code**
Developing computation techniques to honor reproducibility principles means, whenever possible, reduce or eliminate mouse-clicks and hard-to-document copy/paste steps. We do this because such actions are barriers for ourselves and others when it comes to re-executing code.
Using a coding language such as R, along with a report rendering scheme such as Quarto, is an ideal way to ensure the reproducibility of your analysis and publications. Reproducibility is further enhanced by leveraging literate coding [@knuth1984], tidy data [@wickham2014], and tidyverse principles [@wickham2019].
Building computation upon this foundations increases the chances of archiving your work for posterity. When we look back at the computational workflows of the nineteen seventies through the early two-thousands, we see the natural problems and evolution of solutions that now lead us to good reproducible technique.
## Archiving dependencies with {renv}
"The `renv` package helps create **r**eproducible **env**ironments for your R projects." {[renv](https://rstudio.github.io/renv)}[^4]
[^4]: See {renv} at [https://rstudio.github.io/renv](https://rstudio.github.io/renvhttps://rstudio.github.io/renv). See Also a discussion [virtual environments at this Quarto page](https://quarto.org/docs/projects/virtual-environments.html), including details on venv, conda, and renv.
> {`renv`} is a new effort to bring project-local R dependency management to your projects. The goal is for `renv` to be a robust, stable replacement for the {`packrat`} package, with fewer surprises and better default behaviors.[^5]
[^5]: [*Introduction to renv*](https://rstudio.github.io/renv/articles/renv.html) / Kevin Ushey
## Version Control and Code Repositories:
Tools for reproducibility, collaboration, documentation, and backup
**Version control**[^6] is the practice of tracking and managing changes to files and projects over time. **Code repositories**[^7] are platforms for storing, managing, and sharing code. Researchers can use these tools to improve the reproducibility, collaboration, documentation, and backup of their research.
[^6]: An example of a version control application is [git](https://git-scm.com/downloads). While git has a notoriously curmudgeonly user-interface, RStudio integrates with git and improves upon the interface. This makes version control especially useful in a scholarly-publishing/computational-thinking context.
[^7]: For example: [GitHub](https://github.com) and GitLab ([public site](https://gitlab.com) \| [Duke University Instance](https://gitlab.oit.duke.edu)) are examples of web-based code-repositories.
For example, version control can be used to track changes to a computational model, allowing researchers to easily revert to previous versions of the project if necessary. Code repositories can be used to share code with collaborators, making it easier to work together on projects.
Version control and code repositories can also be used to document the development process, which can help future researchers who want to build on the work. Additionally, code repositories can be used to create backups of code and data, which can help protect against data loss.
For tips on using the {usethis} package to simplify git (plus either GitHub or GitLab) with R, please see this [quick-start page on Version-control and reproducibility.](https://rfun.library.duke.edu/git)
## Research artifacts
Storing, sharing, and citing your work.
Archival data repositories[^8] are different from code repositories in that they aim to preserve research milestones as an artifact. This can include a published article, the code and data used to produce the article, and any other relevant materials.
[^8]: For example: [Zenodo.org](https://Zenodo,org/) is a free and open-source platform for sharing research data and code.
Meanwhile, Code repositories (mentioned above), are more focused on current workflows, and they are often used to store and share code that is used in research projects.
Linking code and data repositories can be helpful for researchers who want to ensure the reproducibility and posterity of their work. By linking the two repositories, researchers can make it easy for others to find and access the code and data that they used to produce their results.
For example, Referencing and citing your reproducible artifacts, data, and computational workflow is easily accomplished with [GitHub](https://github.com/) (a code repository) and [Zenodo](https://zenodo.org/) (a data repository). Using [the techniques described](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content), the two repositories can be easily linked and synchronized. This means that any major versions or milestones in GitHub will be automatically imported into Zenodo, where they will be given a Digital Object Identifier (DOI).
A DOI is a unique identifier that can be used to cite research artifacts. This makes it easy for others to find and reference your work.\