Projects
RStudio Projects1
R is a coding language interpreter. RStudio is an Integrated Development Environment (IDE) used to make it easier to interact with the coding language.
The projects feature of R helps keep projects and ideas discrete and establish a reproducible workflow. It is easier to share code across installations with projects, such as using relative file paths or other coding hygiene practices.
How to…
- Look in the upper-right corner of your RStudio IDE. Or,
File > New Projects
- Choose between New, Existing, or Version Control.
- If you choose a New project, there are several options for project types. Personally, I like starting with Quarto Website. However Quarto Project2 is a more generic setup. You can also choose Quarto Blog or Quarto Book.
By clicking the upper-right hand corner of your project name, you’ll be able to easily switch between projects.
Settings in RStudio preferences
Further recommendations on reproducibility suggest setting the following global preferences.
Uncheck
- Restore most recently opened project (uncheck)
- Restore previously open source document (uncheck)
- Restore .Rdata (uncheck)
- Save workspace to .RData to NEVER
- Always save history (uncheck)
Read more about the blank slate approach to reproducibility using the RStudio IDE.
Reproducibility
A reproducible approach to computing ensures that all documents are readily available and facilitates the execution and re-execution of computations to achieve identical results.3
Two of the most basic principles in reproducibility are…
- Do everything with code
- Render documents from code
Developing computation techniques to honor reproducibility principles means, whenever possible, reduce or eliminate mouse-clicks and hard-to-document copy/paste steps. We do this because such actions are barriers for ourselves and others when it comes to re-executing code.
Using a coding language such as R, along with a report rendering scheme such as Quarto, is an ideal way to ensure the reproducibility of your analysis and publications. Reproducibility is further enhanced by leveraging literate coding (Knuth 1984), tidy data (Wickham 2014), and tidyverse principles (Wickham et al. 2019).
Building computation upon this foundations increases the chances of archiving your work for posterity. When we look back at the computational workflows of the nineteen seventies through the early two-thousands, we see the natural problems and evolution of solutions that now lead us to good reproducible technique.
Archiving dependencies with {renv}
“The renv
package helps create reproducible environments for your R projects.” {renv}4
{
renv
} is a new effort to bring project-local R dependency management to your projects. The goal is forrenv
to be a robust, stable replacement for the {packrat
} package, with fewer surprises and better default behaviors.5
Version Control and Code Repositories:
Tools for reproducibility, collaboration, documentation, and backup
Version control6 is the practice of tracking and managing changes to files and projects over time. Code repositories7 are platforms for storing, managing, and sharing code. Researchers can use these tools to improve the reproducibility, collaboration, documentation, and backup of their research.
For example, version control can be used to track changes to a computational model, allowing researchers to easily revert to previous versions of the project if necessary. Code repositories can be used to share code with collaborators, making it easier to work together on projects.
Version control and code repositories can also be used to document the development process, which can help future researchers who want to build on the work. Additionally, code repositories can be used to create backups of code and data, which can help protect against data loss.
For tips on using the {usethis} package to simplify git (plus either GitHub or GitLab) with R, please see this quick-start page on Version-control and reproducibility.
Research artifacts
Storing, sharing, and citing your work.
Archival data repositories8 are different from code repositories in that they aim to preserve research milestones as an artifact. This can include a published article, the code and data used to produce the article, and any other relevant materials.
Meanwhile, Code repositories (mentioned above), are more focused on current workflows, and they are often used to store and share code that is used in research projects.
Linking code and data repositories can be helpful for researchers who want to ensure the reproducibility and posterity of their work. By linking the two repositories, researchers can make it easy for others to find and access the code and data that they used to produce their results.
For example, Referencing and citing your reproducible artifacts, data, and computational workflow is easily accomplished with GitHub (a code repository) and Zenodo (a data repository). Using the techniques described, the two repositories can be easily linked and synchronized. This means that any major versions or milestones in GitHub will be automatically imported into Zenodo, where they will be given a Digital Object Identifier (DOI).
A DOI is a unique identifier that can be used to cite research artifacts. This makes it easy for others to find and reference your work.
References
Footnotes
https://support.posit.co/hc/en-us/articles/200526207-Using-Projects↩︎
Read more about each of the special subtypes of RStudio projects. https://quarto.org/docs/projects/quarto-projects.html↩︎
See {renv} at https://rstudio.github.io/renv. See Also a discussion virtual environments at this Quarto page, including details on venv, conda, and renv.↩︎
Introduction to renv / Kevin Ushey↩︎
An example of a version control application is git. While git has a notoriously curmudgeonly user-interface, RStudio integrates with git and improves upon the interface. This makes version control especially useful in a scholarly-publishing/computational-thinking context.↩︎
For example: GitHub and GitLab (public site | Duke University Instance) are examples of web-based code-repositories.↩︎
For example: Zenodo.org is a free and open-source platform for sharing research data and code.↩︎