[Feature Request] Automatic installation of packages detected via explicit package::function() calls in Quarto and R projects #712

Open
fretwurst opened this issue Nov 8, 2024 · 1 comment

Comments

@fretwurst

Problem

In CI/CD workflows, particularly for Quarto-based projects, missing packages often cause rendering to fail. This is a common issue when:

  • Packages are used explicitly via package::function() in .qmd files and are not pre-installed in the CI/CD environment.
  • CI/CD pipelines (e.g., with Rocker Docker containers) halt after encountering the first missing package, leading to time-consuming debugging in large projects.

For Quarto projects, active chapters defined in _quarto.yml (under chapters or render) often determine the relevant .qmd files to render. Detecting and pre-installing the packages used in these files before rendering could significantly streamline the CI/CD workflow.
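
As a rough illustration of how the active files could be read from _quarto.yml, here is a minimal sketch. It assumes the yaml package is installed and a book-style project; `quarto_active_files()` is a hypothetical helper, not an existing pak or Quarto function. Glob patterns and negated entries under render are not expanded or filtered.

```r
# Hypothetical helper: list the .qmd files named in _quarto.yml.
# Assumes the 'yaml' package is available.
quarto_active_files <- function(yml = "_quarto.yml") {
  cfg <- yaml::read_yaml(yml)
  # Chapters can be plain strings or nested parts with their own chapters,
  # so flatten everything and keep only entries ending in .qmd.
  entries <- unlist(list(cfg$book$chapters, cfg$project$render),
                    use.names = FALSE)
  entries <- as.character(entries)
  unique(grep("\\.qmd$", entries, value = TRUE))
}
```

Calling `quarto_active_files("_quarto.yml")` from the project root would return the chapter paths as a character vector, relative to the project directory.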


Proposed Solution

Enhance pak with a feature to:

  1. Scan project files (e.g., .qmd, .Rmd, .R) for all explicitly used packages:
    • Detect package::function() calls.
    • Optionally scan for pak::pkg_install() or pak::pak() calls in file headers.
  2. Support Quarto project workflows:
    • Read _quarto.yml to identify active .qmd files (chapters or render keys).
    • Install all required packages before rendering begins.
  3. Install missing packages efficiently:
    • Use pak's parallelized installation and caching to minimize installation time in CI/CD pipelines.
    • Avoid breaking on the first missing package.
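
The scanning and installation steps above could be approximated today with a small script. The following is only a sketch under stated assumptions: `scan_namespaced_pkgs()` and `install_missing()` are hypothetical names, the detection is a plain regular expression (so it also picks up `pkg::fun` mentions inside comments, strings, or non-R chunks), and the only pak function used is the existing `pak::pkg_install()`.

```r
# Hypothetical helper: collect package names that appear as pkg::fun or
# pkg:::fun in the given files. Regex-based, so it is only an approximation.
scan_namespaced_pkgs <- function(files) {
  lines <- as.character(unlist(lapply(files, readLines, warn = FALSE)))
  if (length(lines) == 0) return(character(0))
  # A package name (letters, digits, dots) not preceded by an identifier
  # character and followed by :: or ::: plus the start of a symbol.
  pattern <- "(?<![A-Za-z0-9._])[A-Za-z][A-Za-z0-9.]*(?=:{2,3}[A-Za-z._])"
  hits <- regmatches(lines, gregexpr(pattern, lines, perl = TRUE))
  sort(unique(unlist(hits)))
}

# Hypothetical wrapper: install everything that is referenced but missing,
# in a single pak call so one missing package does not stop the run.
install_missing <- function(path = ".", pattern = "\\.(qmd|Rmd|R)$") {
  files <- list.files(path, pattern = pattern, recursive = TRUE,
                      full.names = TRUE)
  needed <- scan_namespaced_pkgs(files)
  missing <- setdiff(needed, rownames(utils::installed.packages()))
  if (length(missing) > 0) pak::pkg_install(missing)
  invisible(missing)
}
```

In a CI job, `install_missing(".")` would run once before `quarto render`. Packages attached with library() or require() are not detected by this sketch.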

Best Practice Alignment

Modern R style guides, such as the Google R Style Guide and the Tidyverse Style Guide, recommend explicit package::function() calls over attaching packages globally. This approach improves:

  • Clarity: The source of each function is immediately clear.
  • Conflict avoidance: Prevents naming conflicts between functions in different packages.
  • Modularity: Ensures code runs independently of preloaded packages.

Given this trend, tools like pak should support workflows where packages are explicitly referenced, especially in CI/CD contexts where no preloaded environment exists.

Example Workflow

A new pak function, such as pak::install_quarto_deps(), could streamline this process:

```r
# Automatically scan a Quarto project and install dependencies
pak::install_quarto_deps(yml = "_quarto.yml")
```

This function would:

  • Parse _quarto.yml to identify active .qmd files.
  • Extract all packages used via package::function() in these files.
  • Install any missing packages before rendering.

Alternatively, a more general function like pak::scan_and_install() could be used for non-Quarto workflows:

```r
# Scan an arbitrary folder for used packages and install them
pak::scan_and_install(path = ".", pattern = "\\.qmd$")
```

Benefits

  1. Streamlined CI/CD Pipelines:
    Avoid pipeline failures due to missing packages by ensuring all dependencies are installed in advance.

  2. Efficiency for Large Projects:
    Automatically handle dependency management for Quarto projects with multiple .qmd files and dynamic dependencies.

  3. Modern Style Alignment:
    Supports best practices by enabling workflows where package::function() is preferred over global package loading.

  4. Broader Use Case:
    While the focus is on Quarto projects, this feature could benefit RMarkdown users or anyone working with R scripts in CI/CD environments.

  5. Optimized for Docker:
    By leveraging pak’s caching and parallelized installation, this would reduce install time and resource use in containerized environments.

@gaborcsardi
Member

This is already happening here: r-lib/pkgdepends#390
