Static diagnostics for `library()` and `require()` calls #870

lionel- · 2025-07-11T16:31:22Z

Addresses posit-dev/positron#1325
Progress towards posit-dev/positron#2321

This PR partially fixes the issue of "unknown symbols" diagnotics in fresh console sessions by moving towards static analysis of library() and require() calls:

We analyse DESCRIPTION (for the Depends: field) and NAMESPACE (for export() directives) files.
Exported symbols are put in scope at the call site of library() and require() calls.

This takes care of most unused symbol diagnostics but not all:

If the symbol is used before the library() call, it's still unkown and will still cause a diagnostic (this is expected behaviour).
exportPattern() directives are not supported yet (see Static analysis of exportPattern() directives positron#8520)
Exported data sets are not supported yet (see Static analysis of data/ exports in R packages positron#8521)

Since this mechanism is currently limited and is a new approach, we still use the current dynamic approach as a fallback. This means the same gestures Positron users currently use to silence diagnostics (such as evaluating library() calls) still work as before.

This also means we are in a weird in-between state where diagnostics are not fully static, unless the session is 100% fresh. Once the limitations of the static diagnostics have been lifted, I think we should remove the dynamic fallback. The UX consequences of removing this fallback are discussed in posit-dev/positron#2321 (comment).

Approach:

We now examine package files installed in the session's library paths.

New Description and Namespace structs with parse() methods. For DESCRIPTION we implement our own DCF parser. For NAMESPACE we use a TS query for convenience, using the TSQuery helper implemented in Emit R6Class methods as workspace symbols #861.
New Libary and Package structs with load() methods. A library is loaded from a set of library paths, and a package is loaded from a single library path.

The packages in a library are loaded lazily and cached in the library. For simplicity, packages are not invalidated when installed files change. In the future, once we have Salsa and the VFS infrastructure from Rust-Analyzer, we will be able to watch for changes and automatically cache updates in a simple and efficient way.
.libPaths() is called at the start of the session. This is a static value that doesn't change throughout the session. When the LSP is decoupled we'll call R to get the lib paths and this will be static as well. If the lib paths change, the LSP must be restarted.

Side note: I'm realising that the decoupled LSP will generally require an R binary in order to work well. This is similar to Rust-Analyzer requiring cargo to e.g. fetch metadata, so I no longer think this is a problem.
When a library() or require() call is encountered, we get the package from the library. This causes to load if not loaded yet. We get the exports from the namespace file to put them in scope at that point in the file, and the depends field from the description file to attach other needed packages.
The symbols exported by a package are stored in a BTreeMap keyed by sorted positions in the file. When we lookup whether a symbol is defined, we simply discard exports whose position is greater than the symbol. We don't need to take masking or package ordering into account as we currently only need to check for existence of the symbol, not its type.

Note that {tidyverse} and {tidymodels} don't declare packages in Depends:, instead they attach packages from .onAttach(). I've hard-coded them for now but in the longer term we need to nudge package authors towards an explicit declaration in DESCRIPTION, such as Config/Needs/attach:. I've opened an issue about this in tidyverse/tidyverse#359.

QA Notes

With:

mutate(mtcars) # Not in scope

library(dplyr)

mutate(mtcars) # In scope


ggplot() # Not in scope

# Attach handled specially
library(tidyverse)

ggplot() # In scope


plan() # Not in scope

# `future` attached via `Depends`
library(furrr)

plan() # In scope

You should see:

When you evaluate one of the library() calls, the corresponding diagnostics about unknown symbols before the library call should disappear. That would ideally not be the case, but for now we allow this as an escape hatch to work around shortcomings of the new system.

lionel- · 2025-07-15T12:26:30Z

crates/ark/src/lsp/inputs/source_root.rs

@@ -0,0 +1,16 @@
+//


I didn't end up using this here but will use it in the next PR for package imports.

lionel- · 2025-07-15T12:28:17Z

crates/ark/src/lsp/traits/node.rs

+    fn children_of(node: Self) -> impl Iterator<Item = Self>;
+    fn next_siblings(&self) -> impl Iterator<Item = Self>;
+    fn arguments(&self) -> impl Iterator<Item = (Option<Self>, Option<Self>)>;
+    fn arguments_values(&self) -> impl Iterator<Item = Self>;
+    fn arguments_names(&self) -> impl Iterator<Item = Self>;
+    fn arguments_names_as_string(&self, contents: &ropey::Rope) -> impl Iterator<Item = String>;


Bunch of new helpers for iteration over TS nodes using the cursor API. I didn't end up using all of them but they could be useful later.

lionel- · 2025-07-15T12:28:40Z

crates/ark/src/lsp/traits/node.rs

+        // We'd ideally use the cursor API here too but
+        // `ts_tree_cursor_goto_parent()` doesn't behave like
+        // `ts_node_parent()`: the latter traverses `ERROR` nodes but not the
+        // former. So for now we accept the performance hit of tree traversal at
+        // each `parent()` call.


That was a bummer.

Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.

lionel- · 2025-07-15T13:54:00Z

Another thing to consider in the future is transitive imports.

File A:

library(ggplot2)

File B:

source("file_a.R")
ggplot() # In scope

The ggplot symbol is imported via source().

For now this will have to be worked around by evaluating the source() call or the relevant library() calls.

DavisVaughan

Did a full deep code review and it seems quite nice and quite easy to understand!

I haven't done much interactive playing around with this in scripts yet, but I will and will report back if there are issues

DavisVaughan · 2025-07-17T16:42:11Z

crates/ark/src/lsp/main_loop.rs

+        // FIXME: We shouldn't call R code in the kernel to figure this out
+        if let Err(err) = crate::r_task(|| -> anyhow::Result<()> {
+            let paths: Vec<String> = harp::RFunction::new("base", ".libPaths")
+                .call()?
+                .try_into()?;
+
+            log::info!("Using library paths: {paths:#?}");
+            let paths: Vec<PathBuf> = paths.into_iter().map(PathBuf::from).collect();
+            state.world.library = Library::new(paths);
+
+            Ok(())
+        }) {
+            log::error!("Can't evaluate `libPaths()`: {err:?}");
+        };


What if you made this a KernelNotification that the kernel sends the LSP in refresh_lsp()?

Then you'd get updates to .libPaths() as well, like if a user changes it after startup, which is totally possible.

(This assumes you'd have a way to throw out and rebuild any information that utilized the libpaths when they change, but I figure we would have this)

Yea ok after reviewing this in more detail, I feel like all we'd have to do after getting the kernel notif is update the WorldState's Library with a fresh new empty one created with Library::new(paths) and we'd be good to go?

(I also really like the idea of having clear-ish boundaries about what information comes from the Kernel, rather than having the LSP try and query it itself)

DavisVaughan · 2025-07-17T16:46:18Z

crates/ark/src/lsp/inputs/package.rs

+    /// Path to the directory that contains `DESCRIPTION`. Could be an installed
+    /// package, or a package source.


Suggested change

/// Path to the directory that contains `DESCRIPTION`. Could be an installed

/// package, or a package source.

/// Path to the directory that contains `DESCRIPTION` and `NAMESPACE`.

/// Could be an installed package, or a package source.

It looks like you need both, from the load() impl

DavisVaughan · 2025-07-17T16:48:47Z

crates/ark/src/lsp/inputs/package.rs

+        // Only consider libraries that have a folder named after the
+        // requested package and that contains a description file
+        if !description_path.is_file() {
+            return Ok(None);
+        }


You seem to also require a NAMESPACE below, so bail early here too?

DavisVaughan · 2025-07-17T16:49:50Z

crates/ark/src/lsp/inputs/package.rs

+        let namespace = Namespace::parse(&namespace_contents)?;
+
+        Ok(Some(Package {
+            path: package_path.to_path_buf(),


Suggested change

path: package_path.to_path_buf(),

path: package_path,

Already a PathBuf

DavisVaughan · 2025-07-17T16:55:56Z

crates/ark/src/lsp/traits/node.rs

+        // We'd ideally use the cursor API here too but
+        // `ts_tree_cursor_goto_parent()` doesn't behave like
+        // `ts_node_parent()`: the latter traverses `ERROR` nodes but not the
+        // former. So for now we accept the performance hit of tree traversal at
+        // each `parent()` call.


Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.

DavisVaughan · 2025-07-17T18:07:21Z

crates/ark/src/lsp/inputs/library.rs

+        for lib_path in self.library_paths.iter() {
+            match Package::load(&lib_path, name) {
+                Ok(Some(pkg)) => return Ok(Some(pkg)),
+                Ok(None) => (),
+                Err(err) => lsp::log_warn!("Can't load package: {err:?}"),
+            }


In the Err case, we should probably bail with the error right?

Right now I think this says "I found the package, but failed to load it"

And IMO we should not continue on to the next library_paths element when that occurs

If you decide not to do that, note that you don't need the return value to be anyhow::Result<> here, as you never propagate an error right now

DavisVaughan · 2025-07-17T18:25:19Z

crates/ark/src/lsp/diagnostics.rs

+    let package = insert_package_exports(&package_name, attach_pos, context)?;
+
+    // Also attach packages from `Depends` field
+    let mut attach_dependencies = package.description.depends.clone();


Suggested change

let mut attach_dependencies = package.description.depends.clone();

for package_name in package.description.depends.iter() {

insert_package_exports(package_name, attach_pos, context)?;

}

I'd try and fully avoid the clone here, you just need a reference.

Down below, you'd just add a second loop of

Suggested change

let mut attach_dependencies = package.description.depends.clone();

for package_name in attach_field.iter() {

insert_package_exports(package_name, attach_pos, context)?;

}

I think that should clean things up fairly nicely in terms of clones / ownership of Package data and I think it reads a little nicer too

DavisVaughan · 2025-07-17T18:27:21Z

crates/ark/src/lsp/diagnostics.rs

+fn insert_package_exports(
+    package_name: &str,
+    attach_pos: Point,
+    context: &mut DiagnosticContext,
+) -> anyhow::Result<Arc<Package>> {


Yea I'm fairly certain we can and should just return anyhow::Result<&Package>? And that get() method really should return a reference, not an Arc.

We'd still clone out the package.namespace.exports, which is fine and seems required, but otherwise Package is immutable

DavisVaughan · 2025-07-17T18:27:56Z

crates/ark/src/lsp/diagnostics.rs

+    /// The symbols exported by packages loaded via `library()` calls in this
+    /// document. Currently global. TODO: Store individual exports in a BTreeMap
+    /// sorted by position in the source?
+    pub library_symbols: BTreeMap<Point, HashSet<String>>,


Update comment? You did this TODO?

DavisVaughan · 2025-07-17T18:39:15Z

crates/ark/src/lsp/diagnostics.rs

+        // Check all symbols exported by `library()` calls before the given position
+        for (library_position, exports) in self.library_symbols.iter() {
+            if *library_position > start_position {
+                break;
+            }
+            if exports.contains(name) {
+                return true;
+            }


lionel- added 14 commits July 10, 2025 17:09

Draft data structures

292ba8a

Inspect library paths on LSP startup

32194b4

Extract structs from Package

d2effa5

Add tree-sitter query helpers

7fa77b8

Add NAMESPACE parser

042636a

Draft Library

81dfabc

Handle library() calls in diagnostics

1f9af53

Simplify argument extraction

ced4252

Take into account position of library() calls

ad600e4

Support require() too

6fd8e7f

Parse DESCRIPTION files

fa9a13c

Extract arguments() iterator from arguments_values()

b9b757e

Bail when character.only is supplied

4ea50a3

Warn outside of handle_package_attach_call()

391138e

lionel- force-pushed the feature/static-imports branch from 3600378 to 391138e Compare July 14, 2025 08:21

This was referenced Jul 15, 2025

Static analysis of exportPattern() directives posit-dev/positron#8520

Open

Static analysis of data/ exports in R packages posit-dev/positron#8521

Open

Declaring attached packages in DESCRIPTION tidyverse/tidyverse#359

Open

lionel- commented Jul 15, 2025

View reviewed changes

Fix typos

0d485d9

lionel- mentioned this pull request Jul 15, 2025

Add symbols imported in packages to workspace #872

Open

DavisVaughan approved these changes Jul 17, 2025

View reviewed changes

		/// Path to the directory that contains `DESCRIPTION`. Could be an installed
		/// package, or a package source.

-    let mut attach_dependencies = package.description.depends.clone();
+    for package_name in package.description.depends.iter() {
+        insert_package_exports(package_name, attach_pos, context)?;
+    }

Static diagnostics for library() and require() calls #870

Are you sure you want to change the base?

Static diagnostics for library() and require() calls #870

Uh oh!

Conversation

lionel- commented Jul 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QA Notes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lionel- commented Jul 15, 2025

Uh oh!

DavisVaughan left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Static diagnostics for `library()` and `require()` calls #870

Static diagnostics for `library()` and `require()` calls #870

lionel- commented Jul 11, 2025 •

edited

Loading