-
Notifications
You must be signed in to change notification settings - Fork 17
Static diagnostics for library()
and require()
calls
#870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3600378
to
391138e
Compare
@@ -0,0 +1,16 @@ | |||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't end up using this here but will use it in the next PR for package imports.
fn children_of(node: Self) -> impl Iterator<Item = Self>; | ||
fn next_siblings(&self) -> impl Iterator<Item = Self>; | ||
fn arguments(&self) -> impl Iterator<Item = (Option<Self>, Option<Self>)>; | ||
fn arguments_values(&self) -> impl Iterator<Item = Self>; | ||
fn arguments_names(&self) -> impl Iterator<Item = Self>; | ||
fn arguments_names_as_string(&self, contents: &ropey::Rope) -> impl Iterator<Item = String>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bunch of new helpers for iteration over TS nodes using the cursor API. I didn't end up using all of them but they could be useful later.
// We'd ideally use the cursor API here too but | ||
// `ts_tree_cursor_goto_parent()` doesn't behave like | ||
// `ts_node_parent()`: the latter traverses `ERROR` nodes but not the | ||
// former. So for now we accept the performance hit of tree traversal at | ||
// each `parent()` call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a bummer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.
Another thing to consider in the future is transitive imports. File A: library(ggplot2) File B: source("file_a.R")
ggplot() # In scope The For now this will have to be worked around by evaluating the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a full deep code review and it seems quite nice and quite easy to understand!
I haven't done much interactive playing around with this in scripts yet, but I will and will report back if there are issues
// FIXME: We shouldn't call R code in the kernel to figure this out | ||
if let Err(err) = crate::r_task(|| -> anyhow::Result<()> { | ||
let paths: Vec<String> = harp::RFunction::new("base", ".libPaths") | ||
.call()? | ||
.try_into()?; | ||
|
||
log::info!("Using library paths: {paths:#?}"); | ||
let paths: Vec<PathBuf> = paths.into_iter().map(PathBuf::from).collect(); | ||
state.world.library = Library::new(paths); | ||
|
||
Ok(()) | ||
}) { | ||
log::error!("Can't evaluate `libPaths()`: {err:?}"); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you made this a KernelNotification
that the kernel sends the LSP in refresh_lsp()
?
Then you'd get updates to .libPaths()
as well, like if a user changes it after startup, which is totally possible.
(This assumes you'd have a way to throw out and rebuild any information that utilized the libpaths when they change, but I figure we would have this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea ok after reviewing this in more detail, I feel like all we'd have to do after getting the kernel notif is update the WorldState
's Library
with a fresh new empty one created with Library::new(paths)
and we'd be good to go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I also really like the idea of having clear-ish boundaries about what information comes from the Kernel, rather than having the LSP try and query it itself)
/// Path to the directory that contains `DESCRIPTION`. Could be an installed | ||
/// package, or a package source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Path to the directory that contains `DESCRIPTION`. Could be an installed | |
/// package, or a package source. | |
/// Path to the directory that contains `DESCRIPTION` and `NAMESPACE`. | |
/// Could be an installed package, or a package source. |
It looks like you need both, from the load()
impl
// Only consider libraries that have a folder named after the | ||
// requested package and that contains a description file | ||
if !description_path.is_file() { | ||
return Ok(None); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to also require a NAMESPACE below, so bail early here too?
let namespace = Namespace::parse(&namespace_contents)?; | ||
|
||
Ok(Some(Package { | ||
path: package_path.to_path_buf(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path: package_path.to_path_buf(), | |
path: package_path, |
Already a PathBuf
// We'd ideally use the cursor API here too but | ||
// `ts_tree_cursor_goto_parent()` doesn't behave like | ||
// `ts_node_parent()`: the latter traverses `ERROR` nodes but not the | ||
// former. So for now we accept the performance hit of tree traversal at | ||
// each `parent()` call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this matter to you? I do not trust tree-sitter's error recovered tree enough to try and do anything useful when there are ERRORs in the tree. i.e. if there are syntax errors in the file, bail entirely for right now.
for lib_path in self.library_paths.iter() { | ||
match Package::load(&lib_path, name) { | ||
Ok(Some(pkg)) => return Ok(Some(pkg)), | ||
Ok(None) => (), | ||
Err(err) => lsp::log_warn!("Can't load package: {err:?}"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Err
case, we should probably bail with the error right?
Right now I think this says "I found the package, but failed to load it"
And IMO we should not continue on to the next library_paths
element when that occurs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you decide not to do that, note that you don't need the return value to be anyhow::Result<>
here, as you never propagate an error right now
let package = insert_package_exports(&package_name, attach_pos, context)?; | ||
|
||
// Also attach packages from `Depends` field | ||
let mut attach_dependencies = package.description.depends.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut attach_dependencies = package.description.depends.clone(); | |
for package_name in package.description.depends.iter() { | |
insert_package_exports(package_name, attach_pos, context)?; | |
} |
I'd try and fully avoid the clone here, you just need a reference.
Down below, you'd just add a second loop of
let mut attach_dependencies = package.description.depends.clone(); | |
for package_name in attach_field.iter() { | |
insert_package_exports(package_name, attach_pos, context)?; | |
} |
I think that should clean things up fairly nicely in terms of clones / ownership of Package
data and I think it reads a little nicer too
fn insert_package_exports( | ||
package_name: &str, | ||
attach_pos: Point, | ||
context: &mut DiagnosticContext, | ||
) -> anyhow::Result<Arc<Package>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I'm fairly certain we can and should just return anyhow::Result<&Package>
? And that get()
method really should return a reference, not an Arc
.
We'd still clone out the package.namespace.exports
, which is fine and seems required, but otherwise Package
is immutable
/// The symbols exported by packages loaded via `library()` calls in this | ||
/// document. Currently global. TODO: Store individual exports in a BTreeMap | ||
/// sorted by position in the source? | ||
pub library_symbols: BTreeMap<Point, HashSet<String>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment? You did this TODO?
// Check all symbols exported by `library()` calls before the given position | ||
for (library_position, exports) in self.library_symbols.iter() { | ||
if *library_position > start_position { | ||
break; | ||
} | ||
if exports.contains(name) { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEAT
Addresses posit-dev/positron#1325
Progress towards posit-dev/positron#2321
This PR partially fixes the issue of "unknown symbols" diagnotics in fresh console sessions by moving towards static analysis of
library()
andrequire()
calls:We analyse
DESCRIPTION
(for theDepends:
field) andNAMESPACE
(forexport()
directives) files.Exported symbols are put in scope at the call site of
library()
andrequire()
calls.This takes care of most unused symbol diagnostics but not all:
If the symbol is used before the
library()
call, it's still unkown and will still cause a diagnostic (this is expected behaviour).exportPattern()
directives are not supported yet (see Static analysis ofexportPattern()
directives positron#8520)Exported data sets are not supported yet (see Static analysis of
data/
exports in R packages positron#8521)Since this mechanism is currently limited and is a new approach, we still use the current dynamic approach as a fallback. This means the same gestures Positron users currently use to silence diagnostics (such as evaluating
library()
calls) still work as before.This also means we are in a weird in-between state where diagnostics are not fully static, unless the session is 100% fresh. Once the limitations of the static diagnostics have been lifted, I think we should remove the dynamic fallback. The UX consequences of removing this fallback are discussed in posit-dev/positron#2321 (comment).
Approach:
We now examine package files installed in the session's library paths.
New
Description
andNamespace
structs withparse()
methods. For DESCRIPTION we implement our own DCF parser. For NAMESPACE we use a TS query for convenience, using theTSQuery
helper implemented in Emit R6Class methods as workspace symbols #861.New
Libary
andPackage
structs withload()
methods. A library is loaded from a set of library paths, and a package is loaded from a single library path.The packages in a library are loaded lazily and cached in the library. For simplicity, packages are not invalidated when installed files change. In the future, once we have Salsa and the VFS infrastructure from Rust-Analyzer, we will be able to watch for changes and automatically cache updates in a simple and efficient way.
.libPaths()
is called at the start of the session. This is a static value that doesn't change throughout the session. When the LSP is decoupled we'll callR
to get the lib paths and this will be static as well. If the lib paths change, the LSP must be restarted.Side note: I'm realising that the decoupled LSP will generally require an
R
binary in order to work well. This is similar to Rust-Analyzer requiringcargo
to e.g. fetch metadata, so I no longer think this is a problem.When a
library()
orrequire()
call is encountered, we get the package from the library. This causes to load if not loaded yet. We get the exports from the namespace file to put them in scope at that point in the file, and the depends field from the description file to attach other needed packages.The symbols exported by a package are stored in a
BTreeMap
keyed by sorted positions in the file. When we lookup whether a symbol is defined, we simply discard exports whose position is greater than the symbol. We don't need to take masking or package ordering into account as we currently only need to check for existence of the symbol, not its type.Note that {tidyverse} and {tidymodels} don't declare packages in
Depends:
, instead they attach packages from.onAttach()
. I've hard-coded them for now but in the longer term we need to nudge package authors towards an explicit declaration in DESCRIPTION, such asConfig/Needs/attach:
. I've opened an issue about this in tidyverse/tidyverse#359.QA Notes
With:
You should see:
When you evaluate one of the
library()
calls, the corresponding diagnostics about unknown symbols before the library call should disappear. That would ideally not be the case, but for now we allow this as an escape hatch to work around shortcomings of the new system.