-
-
Notifications
You must be signed in to change notification settings - Fork 28
boulder/draft: Add support for license matching when running boulder new #472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@jplatte Could I perhaps persuade you to give this a look? |
I could give it a simple code style review, but I imagine that's not (all) you need? Though maybe it helps later rounds of review? Anything else would be out of my comfort zone, and I'm already outside it for enough of my days at the moment 🥲 |
Code style review re. smells and idomatic solutions would be great. The code already works and looks sane logic wise, so we just need it to be "clean enough" for it to go in. =) |
@@ -50,6 +50,9 @@ tokio.workspace = true | |||
url.workspace = true | |||
mailparse.workspace = true | |||
infer.workspace = true | |||
rayon.workspace = true | |||
rapidfuzz = "0.5.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't seen this library you used for license text checking before, so I looked what the one other tool I know to compare license texts does: cargo-deny uses a library called askalono that's specifically made for detecting a license from its text. Do you think it would make sense to use this library instead of rolling our own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is adding a bunch of crossbeam dependencies for speeding up the traversal of directories in search for a license file. I wouldn't really expect there to be any serious real-world performance gains from parallelizing this, as the filesystem access should be the bottleneck here.
So I'd prefer using the walkdir
crate (also from a well-known author, same as the dependencies here). That has the extra benefit of have an easy toggle for breath-first traversal (.contents_first(true)
), which I'd expect to help performance more than the parallelization.
let mut purified_spdx_licenses = HashSet::new(); | ||
let spdx_license_paths: HashSet<_> = std::fs::read_dir(dir)? | ||
.filter_map(|entry| { | ||
entry.ok().and_then(|e| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think io error should be swallowed here (via .ok()
+ filter_map
). Declaring spdx_license_paths
as mutable beforehand and inserting into it in a for
loop (just like purified_spdx_licenses
) is probably the easiest / most readable way of allowing short-circuiting while iterating. (via let entry = entry?;
as the first statement in the loop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes requested per Jonas' review.
No description provided.