Skip to content

boulder/draft: Add support for license matching when running boulder new #472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

joebonrichie
Copy link
Contributor

No description provided.

@ermo
Copy link
Member

ermo commented May 8, 2025

@jplatte Could I perhaps persuade you to give this a look?

@ermo ermo added this to the alpha1 milestone May 8, 2025
@jplatte
Copy link
Member

jplatte commented May 8, 2025

I could give it a simple code style review, but I imagine that's not (all) you need? Though maybe it helps later rounds of review?

Anything else would be out of my comfort zone, and I'm already outside it for enough of my days at the moment 🥲

@ermo
Copy link
Member

ermo commented May 8, 2025

I could give it a simple code style review, but I imagine that's not (all) you need? Though maybe it helps later rounds of review?

Anything else would be out of my comfort zone, and I'm already outside it for enough of my days at the moment 🥲

Code style review re. smells and idomatic solutions would be great.

The code already works and looks sane logic wise, so we just need it to be "clean enough" for it to go in. =)

@@ -50,6 +50,9 @@ tokio.workspace = true
url.workspace = true
mailparse.workspace = true
infer.workspace = true
rayon.workspace = true
rapidfuzz = "0.5.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't seen this library you used for license text checking before, so I looked what the one other tool I know to compare license texts does: cargo-deny uses a library called askalono that's specifically made for detecting a license from its text. Do you think it would make sense to use this library instead of rolling our own?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding a bunch of crossbeam dependencies for speeding up the traversal of directories in search for a license file. I wouldn't really expect there to be any serious real-world performance gains from parallelizing this, as the filesystem access should be the bottleneck here.

So I'd prefer using the walkdir crate (also from a well-known author, same as the dependencies here). That has the extra benefit of have an easy toggle for breath-first traversal (.contents_first(true)), which I'd expect to help performance more than the parallelization.

let mut purified_spdx_licenses = HashSet::new();
let spdx_license_paths: HashSet<_> = std::fs::read_dir(dir)?
.filter_map(|entry| {
entry.ok().and_then(|e| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think io error should be swallowed here (via .ok() + filter_map). Declaring spdx_license_paths as mutable beforehand and inserting into it in a for loop (just like purified_spdx_licenses) is probably the easiest / most readable way of allowing short-circuiting while iterating. (via let entry = entry?; as the first statement in the loop)

@ermo
Copy link
Member

ermo commented May 10, 2025

@joebonrichie 🏓

Copy link
Member

@ermo ermo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested per Jonas' review.

@ermo ermo modified the milestones: alpha1, alpha2 May 14, 2025
@ermo ermo added priority: could have type: user experience Will make for a nicer user experience labels May 27, 2025
@ermo ermo marked this pull request as draft June 11, 2025 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: could have type: user experience Will make for a nicer user experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants