Skip to content

templates: support string patterns in template language #6899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

lf-
Copy link
Contributor

@lf- lf- commented Jul 5, 2025

This is a basic implementation of the same string pattern system as in
the revset language. It's currently only used for string.matches, so
you can now do:

"foo".matches(regex:'[a-f]o+')

In the future this could be added to more string functions (and e.g.
the ability to parse things out of strings could be added).

CC: #6893

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added/updated tests to cover my changes

@lf- lf- requested a review from a team as a code owner July 5, 2025 02:02
@lf- lf- force-pushed the jade/push-nlllwnwltntw branch 4 times, most recently from 8ffafc9 to cc7eb2f Compare July 5, 2025 02:20
@lf-
Copy link
Contributor Author

lf- commented Jul 5, 2025

cc @yuja

@lf- lf- force-pushed the jade/push-nlllwnwltntw branch 2 times, most recently from 3ebcc42 to 4cae57d Compare July 5, 2025 04:36
@PhilipMetzger
Copy link
Contributor

just a reminder to adhere to commit guidelines here: https://github.com/jj-vcs/jj/blob/main/docs/contributing.md#commit-guidelines since. a feat: isn't really a topic. My suggestion would be templater:.

@lf- lf- force-pushed the jade/push-nlllwnwltntw branch 3 times, most recently from 3ea5b1a to 7c68277 Compare July 5, 2025 23:55
@lf-
Copy link
Contributor Author

lf- commented Jul 6, 2025

The full completion of this feature to be able to pull out the matching text in question probably involves introducing a new type that represents a match result (and match groups in regex), but doing so requires some rework and semantics design of the string pattern system itself.

I think that would probably be .match. Which probably means this boolean one should be called is_match actually as matches might mean "what is the list of matches in this string". Either that or this one should be called match and then we extend the output type to be a more substantive type later and just assume that everyone's using it as truthiness now. Thoughts?

@lf- lf- changed the title feat: support string patterns in template language templates: support string patterns in template language Jul 6, 2025
@yuja
Copy link
Contributor

yuja commented Jul 6, 2025

The full completion of this feature to be able to pull out the matching text in question probably involves introducing a new type that represents a match result (and match groups in regex), but doing so requires some rework and semantics design of the string pattern system itself.

For this use case, I think we can add .match/es and .replace. The user might have to write the same regex twice, but that seems okay?

# .match(pat) -> String and/or .matches(path) -> List<String>
# .replace_all(pat, subst) -> String
.match(regex:'#\d+\b").replace_all(regex:'^#(.*)', 'issue-$2')

(match/matches might be a bad name, but I'm not sure.)

Implementation-wise, we'll need StringPattern::to_regex() or something, and use string-based substitution provided by Regex. We could add lambda syntax for subst expression, but that would be more complex, and I don't know if we'll need expressivity provided by arbitrary templates.

https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace

@lf-
Copy link
Contributor Author

lf- commented Jul 6, 2025

The issue I have is that it seems nontrivial to make a to_regex translator for globs, and that IME it is most ideal to have the ability to match groups with a regex. But that's probably going to require creating a language type to be able to extract groups and matches; a regex feature that only does full matches is quite a bit harder to use.

So I'm partially trying to figure out how to incrementally deliver something useful without simply . inventing good regex support in my first code PR to the project heh. I think the path forward on that is calling the function introduced in this PR is_match.

@yuja
Copy link
Contributor

yuja commented Jul 6, 2025

The issue I have is that it seems nontrivial to make a to_regex translator for globs,

I think we'll need to switch to the globset crate at some point. The current StringPattern API doesn't have a separate .to_matcher() step, so adapting to GlobSet might not be straightforward. Still I think we can leverage globset to implement .to_regex().

https://docs.rs/globset/0.4.16/globset/struct.Glob.html

yuja added 3 commits July 12, 2025 15:39
Since we already have globset in transitive dependencies, this change helps
reduce the amount of dependencies. Another reason is that globset provides a
function to convert glob to regex. This is nice because we use globs to match
against strings or internal repository paths instead of platform-native paths.

The GlobPattern wrapper is boxed because globset::Glob type is relatively big.
Unlike string patterns, backslash-escape syntax isn't forcibly enabled. Fileset
globs are constructed from platform-native path inputs.
lf- added 3 commits July 17, 2025 15:30
This makes it clear that it doesn't give a list of matches.
This is a basic implementation of the same string pattern system as in
the revset language. It's currently only used for `string.matches`, so
you can now do:

```
"foo".matches(regex:'[a-f]o+')
```

In the future this could be added to more string functions (and e.g.
the ability to parse things out of strings could be added).

CC: jj-vcs#6893
@lf- lf- force-pushed the jade/push-nlllwnwltntw branch 2 times, most recently from 28917ec to 52da308 Compare July 18, 2025 00:10
This allows for any matcher type and allows extracting a capture group
by number.
@lf- lf- force-pushed the jade/push-nlllwnwltntw branch from 52da308 to a6dbfd7 Compare July 18, 2025 00:10
@lf-
Copy link
Contributor Author

lf- commented Jul 18, 2025

I've rebased and rewritten this based on #6967. It should be ready to go from my end minus one code question I couldn't figure out and need help with :)

Comment on lines +943 to +946
todo!(
"how do I fix this? I can't keep the span here due to \
lifetimes. this error doesn't appear statically"
),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need help with how to write this diagnostic in a way that actually works

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't add capture_group argument to .match(). I think we can instead add .replace(pattern, subst), which can be used as description.match(regex:'h(ell)o').replace(regex:'h(ell)o', '$1').

Comment on lines +943 to +946
todo!(
"how do I fix this? I can't keep the span here due to \
lifetimes. this error doesn't appear statically"
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't add capture_group argument to .match(). I think we can instead add .replace(pattern, subst), which can be used as description.match(regex:'h(ell)o').replace(regex:'h(ell)o', '$1').

Comment on lines +303 to +304
/// XXX: this is kind of problematic for byte regexes as would be created
/// by globs. I dunno what to do about that.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to change the underlying regex type to bytes::Regex to fix bytes handling in diff_contains() revset. Suppose this change is accepted, I think there are two options:

  1. in templater, use bytes::Regex and convert the result back to String. If the matched slice wasn't at unicode boundary, .match() returns an error.
  2. make this to_regex() fail if bytes regex cannot be re-interpreted as a string regex.

I think (1) is simpler.

@@ -12,6 +12,10 @@
// See the License for the specific language governing permissions and
// limitations under the License.

//! Template environment for `jj log`, `jj evolog` and similar.
//!
//! See: <https://jj-vcs.github.io/jj/latest/templates/#commit-keywords>.
Copy link
Contributor

@yuja yuja Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would remove the doc link because the commit templater can have various context types than the Commit type.

@@ -12,6 +12,10 @@
// See the License for the specific language governing permissions and
// limitations under the License.

//! Templates for `jj op log`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Template environment" sounds more correct.

@@ -51,6 +55,7 @@ pub trait OperationTemplateLanguageExtension {
fn build_cache_extensions(&self, extensions: &mut ExtensionsMap);
}

/// Templates for `jj op log`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Same here. This isn't a "Template", but "Template environment".

Comment on lines +323 to +325
/// Implements conversion from any error type to support `expr?` in function
/// binding. This type doesn't implement `std::error::Error` instead.
/// <https://github.com/dtolnay/anyhow/issues/25#issuecomment-544140480>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This is an implementation comment. Revert the change.

Comment on lines +606 to +608
/// Template which, upon being forced (`extract()`ed), will evaluate its
/// condition and select between a template for the true case and a template
/// for the false case (which will yield nothing if it is [`None`]).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds more like a textual explanation of the code. Can you simplify? I think it'll be something like "Template which selects output based on a boolean condition."

@@ -88,11 +91,18 @@ formal_parameters = {
identifier ~ (whitespace* ~ "," ~ whitespace* ~ identifier)* ~ (whitespace* ~ ",")?
| ""
}
string_pattern_identifier = @{ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_" | "-")* }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment. It's nice that you noticed that - is allowed as a pattern prefix whereas - can also be infix/prefix operators. The grammar isn't ambiguous yet, but it's a bit odd. I don't have a good idea to address this potential problem.

Comment on lines +170 to +174
// Don't write default implementations on these (and certainly not ones
// using another function on self): wrapper types that contain a sum type
// of a CoreTemplatePropertyKind or a custom type will hit the default case
// every time rather than the implementation on CoreTemplatePropertyKind if
// a custom implementation is omitted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wouldn't add this comment because I don't think it is any special here. We shouldn't add default implementation unless that makes sense.

Comment on lines +508 to +513
other => {
return Err(TemplateParseError::expression(
format!("Unexpected literal in string pattern: {other:?}"),
span,
))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should be panic!()?

Comment on lines +515 to +520
// XXX(jade): this should probably make the StringPattern
// itself here, but because StringPattern is not PartialEq (and
// cannot be due to regexes), we defer that to later and Deal With
// the AST being slightly less clean (and slightly less validated)
// than is ideal.
ExpressionNode::new(ExpressionKind::StringPattern { kind, value: text }, span)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current implementation is correct. We aren't doing any revset/fileset/string pattern parsing in the parser. And we might add other kind of patterns (e.g. date pattern) that share the same syntax.

Comment on lines +95 to +100
impl Template for StringPattern {
fn format(&self, formatter: &mut TemplateFormatter) -> io::Result<()> {
write!(formatter, "{self}")
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shouldn't be needed anymore?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants