-
Notifications
You must be signed in to change notification settings - Fork 585
templates: support string patterns in template language #6899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
8ffafc9
to
cc7eb2f
Compare
cc @yuja |
3ebcc42
to
4cae57d
Compare
just a reminder to adhere to commit guidelines here: https://github.com/jj-vcs/jj/blob/main/docs/contributing.md#commit-guidelines since. a |
3ea5b1a
to
7c68277
Compare
The full completion of this feature to be able to pull out the matching text in question probably involves introducing a new type that represents a match result (and match groups in regex), but doing so requires some rework and semantics design of the string pattern system itself. I think that would probably be |
For this use case, I think we can add
( Implementation-wise, we'll need https://docs.rs/regex/latest/regex/struct.Regex.html#method.replace |
The issue I have is that it seems nontrivial to make a to_regex translator for globs, and that IME it is most ideal to have the ability to match groups with a regex. But that's probably going to require creating a language type to be able to extract groups and matches; a regex feature that only does full matches is quite a bit harder to use. So I'm partially trying to figure out how to incrementally deliver something useful without simply . inventing good regex support in my first code PR to the project heh. I think the path forward on that is calling the function introduced in this PR |
I think we'll need to switch to the globset crate at some point. The current |
Since we already have globset in transitive dependencies, this change helps reduce the amount of dependencies. Another reason is that globset provides a function to convert glob to regex. This is nice because we use globs to match against strings or internal repository paths instead of platform-native paths. The GlobPattern wrapper is boxed because globset::Glob type is relatively big.
Unlike string patterns, backslash-escape syntax isn't forcibly enabled. Fileset globs are constructed from platform-native path inputs.
This makes it clear that it doesn't give a list of matches.
This is a basic implementation of the same string pattern system as in the revset language. It's currently only used for `string.matches`, so you can now do: ``` "foo".matches(regex:'[a-f]o+') ``` In the future this could be added to more string functions (and e.g. the ability to parse things out of strings could be added). CC: jj-vcs#6893
28917ec
to
52da308
Compare
This allows for any matcher type and allows extracting a capture group by number.
52da308
to
a6dbfd7
Compare
I've rebased and rewritten this based on #6967. It should be ready to go from my end minus one code question I couldn't figure out and need help with :) |
todo!( | ||
"how do I fix this? I can't keep the span here due to \ | ||
lifetimes. this error doesn't appear statically" | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need help with how to write this diagnostic in a way that actually works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't add capture_group
argument to .match()
. I think we can instead add .replace(pattern, subst)
, which can be used as description.match(regex:'h(ell)o').replace(regex:'h(ell)o', '$1')
.
todo!( | ||
"how do I fix this? I can't keep the span here due to \ | ||
lifetimes. this error doesn't appear statically" | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't add capture_group
argument to .match()
. I think we can instead add .replace(pattern, subst)
, which can be used as description.match(regex:'h(ell)o').replace(regex:'h(ell)o', '$1')
.
/// XXX: this is kind of problematic for byte regexes as would be created | ||
/// by globs. I dunno what to do about that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to change the underlying regex
type to bytes::Regex
to fix bytes handling in diff_contains()
revset. Suppose this change is accepted, I think there are two options:
- in templater, use
bytes::Regex
and convert the result back toString
. If the matched slice wasn't at unicode boundary,.match()
returns an error. - make this
to_regex()
fail if bytes regex cannot be re-interpreted as a string regex.
I think (1) is simpler.
@@ -12,6 +12,10 @@ | |||
// See the License for the specific language governing permissions and | |||
// limitations under the License. | |||
|
|||
//! Template environment for `jj log`, `jj evolog` and similar. | |||
//! | |||
//! See: <https://jj-vcs.github.io/jj/latest/templates/#commit-keywords>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would remove the doc link because the commit templater can have various context types than the Commit
type.
@@ -12,6 +12,10 @@ | |||
// See the License for the specific language governing permissions and | |||
// limitations under the License. | |||
|
|||
//! Templates for `jj op log`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "Template environment" sounds more correct.
@@ -51,6 +55,7 @@ pub trait OperationTemplateLanguageExtension { | |||
fn build_cache_extensions(&self, extensions: &mut ExtensionsMap); | |||
} | |||
|
|||
/// Templates for `jj op log`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Same here. This isn't a "Template", but "Template environment".
/// Implements conversion from any error type to support `expr?` in function | ||
/// binding. This type doesn't implement `std::error::Error` instead. | ||
/// <https://github.com/dtolnay/anyhow/issues/25#issuecomment-544140480> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This is an implementation comment. Revert the change.
/// Template which, upon being forced (`extract()`ed), will evaluate its | ||
/// condition and select between a template for the true case and a template | ||
/// for the false case (which will yield nothing if it is [`None`]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds more like a textual explanation of the code. Can you simplify? I think it'll be something like "Template which selects output based on a boolean condition."
@@ -88,11 +91,18 @@ formal_parameters = { | |||
identifier ~ (whitespace* ~ "," ~ whitespace* ~ identifier)* ~ (whitespace* ~ ",")? | |||
| "" | |||
} | |||
string_pattern_identifier = @{ (ASCII_ALPHA | "_") ~ (ASCII_ALPHANUMERIC | "_" | "-")* } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment. It's nice that you noticed that -
is allowed as a pattern prefix whereas -
can also be infix/prefix operators. The grammar isn't ambiguous yet, but it's a bit odd. I don't have a good idea to address this potential problem.
// Don't write default implementations on these (and certainly not ones | ||
// using another function on self): wrapper types that contain a sum type | ||
// of a CoreTemplatePropertyKind or a custom type will hit the default case | ||
// every time rather than the implementation on CoreTemplatePropertyKind if | ||
// a custom implementation is omitted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wouldn't add this comment because I don't think it is any special here. We shouldn't add default implementation unless that makes sense.
other => { | ||
return Err(TemplateParseError::expression( | ||
format!("Unexpected literal in string pattern: {other:?}"), | ||
span, | ||
)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should be panic!()
?
// XXX(jade): this should probably make the StringPattern | ||
// itself here, but because StringPattern is not PartialEq (and | ||
// cannot be due to regexes), we defer that to later and Deal With | ||
// the AST being slightly less clean (and slightly less validated) | ||
// than is ideal. | ||
ExpressionNode::new(ExpressionKind::StringPattern { kind, value: text }, span) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current implementation is correct. We aren't doing any revset/fileset/string pattern parsing in the parser. And we might add other kind of patterns (e.g. date pattern) that share the same syntax.
impl Template for StringPattern { | ||
fn format(&self, formatter: &mut TemplateFormatter) -> io::Result<()> { | ||
write!(formatter, "{self}") | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shouldn't be needed anymore?
This is a basic implementation of the same string pattern system as in
the revset language. It's currently only used for
string.matches
, soyou can now do:
In the future this could be added to more string functions (and e.g.
the ability to parse things out of strings could be added).
CC: #6893
Checklist
If applicable:
CHANGELOG.md
README.md
,docs/
,demos/
)cli/src/config-schema.json
)