-
Notifications
You must be signed in to change notification settings - Fork 487
Pretty-print in SHOW CREATE
#31933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pretty-print in SHOW CREATE
#31933
Conversation
b5ec1a6 to
07d8bd6
Compare
6ebb194 to
b461943
Compare
43fadec to
a114777
Compare
8ddde38 to
e4bcda0
Compare
teskje
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Not sure about the removed SHOW CREATE statements in some of the test files, but I trust they all have good reasons.
The config parameter added to ~all functions in mz-sql-pretty makes me think that we should turn them into methods on a struct that holds the config. But I can understand that you don't want to deal with such a large refactor now.
|
|
||
| > SHOW CREATE SINK compression_implicit; | ||
| "materialize.public.compression_implicit" ${expected-compression-implicit-create-sql} | ||
| materialize.public.compression_implicit "CREATE SINK materialize.public.compression_implicit IN CLUSTER quickstart FROM materialize.public.kafka_sink_from INTO KAFKA CONNECTION materialize.public.kafka_conn (TOPIC = 'kafka-sink') FORMAT JSON ENVELOPE DEBEZIUM;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question, but why do some of the SHOW CREATE outputs have newlines and some don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sql-pretty doesn't support every statement kind. E.g., to_doc has a case for CreateSource, but not for CreateSink.
|
|
||
| > SHOW CREATE SINK compression_implicit; | ||
| "materialize.public.compression_implicit" ${expected-compression-implicit-create-sql} | ||
| materialize.public.compression_implicit "CREATE SINK materialize.public.compression_implicit IN CLUSTER quickstart FROM materialize.public.kafka_sink_from INTO KAFKA CONNECTION materialize.public.kafka_conn (TOPIC = 'kafka-sink') FORMAT JSON ENVELOPE DEBEZIUM;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the pretty-printing adds a semicolon where there previously was none. Is this a compatibility concern? I think it would be if people would somehow run SHOW CREATE in their scripts, and then expect output without a trailing semicolon. Not sure why they would do either though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'd say we should just risk this.
It's probably slightly better to have a semicolon than to not have, because if you want to paste these statements into some SQL shell, then it's slightly easier if they already have a semicolon.
Btw. mzexplore might actually run into this problem. For example, its cluster cloning functionality might use SHOW CREATE in this way. I'm working on mzexplore, so I'll fix this in follow-up PRs.
src/sql-pretty/src/util.rs
Outdated
| pub(crate) fn title_comma_separate<'a, F, T, S>(title: S, f: F, v: &'a [T]) -> RcDoc<'a, ()> | ||
| where | ||
| F: Fn(&'a T) -> RcDoc<'a>, | ||
| F: FnMut(&'a T) -> RcDoc<'a>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what motivated these changes from Fn to FnMut?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was due to those ..._mapper functions somehow, but I switched to just having a closure everywhere instead of those ..._mapper functions, so I reverted these back to Fn.
src/sql-pretty/src/lib.rs
Outdated
| pub fn to_pretty<T: AstInfo>(stmt: &Statement<T>, width: usize) -> String { | ||
| format!("{};", to_doc(stmt).pretty(width)) | ||
| pub fn to_pretty<T: AstInfo>(stmt: &Statement<T>, config: PrettyConfig) -> String { | ||
| format!("{};", to_doc(stmt, config).pretty(config.width)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Threading a PrettyConfig into the doc* functions doesn't seem right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See discussion here.
| Ok(mz_sql_pretty::to_pretty( | ||
| &resolved, | ||
| PrettyConfig { | ||
| width: mz_sql_pretty::DEFAULT_WIDTH, | ||
| format_mode: if redacted { | ||
| FormatMode::SimpleRedacted | ||
| } else { | ||
| FormatMode::Simple | ||
| }, | ||
| }, | ||
| )) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a large part of this change is because we need to plumb the FormatMode into the sql-pretty crate?
IMO pretty printing shouldn't need to know about whether or not something is redacted. What do you think about something like:
let raw_str = if redacted {
resolved.to_ast_string_redacted()
} else {
resolved.to_ast_string_stable()
};
Ok(mz_sql_pretty::pretty_str(&redacted))
Unfortunate that we need to format the string twice, but given this is only for SHOW CREATE I don't feel too bad about it? In a future world it feels like there is an API we could introduce for the mz-sql-pretty crate that would allow us to wrap a statement in some context which would automatically format values as redacted if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO pretty printing shouldn't need to know about whether or not something is redacted.
This didn't occur to me during the review, but I agree!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the issue that we might not be able to parse a redacted statement back. This is because maybe the parser expects a number somewhere, but then it gets something like <redacted>. https://github.com/MaterializeInc/database-issues/issues/8796 aims to solve this problem, but there might be a long tail of cases to solve there, so in the meantime we'd have to have some error handling after the pretty_str call, and just print the non-pretty statement if the parsing back errors out. This is doable, but maybe it tips the balance in favor of the PR's current approach. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more consideration:
There are various aspects of AST printing that we want to control (see https://github.com/MaterializeInc/database-issues/issues/9082). If we want to keep mz-sql-pretty oblivious to all of them, and just use Parker's trick for adding redaction on top of any of the other AST printing options, then a problem is that mz-sql-pretty might undo some of the other formatting options when it calls AstDisplay as its "base case" with its hardwired FormatMode.
For example, imagine that we'd like to pretty-print in FormatMode::Stable. The above trick can't be adopted for this: if we first print with FormatMode::Stable, then parse back, then run mz-sql-pretty, then the problem is that mz-sql-pretty has FormatMode::Simple hardwired into its own calls of AstDisplay (which it does when it can't or doesn't want to deal with further chunking up an AST fragment), so it undoes the earlier FormatMode::Stable and just prints in FormatMode::Simple.
This wouldn't be a concern for this particular PR (because of 1. redaction not being undone by a hardwired FormatMode and 2. not involving FormatMode::Stable), but I think in the future it would be great to make all formatting options orthogonal (https://github.com/MaterializeInc/database-issues/issues/9082), for which it seems to me that we'd have to wire FormatMode through mz-sql-pretty, to avoid mz-sql-pretty undoing some formatting option by using its hardwired FormatMode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proceeding with the current implementation works for me! I'll put some more thought into maybe how we could refactor this, but don't want to block on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thinking through this Gabor!
e4bcda0 to
da4b626
Compare
antiguru
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments inline.
I'll approve, but I think there could be more work done to improve the PR.
| fn to_ast_string_simple(&self) -> String { | ||
| self.to_ast_string(FormatMode::Simple) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to reduce noise, it'd make sense not to rename this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it after finding myself jumping to the definition repeatedly to see what FormatMode it uses. Now it's clear from the name.
Note that the renaming is separated into its own commit (as mentioned in the PR description). This way it doesn't really add noise when reviewing: One can look at the commits individually, and all the diff that is in the renaming commit doesn't need reviewing, just the name itself.
src/sql-pretty/src/doc.rs
Outdated
| fn doc_display_pass_mapper<T: AstDisplay>( | ||
| config: PrettyConfig, | ||
| ) -> impl for<'b> FnMut(&'b T) -> RcDoc<'b, ()> { | ||
| move |v| doc_display_pass(v, config) | ||
| } | ||
|
|
||
| pub(crate) fn doc_create_source<T: AstInfo>( | ||
| v: &CreateSourceStatement<T>, | ||
| config: PrettyConfig, | ||
| ) -> RcDoc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the PR needs to introduce a bunch of complexity to work around the existing code structure. It strikes me as potentially the wrong approach: Why don't we convert the freestanding functions to functions on a type instead? That way we wouldn't need to pass the config everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll probably do this. I don't think it will really reduce the complexity, because the self parameter will just take the place of the current config parameter, but should help readability a bit.
(But first, I'd like to resolve the question of whether we even need to plumb FormatMode through the sql-pretty crate. If not, then most of the changes to sql-pretty can simply be reverted.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done this now in the "Refactor sql-pretty: pass around PrettyConfig as &self." commit. Passes the config in &self.
(The commit also removed the ..._mapper functions. I just have closures everywhere now.)
| params!(String, Int32) => BinaryFunc::PrettySql => String, oid::FUNC_PRETTY_SQL; | ||
| params!(String) => Operation::unary(|_ecx, s| { | ||
| let width = HirScalarExpr::literal(Datum::Int32(100), ScalarType::Int32); | ||
| let width = HirScalarExpr::literal(Datum::Int32(mz_sql_pretty::DEFAULT_WIDTH.try_into().expect("must fit")), ScalarType::Int32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could make the constant a 32-bit number, which should be plenty enough. Then you can do cheap up conversion and don't have to unwrap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, unfortunately it's not so simple, because then the other use of DEFAULT_WIDTH, which gives it to PrettyConfig::width, would need to convert from i32 to usize. I could make PrettyConfig::width also an i32, but all these widths being signed types would look weird.
I think the original problem is that the pretty_sql scalar function takes a signed width. We could also change that, but changing the parameter types of an existing scalar function is probably more hassle than this is worth.
So, I'd like to just stay with these widths being unsigned types (as they should be, as it width can't be negative), and just work around the problem that pretty_sql takes an Int32 by doing a conversion only in this one spot where I'm giving the default width to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also make it a u16. But I'm ambivalent about whether or not it makes sense to choose a slightly "weird" type to avoid an expect.
7945e0c to
592ff6a
Compare
592ff6a to
4c52447
Compare
4c52447 to
63444e2
Compare
63444e2 to
5ab09a6
Compare
5ab09a6 to
2f3f888
Compare
|
@ggevay This is very messy to backport, is it really required? |
|
(We can live without it, as discussed on Slack.) |
This makes
SHOW CREATEandSHOW REDACTED CREATEpretty-print the result (e.g., add line breaks). (https://github.com/MaterializeInc/database-issues/issues/9078, and slack discussion)The first commit just does a renaming, and then the next commit is the main thing. I recommend reviewing commit-by-commit, and starting the review of the main commit from
humanize_sql_for_show_create. The diff looks somewhat big, but most of the code diff is just threadingFormatModethrough thesql-prettycrate, which was needed to enable pretty-printing for both the normalSHOW CREATEand forSHOW REDACTED CREATE. Also, there was a staggering amount of manual test rewrites needed, because Testdrive doesn't have auto-rewriting. (I also did some "spring cleaning": deleted some old tests, which were mirrors of other tests but with an old Kafka syntax, as discussed here.)Note that in addition to pretty-printing, this also changes the output format of
SHOW CREATEfromFormatMode::StabletoFormatMode::Simple. The main effect of this is less quoting of identifiers: stable mode quotes all identifiers, thus cluttering up the screen quite a bit, while the simple mode quotes only when it's needed. I have made some efforts recently to get the "when it's needed" logic bug-free, so hopefully the simple mode is enough.Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.