-
Notifications
You must be signed in to change notification settings - Fork 469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pretty-print in SHOW CREATE
#31933
Pretty-print in SHOW CREATE
#31933
Conversation
b5ec1a6
to
07d8bd6
Compare
6ebb194
to
b461943
Compare
43fadec
to
a114777
Compare
8ddde38
to
e4bcda0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Not sure about the removed SHOW CREATE statements in some of the test files, but I trust they all have good reasons.
The config
parameter added to ~all functions in mz-sql-pretty
makes me think that we should turn them into methods on a struct that holds the config. But I can understand that you don't want to deal with such a large refactor now.
> SHOW CREATE SINK compression_implicit; | ||
"materialize.public.compression_implicit" ${expected-compression-implicit-create-sql} | ||
materialize.public.compression_implicit "CREATE SINK materialize.public.compression_implicit IN CLUSTER quickstart FROM materialize.public.kafka_sink_from INTO KAFKA CONNECTION materialize.public.kafka_conn (TOPIC = 'kafka-sink') FORMAT JSON ENVELOPE DEBEZIUM;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question, but why do some of the SHOW CREATE outputs have newlines and some don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sql-pretty
doesn't support every statement kind. E.g., to_doc
has a case for CreateSource
, but not for CreateSink
.
> SHOW CREATE SINK compression_implicit; | ||
"materialize.public.compression_implicit" ${expected-compression-implicit-create-sql} | ||
materialize.public.compression_implicit "CREATE SINK materialize.public.compression_implicit IN CLUSTER quickstart FROM materialize.public.kafka_sink_from INTO KAFKA CONNECTION materialize.public.kafka_conn (TOPIC = 'kafka-sink') FORMAT JSON ENVELOPE DEBEZIUM;" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the pretty-printing adds a semicolon where there previously was none. Is this a compatibility concern? I think it would be if people would somehow run SHOW CREATE in their scripts, and then expect output without a trailing semicolon. Not sure why they would do either though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'd say we should just risk this.
It's probably slightly better to have a semicolon than to not have, because if you want to paste these statements into some SQL shell, then it's slightly easier if they already have a semicolon.
Btw. mzexplore
might actually run into this problem. For example, its cluster cloning functionality might use SHOW CREATE
in this way. I'm working on mzexplore
, so I'll fix this in follow-up PRs.
src/sql-pretty/src/util.rs
Outdated
@@ -33,7 +33,7 @@ where | |||
|
|||
pub(crate) fn title_comma_separate<'a, F, T, S>(title: S, f: F, v: &'a [T]) -> RcDoc<'a, ()> | |||
where | |||
F: Fn(&'a T) -> RcDoc<'a>, | |||
F: FnMut(&'a T) -> RcDoc<'a>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what motivated these changes from Fn
to FnMut
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was due to those ..._mapper
functions somehow, but I switched to just having a closure everywhere instead of those ..._mapper
functions, so I reverted these back to Fn
.
src/sql-pretty/src/lib.rs
Outdated
pub fn to_pretty<T: AstInfo>(stmt: &Statement<T>, width: usize) -> String { | ||
format!("{};", to_doc(stmt).pretty(width)) | ||
pub fn to_pretty<T: AstInfo>(stmt: &Statement<T>, config: PrettyConfig) -> String { | ||
format!("{};", to_doc(stmt, config).pretty(config.width)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Threading a PrettyConfig
into the doc*
functions doesn't seem right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See discussion here.
Ok(mz_sql_pretty::to_pretty( | ||
&resolved, | ||
PrettyConfig { | ||
width: mz_sql_pretty::DEFAULT_WIDTH, | ||
format_mode: if redacted { | ||
FormatMode::SimpleRedacted | ||
} else { | ||
FormatMode::Simple | ||
}, | ||
}, | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a large part of this change is because we need to plumb the FormatMode
into the sql-pretty
crate?
IMO pretty printing shouldn't need to know about whether or not something is redacted. What do you think about something like:
let raw_str = if redacted {
resolved.to_ast_string_redacted()
} else {
resolved.to_ast_string_stable()
};
Ok(mz_sql_pretty::pretty_str(&redacted))
Unfortunate that we need to format the string twice, but given this is only for SHOW CREATE
I don't feel too bad about it? In a future world it feels like there is an API we could introduce for the mz-sql-pretty
crate that would allow us to wrap a statement in some context which would automatically format values as redacted if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO pretty printing shouldn't need to know about whether or not something is redacted.
This didn't occur to me during the review, but I agree!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the issue that we might not be able to parse a redacted statement back. This is because maybe the parser expects a number somewhere, but then it gets something like <redacted>
. https://github.com/MaterializeInc/database-issues/issues/8796 aims to solve this problem, but there might be a long tail of cases to solve there, so in the meantime we'd have to have some error handling after the pretty_str
call, and just print the non-pretty statement if the parsing back errors out. This is doable, but maybe it tips the balance in favor of the PR's current approach. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more consideration:
There are various aspects of AST printing that we want to control (see https://github.com/MaterializeInc/database-issues/issues/9082). If we want to keep mz-sql-pretty
oblivious to all of them, and just use Parker's trick for adding redaction on top of any of the other AST printing options, then a problem is that mz-sql-pretty
might undo some of the other formatting options when it calls AstDisplay
as its "base case" with its hardwired FormatMode
.
For example, imagine that we'd like to pretty-print in FormatMode::Stable
. The above trick can't be adopted for this: if we first print with FormatMode::Stable
, then parse back, then run mz-sql-pretty
, then the problem is that mz-sql-pretty
has FormatMode::Simple
hardwired into its own calls of AstDisplay
(which it does when it can't or doesn't want to deal with further chunking up an AST fragment), so it undoes the earlier FormatMode::Stable
and just prints in FormatMode::Simple
.
This wouldn't be a concern for this particular PR (because of 1. redaction not being undone by a hardwired FormatMode
and 2. not involving FormatMode::Stable
), but I think in the future it would be great to make all formatting options orthogonal (https://github.com/MaterializeInc/database-issues/issues/9082), for which it seems to me that we'd have to wire FormatMode
through mz-sql-pretty
, to avoid mz-sql-pretty
undoing some formatting option by using its hardwired FormatMode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proceeding with the current implementation works for me! I'll put some more thought into maybe how we could refactor this, but don't want to block on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for thinking through this Gabor!
e4bcda0
to
da4b626
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments inline.
I'll approve, but I think there could be more work done to improve the PR.
fn to_ast_string_simple(&self) -> String { | ||
self.to_ast_string(FormatMode::Simple) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think to reduce noise, it'd make sense not to rename this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it after finding myself jumping to the definition repeatedly to see what FormatMode
it uses. Now it's clear from the name.
Note that the renaming is separated into its own commit (as mentioned in the PR description). This way it doesn't really add noise when reviewing: One can look at the commits individually, and all the diff that is in the renaming commit doesn't need reviewing, just the name itself.
src/sql-pretty/src/doc.rs
Outdated
fn doc_display_pass_mapper<T: AstDisplay>( | ||
config: PrettyConfig, | ||
) -> impl for<'b> FnMut(&'b T) -> RcDoc<'b, ()> { | ||
move |v| doc_display_pass(v, config) | ||
} | ||
|
||
pub(crate) fn doc_create_source<T: AstInfo>( | ||
v: &CreateSourceStatement<T>, | ||
config: PrettyConfig, | ||
) -> RcDoc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the PR needs to introduce a bunch of complexity to work around the existing code structure. It strikes me as potentially the wrong approach: Why don't we convert the freestanding functions to functions on a type instead? That way we wouldn't need to pass the config everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'll probably do this. I don't think it will really reduce the complexity, because the self
parameter will just take the place of the current config
parameter, but should help readability a bit.
(But first, I'd like to resolve the question of whether we even need to plumb FormatMode
through the sql-pretty
crate. If not, then most of the changes to sql-pretty
can simply be reverted.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done this now in the "Refactor sql-pretty: pass around PrettyConfig as &self." commit. Passes the config in &self
.
(The commit also removed the ..._mapper
functions. I just have closures everywhere now.)
@@ -3838,7 +3838,7 @@ pub static MZ_CATALOG_BUILTINS: LazyLock<BTreeMap<&'static str, Func>> = LazyLoc | |||
"pretty_sql" => Scalar { | |||
params!(String, Int32) => BinaryFunc::PrettySql => String, oid::FUNC_PRETTY_SQL; | |||
params!(String) => Operation::unary(|_ecx, s| { | |||
let width = HirScalarExpr::literal(Datum::Int32(100), ScalarType::Int32); | |||
let width = HirScalarExpr::literal(Datum::Int32(mz_sql_pretty::DEFAULT_WIDTH.try_into().expect("must fit")), ScalarType::Int32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could make the constant a 32-bit number, which should be plenty enough. Then you can do cheap up conversion and don't have to unwrap here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, unfortunately it's not so simple, because then the other use of DEFAULT_WIDTH
, which gives it to PrettyConfig::width
, would need to convert from i32 to usize. I could make PrettyConfig::width
also an i32, but all these widths being signed types would look weird.
I think the original problem is that the pretty_sql
scalar function takes a signed width. We could also change that, but changing the parameter types of an existing scalar function is probably more hassle than this is worth.
So, I'd like to just stay with these widths being unsigned types (as they should be, as it width can't be negative), and just work around the problem that pretty_sql
takes an Int32
by doing a conversion only in this one spot where I'm giving the default width to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also make it a u16
. But I'm ambivalent about whether or not it makes sense to choose a slightly "weird" type to avoid an expect
.
7945e0c
to
592ff6a
Compare
592ff6a
to
4c52447
Compare
4c52447
to
63444e2
Compare
63444e2
to
5ab09a6
Compare
5ab09a6
to
2f3f888
Compare
This makes
SHOW CREATE
andSHOW REDACTED CREATE
pretty-print the result (e.g., add line breaks). (https://github.com/MaterializeInc/database-issues/issues/9078, and slack discussion)The first commit just does a renaming, and then the next commit is the main thing. I recommend reviewing commit-by-commit, and starting the review of the main commit from
humanize_sql_for_show_create
. The diff looks somewhat big, but most of the code diff is just threadingFormatMode
through thesql-pretty
crate, which was needed to enable pretty-printing for both the normalSHOW CREATE
and forSHOW REDACTED CREATE
. Also, there was a staggering amount of manual test rewrites needed, because Testdrive doesn't have auto-rewriting. (I also did some "spring cleaning": deleted some old tests, which were mirrors of other tests but with an old Kafka syntax, as discussed here.)Note that in addition to pretty-printing, this also changes the output format of
SHOW CREATE
fromFormatMode::Stable
toFormatMode::Simple
. The main effect of this is less quoting of identifiers: stable mode quotes all identifiers, thus cluttering up the screen quite a bit, while the simple mode quotes only when it's needed. I have made some efforts recently to get the "when it's needed" logic bug-free, so hopefully the simple mode is enough.Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.