Pretty-print in SHOW CREATE #31933

Merged
merged 4 commits into from
Apr 1, 2025

Contributor

@ggevay ggevay commented Mar 18, 2025

This makes SHOW CREATE and SHOW REDACTED CREATE pretty-print their results (e.g., by adding line breaks). (https://github.com/MaterializeInc/database-issues/issues/9078, and Slack discussion)

The first commit just does a renaming, and then the next commit is the main thing. I recommend reviewing commit-by-commit, and starting the review of the main commit from humanize_sql_for_show_create. The diff looks somewhat big, but most of the code diff is just threading FormatMode through the sql-pretty crate, which was needed to enable pretty-printing for both the normal SHOW CREATE and for SHOW REDACTED CREATE. Also, there was a staggering amount of manual test rewrites needed, because Testdrive doesn't have auto-rewriting. (I also did some "spring cleaning": deleted some old tests, which were mirrors of other tests but with an old Kafka syntax, as discussed here.)

Note that in addition to pretty-printing, this also changes the output format of SHOW CREATE from FormatMode::Stable to FormatMode::Simple. The main effect of this is less quoting of identifiers: stable mode quotes all identifiers, thus cluttering up the screen quite a bit, while the simple mode quotes only when it's needed. I have made some efforts recently to get the "when it's needed" logic bug-free, so hopefully the simple mode is enough.
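To make the difference between the two modes concrete, here is a minimal sketch of "quote only when needed". The `Mode` enum, the `RESERVED` list, and the rule inside `needs_quoting` are assumptions made for illustration; they are not the actual logic in the Materialize codebase, which has to handle many more cases.

```rust
/// Hypothetical, much-reduced stand-in for the two format modes discussed above.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    /// Quote every identifier.
    Stable,
    /// Quote only identifiers that would not read back correctly unquoted.
    Simple,
}

/// Tiny illustrative keyword list (assumption); a real parser has a much longer one.
const RESERVED: &[&str] = &["select", "from", "where", "table"];

/// Assumed rule: an identifier is safe unquoted if it is non-empty, starts with a
/// lowercase ASCII letter or underscore, continues with lowercase ASCII letters,
/// digits, or underscores, and is not a reserved keyword.
pub fn needs_quoting(ident: &str) -> bool {
    let mut chars = ident.chars();
    let first_ok = matches!(chars.next(), Some(c) if c.is_ascii_lowercase() || c == '_');
    let rest_ok = chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    !(first_ok && rest_ok) || RESERVED.contains(&ident)
}

/// Render an identifier according to the chosen mode.
pub fn display_ident(ident: &str, mode: Mode) -> String {
    if mode == Mode::Stable || needs_quoting(ident) {
        // Embedded double quotes are escaped by doubling them.
        format!("\"{}\"", ident.replace('"', "\"\""))
    } else {
        ident.to_string()
    }
}
```

Under these assumptions, `kafka_conn` prints bare in the simple mode and quoted in the stable mode, while `My Table` is quoted in both.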

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay added A-ADAPTER Topics related to the ADAPTER layer lts-backport-v25.1 Needs to be backported into the v25.1 LTS release labels Mar 18, 2025
@ggevay ggevay force-pushed the show-create-pretty branch 15 times, most recently from b5ec1a6 to 07d8bd6 Compare March 25, 2025 12:53
@ggevay ggevay force-pushed the show-create-pretty branch 5 times, most recently from 6ebb194 to b461943 Compare March 27, 2025 13:40
@ggevay ggevay marked this pull request as ready for review March 27, 2025 13:41
@ggevay ggevay requested review from a team as code owners March 27, 2025 13:41
@ggevay ggevay requested a review from aljoscha March 27, 2025 13:41
@ggevay ggevay force-pushed the show-create-pretty branch 2 times, most recently from 43fadec to a114777 Compare March 27, 2025 14:16
@ggevay ggevay force-pushed the show-create-pretty branch 3 times, most recently from 8ddde38 to e4bcda0 Compare March 27, 2025 17:32
Contributor

@teskje teskje left a comment

LGTM! Not sure about the removed SHOW CREATE statements in some of the test files, but I trust they all have good reasons.

The config parameter added to ~all functions in mz-sql-pretty makes me think that we should turn them into methods on a struct that holds the config. But I can understand that you don't want to deal with such a large refactor now.

> SHOW CREATE SINK compression_implicit;
"materialize.public.compression_implicit" ${expected-compression-implicit-create-sql}
materialize.public.compression_implicit "CREATE SINK materialize.public.compression_implicit IN CLUSTER quickstart FROM materialize.public.kafka_sink_from INTO KAFKA CONNECTION materialize.public.kafka_conn (TOPIC = 'kafka-sink') FORMAT JSON ENVELOPE DEBEZIUM;"
Contributor

Dumb question, but why do some of the SHOW CREATE outputs have newlines and some don't?

Contributor Author

sql-pretty doesn't support every statement kind. E.g., to_doc has a case for CreateSource, but not for CreateSink.
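A rough sketch of why unsupported statement kinds come out on one line: the pretty-printer matches on the statement kind and falls back to the flat rendering when it has no dedicated layout. The `Statement` enum and both functions below are simplified, hypothetical stand-ins, not the real `to_doc`.

```rust
/// Hypothetical, heavily simplified statement type; the real AST is far richer.
pub enum Statement {
    CreateSource { name: String, cluster: String },
    CreateSink { name: String, cluster: String },
}

impl Statement {
    /// Single-line rendering, standing in for the ordinary AST display.
    pub fn to_flat_string(&self) -> String {
        match self {
            Statement::CreateSource { name, cluster } => {
                format!("CREATE SOURCE {} IN CLUSTER {}", name, cluster)
            }
            Statement::CreateSink { name, cluster } => {
                format!("CREATE SINK {} IN CLUSTER {}", name, cluster)
            }
        }
    }
}

/// Pretty-print the statement kinds that have a dedicated layout and fall back
/// to the flat rendering for every other kind.
pub fn to_pretty(stmt: &Statement) -> String {
    let body = match stmt {
        Statement::CreateSource { name, cluster } => {
            format!("CREATE SOURCE {}\nIN CLUSTER {}", name, cluster)
        }
        // No dedicated layout for sinks in this sketch, so the output stays on one line.
        other => other.to_flat_string(),
    };
    format!("{};", body)
}
```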

> SHOW CREATE SINK compression_implicit;
"materialize.public.compression_implicit" ${expected-compression-implicit-create-sql}
materialize.public.compression_implicit "CREATE SINK materialize.public.compression_implicit IN CLUSTER quickstart FROM materialize.public.kafka_sink_from INTO KAFKA CONNECTION materialize.public.kafka_conn (TOPIC = 'kafka-sink') FORMAT JSON ENVELOPE DEBEZIUM;"
Contributor

Also the pretty-printing adds a semicolon where there previously was none. Is this a compatibility concern? I think it would be if people would somehow run SHOW CREATE in their scripts, and then expect output without a trailing semicolon. Not sure why they would do either though.

Contributor Author

@ggevay ggevay Mar 27, 2025

Well, I'd say we should just risk this.

It's probably slightly better to have a semicolon than not to have one: if you want to paste these statements into a SQL shell, it's slightly easier if they already end with a semicolon.

Btw. mzexplore might actually run into this problem. For example, its cluster cloning functionality might use SHOW CREATE in this way. I'm working on mzexplore, so I'll fix this in follow-up PRs.
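For scripts that consume SHOW CREATE output, one defensive option is to normalize the text so that it works with or without the trailing semicolon. This is only a sketch; `strip_trailing_semicolon` is a hypothetical helper, not something that exists in mzexplore.

```rust
/// Return the statement text without trailing whitespace and without a single
/// trailing semicolon, so callers see the same text for both output styles.
pub fn strip_trailing_semicolon(sql: &str) -> &str {
    let trimmed = sql.trim_end();
    trimmed.strip_suffix(';').unwrap_or(trimmed).trim_end()
}
```

A caller that re-issues the statement can then append its own terminator unconditionally.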

@@ -33,7 +33,7 @@ where

 pub(crate) fn title_comma_separate<'a, F, T, S>(title: S, f: F, v: &'a [T]) -> RcDoc<'a, ()>
 where
-    F: Fn(&'a T) -> RcDoc<'a>,
+    F: FnMut(&'a T) -> RcDoc<'a>,
Member

I'm curious what motivated these changes from Fn to FnMut?

Contributor Author

It was due to those ..._mapper functions somehow, but I switched to just having a closure everywhere instead of those ..._mapper functions, so I reverted these back to Fn.

-pub fn to_pretty<T: AstInfo>(stmt: &Statement<T>, width: usize) -> String {
-    format!("{};", to_doc(stmt).pretty(width))
+pub fn to_pretty<T: AstInfo>(stmt: &Statement<T>, config: PrettyConfig) -> String {
+    format!("{};", to_doc(stmt, config).pretty(config.width))
Member

Threading a PrettyConfig into the doc* functions doesn't seem right?

Contributor Author

See discussion here.

Comment on lines +1186 to +1196
Ok(mz_sql_pretty::to_pretty(
    &resolved,
    PrettyConfig {
        width: mz_sql_pretty::DEFAULT_WIDTH,
        format_mode: if redacted {
            FormatMode::SimpleRedacted
        } else {
            FormatMode::Simple
        },
    },
))
Member

It seems like a large part of this change is because we need to plumb the FormatMode into the sql-pretty crate?

IMO pretty printing shouldn't need to know about whether or not something is redacted. What do you think about something like:

let raw_str = if redacted {
  resolved.to_ast_string_redacted()
} else {
  resolved.to_ast_string_stable()
};

Ok(mz_sql_pretty::pretty_str(&raw_str))

Unfortunate that we need to format the string twice, but given this is only for SHOW CREATE I don't feel too bad about it? In a future world it feels like there is an API we could introduce for the mz-sql-pretty crate that would allow us to wrap a statement in some context which would automatically format values as redacted if necessary.

Contributor

IMO pretty printing shouldn't need to know about whether or not something is redacted.

This didn't occur to me during the review, but I agree!

Contributor Author

@ggevay ggevay Mar 27, 2025

There is the issue that we might not be able to parse a redacted statement back. This is because maybe the parser expects a number somewhere, but then it gets something like <redacted>. https://github.com/MaterializeInc/database-issues/issues/8796 aims to solve this problem, but there might be a long tail of cases to solve there, so in the meantime we'd have to have some error handling after the pretty_str call, and just print the non-pretty statement if the parsing back errors out. This is doable, but maybe it tips the balance in favor of the PR's current approach. What do you think?
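The error handling described here could look roughly like the following. Both functions are hypothetical: `toy_pretty` only imitates a parse-then-pretty-print step that rejects the redaction placeholder, and the real `pretty_str` signature is not assumed.

```rust
/// Pretty-print `raw` with `pretty`, which is assumed to re-parse its input and
/// to fail on text the parser rejects. On failure, return `raw` unchanged.
pub fn pretty_or_raw<E>(raw: &str, pretty: impl Fn(&str) -> Result<String, E>) -> String {
    pretty(raw).unwrap_or_else(|_| raw.to_string())
}

/// Toy stand-in for a parse-then-pretty-print step: it rejects any input that
/// contains the redaction placeholder and otherwise moves FROM onto its own line.
pub fn toy_pretty(sql: &str) -> Result<String, String> {
    if sql.contains("<redacted>") {
        return Err("cannot parse redacted literal".to_string());
    }
    Ok(sql.replace(" FROM ", "\nFROM "))
}
```

The cost of this approach is that some redacted statements silently lose pretty-printing, which is the trade-off weighed in the comment above.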

Contributor Author

@ggevay ggevay Mar 27, 2025

One more consideration:

There are various aspects of AST printing that we want to control (see https://github.com/MaterializeInc/database-issues/issues/9082). If we want to keep mz-sql-pretty oblivious to all of them, and just use Parker's trick for adding redaction on top of any of the other AST printing options, then a problem is that mz-sql-pretty might undo some of the other formatting options when it calls AstDisplay as its "base case" with its hardwired FormatMode.

For example, imagine that we'd like to pretty-print in FormatMode::Stable. The above trick can't be adopted for this: if we first print with FormatMode::Stable, then parse back, then run mz-sql-pretty, then the problem is that mz-sql-pretty has FormatMode::Simple hardwired into its own calls of AstDisplay (which it does when it can't or doesn't want to deal with further chunking up an AST fragment), so it undoes the earlier FormatMode::Stable and just prints in FormatMode::Simple.

This wouldn't be a concern for this particular PR (because of 1. redaction not being undone by a hardwired FormatMode and 2. not involving FormatMode::Stable), but I think in the future it would be great to make all formatting options orthogonal (https://github.com/MaterializeInc/database-issues/issues/9082), for which it seems to me that we'd have to wire FormatMode through mz-sql-pretty, to avoid mz-sql-pretty undoing some formatting option by using its hardwired FormatMode.
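The "undoing" problem can be illustrated with a toy pretty-printer whose base case either hardwires a format mode or takes it from the caller. Everything below is a simplified, hypothetical model: the real simple mode still quotes identifiers that need it, and the real base case is a call to AstDisplay.

```rust
#[derive(Clone, Copy)]
pub enum FormatMode {
    Simple,
    Stable,
}

/// Stand-in for the AST display of an identifier (simplified: the simple mode
/// never quotes here).
pub fn display_ident(ident: &str, mode: FormatMode) -> String {
    match mode {
        FormatMode::Stable => format!("\"{}\"", ident),
        FormatMode::Simple => ident.to_string(),
    }
}

/// Base case with a hardwired mode: whatever mode the caller printed with
/// earlier is lost once the statement has been parsed back.
pub fn pretty_hardwired(idents: &[&str]) -> String {
    let parts: Vec<String> = idents
        .iter()
        .map(|i| display_ident(i, FormatMode::Simple))
        .collect();
    parts.join(",\n")
}

/// Base case with the mode threaded through: the caller's choice survives.
pub fn pretty_threaded(idents: &[&str], mode: FormatMode) -> String {
    let parts: Vec<String> = idents.iter().map(|i| display_ident(i, mode)).collect();
    parts.join(",\n")
}
```

In this model, a caller that wants stable output gets it only from `pretty_threaded`; `pretty_hardwired` always produces the simple form.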

Member

Proceeding with the current implementation works for me! I'll put some more thought into how we could refactor this, but I don't want to block on it.

Member

Thanks for thinking this through, Gabor!

@ggevay ggevay force-pushed the show-create-pretty branch from e4bcda0 to da4b626 Compare March 27, 2025 18:32
Member

@antiguru antiguru left a comment

Left some comments inline.

I'll approve, but I think there could be more work done to improve the PR.

Comment on lines +157 to +158
fn to_ast_string_simple(&self) -> String {
    self.to_ast_string(FormatMode::Simple)
Member

I think to reduce noise, it'd make sense not to rename this function.

Contributor Author

I renamed it after finding myself jumping to the definition repeatedly to see what FormatMode it uses. Now it's clear from the name.

Note that the renaming is separated into its own commit (as mentioned in the PR description). This way it doesn't really add noise when reviewing: One can look at the commits individually, and all the diff that is in the renaming commit doesn't need reviewing, just the name itself.

Comment on lines 43 to 52
fn doc_display_pass_mapper<T: AstDisplay>(
    config: PrettyConfig,
) -> impl for<'b> FnMut(&'b T) -> RcDoc<'b, ()> {
    move |v| doc_display_pass(v, config)
}

pub(crate) fn doc_create_source<T: AstInfo>(
    v: &CreateSourceStatement<T>,
    config: PrettyConfig,
) -> RcDoc {
Member

It seems the PR needs to introduce a bunch of complexity to work around the existing code structure. It strikes me as potentially the wrong approach: Why don't we convert the freestanding functions to functions on a type instead? That way we wouldn't need to pass the config everywhere.

Contributor Author

Yeah, I'll probably do this. I don't think it will really reduce the complexity, because the self parameter will just take the place of the current config parameter, but should help readability a bit.

(But first, I'd like to resolve the question of whether we even need to plumb FormatMode through the sql-pretty crate. If not, then most of the changes to sql-pretty can simply be reverted.)

Contributor Author

@ggevay ggevay Mar 31, 2025

I've done this now in the "Refactor sql-pretty: pass around PrettyConfig as &self." commit, which passes the config in &self.

(The commit also removed the ..._mapper functions. I just have closures everywhere now.)
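The shape of that refactor, reduced to a sketch: the config lives in a struct, and the former freestanding functions become methods that read it through `&self`. `PrettyConfig`, `width`, `format_mode`, and the `FormatMode` variants are taken from the snippets in this thread; the `Pretty` struct, its methods, the string-based output, and the `<redacted>` placeholder are assumptions for illustration.

```rust
#[derive(Clone, Copy)]
pub enum FormatMode {
    Simple,
    SimpleRedacted,
}

#[derive(Clone, Copy)]
pub struct PrettyConfig {
    pub width: usize,
    pub format_mode: FormatMode,
}

/// Hypothetical holder for the config; methods replace the `config` parameter.
pub struct Pretty {
    pub config: PrettyConfig,
}

impl Pretty {
    /// Base case: render a string literal, redacting it if the mode asks for that.
    pub fn literal(&self, value: &str) -> String {
        match self.config.format_mode {
            FormatMode::Simple => format!("'{}'", value),
            FormatMode::SimpleRedacted => "<redacted>".to_string(),
        }
    }

    /// Render an option list on one line if it fits within the configured width,
    /// otherwise one option per line.
    pub fn options(&self, opts: &[(&str, &str)]) -> String {
        let parts: Vec<String> = opts
            .iter()
            .map(|(k, v)| format!("{} = {}", k, self.literal(v)))
            .collect();
        let flat = format!("({})", parts.join(", "));
        if flat.len() <= self.config.width {
            flat
        } else {
            format!("(\n    {}\n)", parts.join(",\n    "))
        }
    }
}
```

As noted above, `&self` simply takes the place of the `config` parameter, so the gain is mainly readability: call sites no longer repeat the argument, and closures can capture `self` instead of needing the `..._mapper` helpers.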

@@ -3838,7 +3838,7 @@ pub static MZ_CATALOG_BUILTINS: LazyLock<BTreeMap<&'static str, Func>> = LazyLoc
 "pretty_sql" => Scalar {
     params!(String, Int32) => BinaryFunc::PrettySql => String, oid::FUNC_PRETTY_SQL;
     params!(String) => Operation::unary(|_ecx, s| {
-        let width = HirScalarExpr::literal(Datum::Int32(100), ScalarType::Int32);
+        let width = HirScalarExpr::literal(Datum::Int32(mz_sql_pretty::DEFAULT_WIDTH.try_into().expect("must fit")), ScalarType::Int32);
Member

You could make the constant a 32-bit number, which should be plenty enough. Then you can do cheap up conversion and don't have to unwrap here.

Contributor Author

Good idea, will do!

Contributor Author

@ggevay ggevay Mar 31, 2025

Actually, unfortunately it's not so simple, because then the other use of DEFAULT_WIDTH, which gives it to PrettyConfig::width, would need to convert from i32 to usize. I could make PrettyConfig::width also an i32, but all these widths being signed types would look weird.

I think the original problem is that the pretty_sql scalar function takes a signed width. We could also change that, but changing the parameter types of an existing scalar function is probably more hassle than this is worth.

So, I'd like to keep these widths as unsigned types (as they should be, since a width can't be negative), and work around the fact that pretty_sql takes an Int32 by doing a conversion only in this one spot, where I pass the default width to this function.

Contributor

You could also make it a u16. But I'm ambivalent about whether or not it makes sense to choose a slightly "weird" type to avoid an expect.
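The trade-off between the two options can be sketched as follows. The constant names are hypothetical, and the value 100 is assumed from the literal that the diff above replaces with `DEFAULT_WIDTH`.

```rust
use std::convert::TryFrom;

/// Assumed default width as a `usize`, which is what the PR keeps.
pub const DEFAULT_WIDTH_USIZE: usize = 100;
/// The same value as a `u16`, which is the alternative suggested above.
pub const DEFAULT_WIDTH_U16: u16 = 100;

/// Fallible conversion: `usize` to `i32` can overflow in general, so the
/// conversion returns a `Result` and the caller has to unwrap it.
pub fn width_as_i32_checked() -> i32 {
    i32::try_from(DEFAULT_WIDTH_USIZE).expect("must fit")
}

/// Lossless conversion: every `u16` fits in an `i32`, so no `expect` is needed.
pub fn width_as_i32_lossless() -> i32 {
    i32::from(DEFAULT_WIDTH_U16)
}

/// Lossless conversion: every `u16` also fits in a `usize`.
pub fn width_as_usize_lossless() -> usize {
    usize::from(DEFAULT_WIDTH_U16)
}
```

With a `u16` constant both call sites get an infallible `From` conversion, at the price of a width type that readers may find unusual.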

@ggevay ggevay force-pushed the show-create-pretty branch 4 times, most recently from 7945e0c to 592ff6a Compare March 31, 2025 14:03
Contributor Author

ggevay commented Mar 31, 2025

The "Refactor sql-pretty: pass around PrettyConfig as &self." commit did the refactoring that @teskje and @antiguru suggested.

The only remaining open question is whether we even need to pass around the config. I think we do, see here.

@ggevay ggevay force-pushed the show-create-pretty branch from 592ff6a to 4c52447 Compare March 31, 2025 16:08
@ggevay ggevay force-pushed the show-create-pretty branch from 4c52447 to 63444e2 Compare April 1, 2025 13:38
@ggevay ggevay force-pushed the show-create-pretty branch from 63444e2 to 5ab09a6 Compare April 1, 2025 15:23
@ggevay ggevay force-pushed the show-create-pretty branch from 5ab09a6 to 2f3f888 Compare April 1, 2025 16:06
@ggevay ggevay merged commit 4a98668 into MaterializeInc:main Apr 1, 2025
219 of 249 checks passed
Labels
A-ADAPTER Topics related to the ADAPTER layer lts-backport-v25.1 Needs to be backported into the v25.1 LTS release
4 participants