
Robust :and parser, add :andn #1182


Open: wants to merge 22 commits into master


Collaborator

frenchy64 commented Mar 28, 2025

Close #1166
Close #1173

This tightens up -parser for :and in several ways.

The essential insight is that there are two kinds of parsers, which I'm calling transforming (e.g., :orn, -collection-schema) and simple (e.g., :any, -simple-schema). Simple parsers return the identical input on success. Every other parser is transforming.

In all cases I've seen so far, it's possible to accurately predict whether a parser is simple based on its schema. With this information, we can now improve :and's parser by:

  1. banning more than one transforming parser per :and
  2. running the transforming parser last
  3. running the transforming unparser first

This automatically handles [:and S [:fn ..]] and makes it more robust, as :fn is now passed the input value instead of the parsed value and the conjuncts can be in any order.
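A minimal sketch of these three rules, under a toy model that is not malli's API: a parser is a map of `:simple?`, `:parse` and `:unparse`, and failure returns a hypothetical `invalid` keyword. `pred-parser`, `tag-parser` and `and-parser` are illustrative names only. The usage at the end mirrors the `[:and S [:fn ..]]` case: the predicate conjunct is listed second, yet it still sees the raw input rather than the parsed value.

```clojure
;; Toy model (hypothetical, not malli's API): a parser is a map with
;; :simple?, :parse and :unparse; failures return `invalid`.
(def invalid :sketch/invalid)

(defn pred-parser
  "Simple parser: on success it returns the input itself."
  [pred]
  {:simple? true
   :parse   (fn [x] (if (pred x) x invalid))
   :unparse (fn [x] (if (pred x) x invalid))})

(defn tag-parser
  "Transforming parser in the spirit of :orn: tags x with the key of the
  first branch whose predicate accepts it."
  [& branches]
  (let [branches (partition 2 branches)]
    {:simple? false
     :parse   (fn [x]
                (or (some (fn [[k pred]] (when (pred x) [k x])) branches)
                    invalid))
     :unparse (fn [v]
                (if (and (vector? v)
                         (some (fn [[k pred]] (and (= k (first v)) (pred (second v))))
                               branches))
                  (second v)
                  invalid))}))

(defn and-parser
  "At most one transforming conjunct is allowed. Simple conjuncts check the
  raw input, the transforming parser runs last, and its unparser runs first."
  [& children]
  (let [{simple true transforming false} (group-by :simple? children)
        _ (when (> (count transforming) 1)
            (throw (ex-info "multiple transforming parsers"
                            {:type :sketch/and-schema-multiple-transforming-parsers})))
        t (first transforming)
        simple-ok? (fn [x] (every? #(not= invalid ((:parse %) x)) simple))]
    {:simple? (nil? t)
     :parse   (fn [x]
                (cond (not (simple-ok? x)) invalid
                      t ((:parse t) x)
                      :else x))
     :unparse (fn [v]
                (let [x (if t ((:unparse t) v) v)]
                  (if (and (not= invalid x) (simple-ok? x)) x invalid)))}))

;; The :fn-like conjunct is listed second but still sees the raw input.
(def and-p (and-parser (tag-parser :int int? :str string?)
                       (pred-parser #(not= % 0))))

((:parse and-p) 5)            ;=> [:int 5]
((:parse and-p) 0)            ;=> :sketch/invalid
((:unparse and-p) [:str "a"]) ;=> "a"
```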

Extras:

Adds a new schema, :andn, for when you really want multiple transforming parsers in a conjunction. It reparses the input for each conjunct and returns the results in a Tags. The unparser only unparses the leftmost child, which enables users to transform the unparsed results by removing the other results.
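The :andn behaviour can be sketched under a similar toy model (hypothetical names, not malli's API). Here a plain map stands in for the Tags record, and naming each conjunct like an :orn branch is an assumption. Because only the leftmost conjunct is used for unparsing, dropping the other entries from the parsed result still round-trips.

```clojure
;; Toy model (hypothetical): parsers are maps of :parse/:unparse and
;; failures return `invalid`. A plain map stands in for malli's Tags.
(def invalid :sketch/invalid)

(defn andn-parser
  "Every named conjunct re-parses the original input; results are kept
  under their names. Unparsing only uses the leftmost conjunct."
  [& entries]
  (let [entries (partition 2 entries)
        [k0 p0] (first entries)]
    {:parse   (fn [x]
                (reduce (fn [acc [k p]]
                          (let [v ((:parse p) x)]
                            (if (= invalid v) (reduced invalid) (assoc acc k v))))
                        {}
                        entries))
     :unparse (fn [m]
                (if (and (map? m) (contains? m k0))
                  ((:unparse p0) (get m k0))
                  invalid))}))

;; Two transforming parsers over integers.
(def parity {:parse   (fn [x] (if (int? x) [(if (even? x) :even :odd) x] invalid))
             :unparse (fn [[_ x]] x)})
(def sign   {:parse   (fn [x] (if (int? x) [(if (neg? x) :neg :non-neg) x] invalid))
             :unparse (fn [[_ x]] x)})

(def andn-p (andn-parser :parity parity :sign sign))

((:parse andn-p) -3)                                 ;=> {:parity [:odd -3], :sign [:neg -3]}
((:unparse andn-p) (dissoc ((:parse andn-p) -3) :sign)) ;=> -3
```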

We can now optimize simple (un)parsers more aggressively up front, so that they do not build a result when it would be identical to the input.
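One way to picture this optimisation, again as a toy model with hypothetical names (`vector-parser`, `int-simple`, `int-tagged`) rather than malli code: when the element parser is known to be simple, the input vector itself can serve as the reduction's accumulator, mirroring the `(if simple x [])` initial value used in the PR's collection parser.

```clojure
;; Toy model (hypothetical): an element parser is a map with :simple? and
;; :parse; failures return `invalid`.
(def invalid :sketch/invalid)

(defn vector-parser
  "If the element parser is simple, a successful parse can only return the
  input unchanged, so the input itself is used as the accumulator and no
  new vector is built."
  [{:keys [simple? parse]}]
  (fn [x]
    (if-not (vector? x)
      invalid
      (reduce (fn [acc v]
                (let [v' (parse v)]
                  (if (= invalid v')
                    (reduced invalid)
                    (cond-> acc (not simple?) (conj v')))))
              (if simple? x [])
              x))))

(def int-simple {:simple? true  :parse (fn [v] (if (int? v) v invalid))})
(def int-tagged {:simple? false :parse (fn [v] (if (int? v) [:int v] invalid))})

(def xs [1 2 3])
(identical? xs ((vector-parser int-simple) xs)) ;=> true
((vector-parser int-tagged) [1 2])             ;=> [[:int 1] [:int 2]]
```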

Includes a fix for #1173 by bumping up the :max-tries for generating distinct vectors.

frenchy64 marked this pull request as ready for review March 28, 2025 23:23
frenchy64 changed the title from "WIP: Robust :and parser, add :andn" to "Robust :and parser, add :andn" Mar 28, 2025
frenchy64 requested review from opqdonut and ikitommi March 28, 2025 23:23
Member

opqdonut left a comment


Some questions / comments. Not confident enough yet to approve.

```clojure
; :flat [#malli.core.Tag{:key :name, :value "x"}
;        #malli.core.Tag{:key :id, :value 1}
;        #malli.core.Tag{:key :name, :value "y"}
;        #malli.core.Tag{:key :id, :value 2}]}}
```
Member


yeah, this behaviour makes sense, I get it

```clojure
(if (-ref-schema? this)
  (-parser-info (-deref this))
  (when (-> this -parent -type-properties ::simple-parser)
    {:simple-parser true}))))
```
Member


what's the need for both ParserInfo and -type-properties ::simple-parser?

Collaborator Author


Could get away with just ParserInfo. It seemed neater at the time to have it at the type-level for trivial types.

Member


Since all the schemas are supposed to have -parser-info, we feel like the method should be in Schema. Also, we feel like -type-properties ::simple-parser makes the feature harder to understand, so we'd prefer you drop that.

The default impl for ParserInfo is neat, but makes following the logic harder for future maintainers.

Collaborator Author


> Also, we feel like -type-properties ::simple-parser makes the feature harder to understand, so we'd prefer you drop that.

Sure, I will remove -type-properties ::simple-parser support. FWIW I was following the same kind of flexibility as malli.generator/-create, where generators can be provided at the IntoSchema level and overridden by Schema.

> Since all the schemas are supposed to have -parser-info, we feel like the method should be in Schema.

I think this would be a mistake. It would introduce a significant risk of dependency hell, where you can't upgrade malli without waiting for all 3rd-party schemas to be updated too.

For example, (m/parser [:vector ::3rd-party]) would throw an exception because ::3rd-party doesn't implement -parser-info. Having ParserInfo as a separate protocol solves this particular cause of dependency hell, since you can define a default (you can't with Schema, since ::3rd-party likely already implements it directly).

The default (nil) says a parser is transforming, which at worst will undo some parser optimizations introduced in this PR (i.e., preserving the same perf as before). For the extra pedantic, you could conceivably assert (comp some? m/-parser-info) for every schema in your registry to find these schemas.
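The upgrade-path argument can be illustrated with plain Clojure protocols. In this sketch `Schema`, `ParserInfo`, `AnySchema` and `ThirdPartySchema` are hypothetical stand-ins, not malli's definitions. Because `ParserInfo` is a separate protocol, it can be extended to `Object` with a `nil` default, so a record written before the protocol existed still answers and is conservatively treated as transforming. A method added to `Schema` itself would have no such fallback for records that already implement `Schema` inline.

```clojure
;; Hypothetical stand-ins for malli's protocols, only to show the upgrade path.
(defprotocol Schema
  (-type [this]))

(defprotocol ParserInfo
  (-parser-info [this]))

;; Fallback for schemas that never implemented ParserInfo:
;; nil means "assume the parser is transforming".
(extend-protocol ParserInfo
  Object
  (-parser-info [_] nil))

(defrecord AnySchema []
  Schema
  (-type [_] :any)
  ParserInfo
  (-parser-info [_] {:simple-parser true}))

;; A hypothetical 3rd-party schema written before ParserInfo existed.
(defrecord ThirdPartySchema []
  Schema
  (-type [_] ::3rd-party))

(defn simple-parser? [s]
  (boolean (:simple-parser (-parser-info s))))

(simple-parser? (->AnySchema))        ;=> true
(simple-parser? (->ThirdPartySchema)) ;=> false
```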

One particular kind of dependency hell is still possible, though: a library may provide a schema that now throws :malli.core/and-schema-multiple-transforming-parsers. Maybe a default global handler could be provided for this case to give control back to the user.

Member


I hadn't thought about the upgrade path – that's a very good point. Probably best to go with the separate ParserInfo protocol.

```clojure
(defmethod accept :orn [_ s children _]
  (let [children (map last children)
        base (-base s children)]
    (assoc base :x-anyOf children)))
```
Member


thanks for adding this missing case!

```clojure
(reduced ::invalid)
(cond-> acc
  (not simple) (conj v')))))
(if simple x []) x)]
```
Member


I didn't understand the changes around here. Is there a corresponding test?

Collaborator Author


Since we know the child has a simple parser, we don't need to rebuild the result: parsing the collection either returns ::invalid or x itself.

The "then" branch of malli.parser/ensure-parser-type tests this. For example, [:vector ::HOLE] with ::HOLE being a simple parsing schema like :any is expected to be a simple parser (expected-simple == true), so any mg/samples we take of it will (un)parse back to the identical sampled value.
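The property that test checks can be sketched in plain Clojure, with hand-picked samples standing in for `mg/samples` and hypothetical parser maps instead of real schemas: every sample must round-trip, and a parser that claims to be simple must also return the `identical?` value. A transforming parser mislabelled as simple fails the check.

```clojure
;; Toy model (hypothetical): a parser is a map of :simple?, :parse, :unparse.
;; Hand-picked samples stand in for generated ones.
(defn parser-type-ok?
  "Every sample must round-trip; if the parser claims to be simple, parse
  and unparse must also return the identical sample."
  [{:keys [simple? parse unparse]} samples]
  (every? (fn [x]
            (let [parsed (parse x)]
              (and (= x (unparse parsed))
                   (or (not simple?)
                       (and (identical? x parsed)
                            (identical? x (unparse x)))))))
          samples))

(def ints-simple {:simple? true
                  :parse   (fn [x] (if (every? int? x) x :sketch/invalid))
                  :unparse (fn [x] (if (every? int? x) x :sketch/invalid))})
(def ints-tagged {:simple? false
                  :parse   (fn [x] (mapv (fn [v] [:int v]) x))
                  :unparse (fn [x] (mapv second x))})

(def samples [[] [1] [1 2 3]])
(parser-type-ok? ints-simple samples)                       ;=> true
(parser-type-ok? ints-tagged samples)                       ;=> true
(parser-type-ok? (assoc ints-tagged :simple? true) samples) ;=> false
```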

Member


aha, this is the optimisation you mention in the PR description, right?

Collaborator Author


Yes, specifically for :map-of, -collection-schema, and :map's default value.

Member

opqdonut commented Apr 1, 2025

Letting @ikitommi have a look as well.

frenchy64
Collaborator Author

Thanks for looking @opqdonut.

Member

opqdonut left a comment


Went through this in a session with some people from Metosin. We really like the change, and would like to get it in! Thanks for your effort.

Please move -parser-info to Schema and consider the suggestions we had for :parse.

To opt out of parsing any further levels of this schema, use the `:parse :none` property.

```clojure
(m/parse [:and {:parse 0}
```
Member


This is reserving the top-level :parse key for a specific purpose. Also, "parse" on its own is not very descriptive. We propose using a :parse/ ns (just like :gen/ for example). How about :parse/transforming-child-index or :parse/index or :parse/child?

Collaborator Author


Agreed. I'll see if a good name comes to me, but other than its length I like :parse/transforming-child-index.

Collaborator Author


:parse/transforming-child seems ok.

```clojure
:parse/transforming-child :none
:parse/transforming-child 0
:parse/transforming-child 4
```

Further possible extensions like :last seem to work with this name.

The error `:malli.core/and-schema-multiple-transforming-parsers` is thrown if the transforming
parser cannot be picked automatically. This usually means either that multiple conjuncts
transform their input, or that a false positive occurred because the underlying schema
does not implement `malli.core/ParserInfo`.
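A sketch of how the transforming conjunct might be chosen, assuming the `:parse/transforming-child` values discussed in this thread (`:none` or an index) and assuming `:none` means that no conjunct transforms. `pick-transforming-child` and the `:candidates` key in the ex-data are hypothetical; only the error keyword comes from the text above.

```clojure
(defn pick-transforming-child
  "Returns the index of the conjunct whose parser should transform the
  input, or nil when none should. `props` may hold
  :parse/transforming-child (:none or an index); `simple-flags` has one
  boolean per conjunct (true = simple parser)."
  [props simple-flags]
  (let [choice (get props :parse/transforming-child ::auto)]
    (cond
      (= :none choice) nil
      (int? choice)    choice
      :else
      (let [candidates (vec (keep-indexed (fn [i simple?] (when-not simple? i))
                                          simple-flags))]
        (case (count candidates)
          0 nil
          1 (first candidates)
          (throw (ex-info "cannot pick the transforming parser automatically"
                          {:type       :malli.core/and-schema-multiple-transforming-parsers
                           :candidates candidates})))))))

(pick-transforming-child {} [true false true])                        ;=> 1
(pick-transforming-child {} [true true])                              ;=> nil
(pick-transforming-child {:parse/transforming-child 0} [false false]) ;=> 0
```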
Member


How about defaulting to the first transforming parser? That's kind of what we do for json-schema generation etc. Or do you think it would trip up users?

Collaborator Author

frenchy64 May 8, 2025


I tried this and it broke existing tests. You'd think the first parser would be the obvious choice, but I also found cases where the last parser was the correct one.

Instead, I concentrated on more accurate static analysis of parsers to reduce the frequency of manual overrides. I think it was enough to support all the schemas in the current tests automatically.

For example, there was a schema like [:and [:map ...] <transforming-parser-schema>] in the tests somewhere. By improving the detection of :map (that it's only transforming if any children are), we could automatically and intelligently pick the intended transforming parser.
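The :map improvement can be approximated by a static check over vector-syntax schemas. This is a hypothetical data-level model (`simple-schema?` and its list of leaf types are assumptions), whereas the PR derives the same answer from each schema's `-parser-info`. Containers such as :map are simple exactly when all of their children are, and unknown types are conservatively treated as transforming.

```clojure
(defn simple-schema?
  "Hypothetical static analysis over vector-syntax schemas. Known leaves are
  simple, :orn-style schemas transform, containers are simple only when all
  children are, and anything unknown is conservatively treated as transforming."
  [schema]
  (let [[t & more] (if (vector? schema) schema [schema])
        children   (if (map? (first more)) (rest more) more)]
    (case t
      (:any :int :string :keyword :fn) true
      (:orn :altn :catn)               false
      (:vector :set :and :or)          (every? simple-schema? children)
      ;; :map entries look like [k child] or [k props child]; the child is last.
      :map                             (every? #(simple-schema? (peek %)) children)
      false)))

(simple-schema? [:map [:x :int] [:y [:vector :string]]])   ;=> true
(simple-schema? [:map [:x [:orn [:i :int] [:s :string]]]]) ;=> false

;; [:and [:map ...] <transforming>] has exactly one transforming conjunct,
;; so it can be picked automatically.
(count (remove simple-schema? [[:map [:x :int]] [:orn [:i :int] [:s :string]]])) ;=> 1
```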

I also had an eye on future robustness: I thought we should help users avoid accidentally masking their own parsers.

Collaborator Author

frenchy64 May 8, 2025


After writing #1182 (comment), I think a default handler should maybe be provided so that users can handle 3rd-party schemas that need an explicit transforming child. This would be a tool to avoid dependency hell.

e.g.,

```clojure
(m/parser S {::default-parser-info-handler
             (fn [s opts]
               (when (<question> s)
                 {:parse/transforming-child <decision>}))})
```
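A sketch of the fallback such a handler would provide, assuming schemas are plain maps and that the handler is only consulted when a schema reports no parser info of its own. The option key mirrors the `::default-parser-info-handler` name in the comment and, like `resolve-parser-info` and `own-info`, is hypothetical rather than a released malli option.

```clojure
(defn resolve-parser-info
  "Uses the schema's own parser info when it has any; otherwise falls back
  to a user-supplied handler under the hypothetical
  ::default-parser-info-handler option."
  [info-fn schema opts]
  (or (info-fn schema)
      (when-let [handler (::default-parser-info-handler opts)]
        (handler schema opts))))

;; Toy schemas are plain maps; only :any reports its own parser info.
(defn own-info [schema]
  (when (= :any (:type schema)) {:simple-parser true}))

(def opts
  {::default-parser-info-handler
   (fn [schema _opts]
     (when (= ::3rd-party (:type schema))
       {:parse/transforming-child 0}))})

(resolve-parser-info own-info {:type :any} opts)        ;=> {:simple-parser true}
(resolve-parser-info own-info {:type ::3rd-party} opts) ;=> {:parse/transforming-child 0}
(resolve-parser-info own-info {:type ::other} opts)     ;=> nil
```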

Member


That sounds like a good workaround as well.

```diff
-(-set [this key value] (-set-assoc-children this key value)))))))
+(-set [this key value] (-set-assoc-children this key value))
+ParserInfo
+(-parser-info [_] {:simple-parser (every? (comp :simple-parser -parser-info) children)}))))))
```
Member


[nit] use -comp for perf & cljs bundle size reasons


```diff
-(-set [this key value] (-set-entries this key value))))))))
+(-set [this key value] (-set-entries this key value))
+ParserInfo
+(-parser-info [_] {:simple-parser (every? (comp :simple-parser -parser-info peek) (-entry-children entry-parser))})))))))
```
Member


[nit] -comp here as well

```diff
@@ -1170,18 +1307,20 @@
form (delay (-simple-form parent properties children -form options))
cache (-create-cache options)
validate-limits (-validate-limits min max)
simple-parser (delay (every? (comp :simple-parser -parser-info) children))
```
Member


[nit] -comp

```diff
-(-set [this key value] (-set-assoc-children this key value))))))))
+(-set [this key value] (-set-assoc-children this key value))
+ParserInfo
+(-parser-info [_] {:simple-parser (every? (comp :simple-parser -parser-info) children)})))))))
```
Member


[nit] -comp

2 participants