Robust `:and` parser, add `:andn` #1182

frenchy64 · 2025-03-28T21:41:38Z

This tightens up -parser for :and in several ways.

The essential insight is that there are two kinds of parsers, which I'm calling transforming (e.g., :orn, -collection-schema) and simple (e.g., :any, -simple-schema). Simple parsers return the identical input on success. Everything else is transforming.

In all cases I've seen so far, it's possible to accurately predict whether a parser is simple based on its schema. With this information, we can now improve :and's parser by:

- banning more than one transforming parser per :and
- running the transforming parser last
- running the transforming unparser first

This automatically handles [:and S [:fn ..]] and makes it more robust, as :fn is now passed the input value instead of the parsed value and the conjuncts can be in any order.
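As a rough illustration of the ordering described above, here is a minimal sketch in plain Clojure. It is a toy model, not malli's implementation: `and-parser`, `and-unparser`, `parse-tagged-int` and `unparse-tagged-int` are hypothetical names, and the simple conjuncts are modelled as plain predicates.

```clojure
;; Toy model only; none of these names exist in malli.
(defn and-parser
  "checks: predicates standing in for the simple conjuncts.
   transform: parse fn of the single transforming conjunct, or nil."
  [checks transform]
  (fn [x]
    (cond
      ;; simple conjuncts always see the original input
      (not (every? #(% x) checks)) ::invalid
      ;; the transforming parser runs last
      transform (transform x)
      :else x)))

(defn and-unparser
  "Mirror image: the transforming unparser runs first, then the checks
   run against the unparsed value."
  [checks untransform]
  (fn [x]
    (let [v (if untransform (untransform x) x)]
      (if (and (not= ::invalid v) (every? #(% v) checks))
        v
        ::invalid))))

;; Stand-ins for a transforming parser/unparser pair such as :orn.
(defn parse-tagged-int [x]
  (if (int? x) [:int x] ::invalid))

(defn unparse-tagged-int [x]
  (if (and (vector? x) (= :int (first x))) (second x) ::invalid))

(def parse   (and-parser   [int? odd?] parse-tagged-int))
(def unparse (and-unparser [int? odd?] unparse-tagged-int))

(parse 3)          ;; => [:int 3]
(unparse [:int 3]) ;; => 3
```

Because the predicates never see `[:int 3]`, the position of the transforming conjunct in the `:and` does not matter in this model.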

Extras:

Adds a new schema :andn for when you really want multiple transforming parsers in a conjunction. It reparses the input for each conjunct and returns the results in a Tags. The unparser only unparses the leftmost child, which enables users to transform the unparsed results by removing the other results.
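The :andn behaviour described above can be sketched in plain Clojure as follows. This is an assumption-laden model: a plain map stands in for the Tags result, and `andn-parser`, `andn-unparser`, `parse-sign` and `parse-parity` are hypothetical names rather than malli functions.

```clojure
;; Toy model only; a plain map stands in for malli's Tags value.
(defn andn-parser
  "tagged-parsers: ordered seq of [tag parse-fn] pairs."
  [tagged-parsers]
  (fn [x]
    (reduce (fn [acc [tag parse]]
              ;; every conjunct re-parses the original input x
              (let [v (parse x)]
                (if (= ::invalid v)
                  (reduced ::invalid)
                  (assoc acc tag v))))
            {}
            tagged-parsers)))

(defn andn-unparser
  "Only the leftmost [tag unparse-fn] pair is consulted."
  [tagged-unparsers]
  (let [[tag unparse] (first tagged-unparsers)]
    (fn [m]
      (if (and (map? m) (contains? m tag))
        (unparse (get m tag))
        ::invalid))))

(defn parse-sign [x]
  (cond (not (int? x)) ::invalid
        (neg? x)       [:neg x]
        :else          [:non-neg x]))

(defn parse-parity [x]
  (cond (not (int? x)) ::invalid
        (odd? x)       [:odd x]
        :else          [:even x]))

(def parse   (andn-parser   [[:sign parse-sign] [:parity parse-parity]]))
(def unparse (andn-unparser [[:sign second] [:parity second]]))

(parse 5)                    ;; => {:sign [:non-neg 5], :parity [:odd 5]}
(unparse {:sign [:neg -2]})  ;; => -2
```

In this model the caller can drop the `:parity` entry before unparsing, since only the leftmost result is read back.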

We can now more aggressively optimize simple (un)parsers upfront to not build a result when it will be identical to the input.

Includes a fix for #1173 by bumping up the :max-tries for generating distinct vectors.

opqdonut

Some questions / comments. Not confident enough yet to approve.

opqdonut · 2025-04-01T04:45:07Z

README.md

+;       :flat [#malli.core.Tag{:key :name, :value "x"}
+;              #malli.core.Tag{:key :id, :value 1}
+;              #malli.core.Tag{:key :name, :value "y"}
+;              #malli.core.Tag{:key :id, :value 2}]}}


yeah, this behaviour makes sense, I get it

opqdonut · 2025-04-01T04:59:46Z

src/malli/core.cljc

+      (if (-ref-schema? this)
+        (-parser-info (-deref this))
+        (when (-> this -parent -type-properties ::simple-parser)
+          {:simple-parser true}))))


what's the need for both ParserInfo and -type-properties ::simple-parser?

Could get away with just ParserInfo. It seemed neater at the time to have it at the type-level for trivial types.

Since all the schemas are supposed to have -parser-info, we feel like the method should be in Schema. Also, we feel like -type-properties ::simple-parser makes the feature harder to understand, so we'd prefer you drop that.

The default impl for ParserInfo is neat, but makes following the logic harder for future maintainers.

> Also, we feel like -type-properties ::simple-parser makes the feature harder to understand, so we'd prefer you drop that.

Sure, I will remove -type-properties ::simple-parser support. FWIW I was following a similar flexibility to malli.generator/-create, where generators can be provided at the IntoSchema level and overridden by Schema.

> Since all the schemas are supposed to have -parser-info, we feel like the method should be in Schema.

I think this would be a mistake. It would introduce a large chance of dependency hell, where you can't upgrade malli without waiting for all 3rd party schemas to also be updated.

For example, (m/parser [:vector ::3rd-party]) would throw an exception because ::3rd-party doesn't implement -parser-info. Having ParserInfo as a separate protocol solves this particular cause of dependency hell, since you can define a default (you can't with Schema, since ::3rd-party likely already implements it directly).

The default (nil) says a parser is transforming, which at worst will undo some parser optimizations introduced in this PR (i.e., preserving the same perf as before). For the extra pedantic, you could conceivably assert (comp some? m/-parser-info) for every schema in your registry to find these schemas.
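The upgrade-path argument can be illustrated with a small JVM Clojure sketch. The protocol and records below are hypothetical stand-ins (`ParserInfoSketch`, `AnySchema`, `ThirdPartySchema`), not malli's definitions; in ClojureScript the fallback would be extended to `default` rather than `Object`.

```clojure
;; Hypothetical stand-in for a separate parser-info protocol.
(defprotocol ParserInfoSketch
  (parser-info [this]
    "Returns {:simple-parser true} for simple parsers, nil when unknown."))

;; Fallback: anything that does not implement the protocol directly
;; reports nil, i.e. it is conservatively treated as transforming.
(extend-protocol ParserInfoSketch
  Object
  (parser-info [_] nil))

;; A schema type that has been updated to report parser info.
(defrecord AnySchema []
  ParserInfoSketch
  (parser-info [_] {:simple-parser true}))

;; A third-party schema type written before the protocol existed.
(defrecord ThirdPartySchema [])

(parser-info (->AnySchema))        ;; => {:simple-parser true}
(parser-info (->ThirdPartySchema)) ;; => nil, no exception

;; Audit in the spirit of the (comp some? ...) check: list the
;; schemas that still rely on the fallback.
(defn missing-parser-info [schemas]
  (remove (comp some? parser-info) schemas))
```

Had `parser-info` been a required method of an existing protocol that `ThirdPartySchema` already implements, the second call would throw instead of falling back.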

One particular kind of dependency hell is still possible though: if a library provides a schema that now throws :malli.core/and-schema-multiple-transforming-parsers. Maybe a default global handler could be provided for this case to give control back to the user.

I hadn't thought about the upgrade path – that's a very good point. Probably best to go with the separate ParserInfo protocol.

opqdonut · 2025-04-01T05:03:17Z

src/malli/swagger.cljc

+(defmethod accept :orn [_ s children _]
+  (let [children (map last children)
+        base (-base s children)]
+    (assoc base :x-anyOf children)))


thanks for adding this missing case!

test/malli/parser_test.cljc

opqdonut · 2025-04-01T05:07:17Z

src/malli/core.cljc

+                                                                   (reduced ::invalid)
+                                                                   (cond-> acc
+                                                                     (not simple) (conj v')))))
+                                                             (if simple x []) x)]


I didn't understand the changes around here. Is there a corresponding test?

Since we know the child has a simple parser, we don't need to rebuild the result, since it means the result of parsing is either ::invalid or x.

The "then" branch of malli.parser/ensure-parser-type tests this. For example, [:vector ::HOLE] with ::HOLE being a simple parsing schema like :any is expected to be a simple parser (expected-simple == true) so any mg/samples we take of it will {un}parse back to the identical sampled value.

aha, this is the optimisation you mention in the PR description, right?

Yes, specifically for :map-of, -collection-schema, and :map's default value.
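A minimal plain-Clojure sketch of this optimization, loosely following the shape of the diff above. `vector-parser` and its arguments are hypothetical names used only for illustration; the real code lives in the collection schema implementation.

```clojure
;; Toy model only; not malli's -collection-schema.
(defn vector-parser
  "child-parse: parse fn for elements.
   child-simple?: true when the child parser is known to be simple."
  [child-parse child-simple?]
  (fn [x]
    (if-not (vector? x)
      ::invalid
      (reduce (fn [acc v]
                (let [v' (child-parse v)]
                  (if (= ::invalid v')
                    (reduced ::invalid)
                    ;; only accumulate when elements may actually change
                    (cond-> acc
                      (not child-simple?) (conj v')))))
              ;; a simple child means success returns x itself
              (if child-simple? x [])
              x))))

(def parse-ints   (vector-parser #(if (int? %) % ::invalid) true))
(def parse-tagged (vector-parser #(if (int? %) [:int %] ::invalid) false))

(def input [1 2 3])

(identical? input (parse-ints input)) ;; => true
(parse-tagged input)                  ;; => [[:int 1] [:int 2] [:int 3]]
```

With a simple child, elements are still validated one by one, but no new vector is allocated on success.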

opqdonut · 2025-04-01T05:08:10Z

Letting @ikitommi have a look as well.

frenchy64 · 2025-04-01T05:46:17Z

Thanks for looking @opqdonut.

opqdonut

Went through this in a session with some people from Metosin. We really like the change, and would like to get it in! Thanks for your effort.

Please move -parser-info to Schema and consider the suggestions we had for :parse.

opqdonut · 2025-05-07T06:30:54Z

README.md

+To opt-out of parsing any further levels of this schema, use the `:parse :none` property.
+
+```clojure
+(m/parse [:and {:parse 0}


This is reserving the top-level :parse key for a specific purpose. Also, "parse" on its own is not very descriptive. We propose using a :parse/ ns (just like :gen/ for example). How about :parse/transforming-child-index or :parse/index or :parse/child?

Agreed. I'll see if a good name comes to me, but other than its length I like :parse/transforming-child-index.

:parse/transforming-child seems ok.

:parse/transforming-child :none
:parse/transforming-child 0
:parse/transforming-child 4

Further possible extensions like :last seem to work with this name.

opqdonut · 2025-05-07T06:31:55Z

README.md

+The error `:malli.core/and-schema-multiple-transforming-parsers` is thrown if the transforming
+parser cannot be picked automatically. This usually means that multiple conjuncts
+will transform their input or a false-positive has occurred because the underlying schema
+does not implement `malli.core/ParserInfo`.


How about defaulting to the first transforming parser? That's kind of what we do for json-schema generation etc. Or do you think it would trip up users?

I tried this and it broke existing tests. You'd think the first parser would be the obvious choice, but I also found cases where the last parser was the correct choice.

Instead, I concentrated on more accurate static analysis of parsers to reduce the frequency of manual overrides. I think it was enough to support all the schemas in the current tests automatically.

For example, there was a schema like [:and [:map ...] <transforming-parser-schema>] in the tests somewhere. By improving the detection of :map (that it's only transforming if any children are), we could automatically and intelligently pick the intended transforming parser.
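To make the static analysis concrete, here is a hedged sketch over a deliberately tiny schema syntax (vectors without property maps). `simple-parser?` and `transforming-child-index` are hypothetical helpers, and the set of schema types is an assumption chosen for the example, not malli's actual classification.

```clojure
;; Toy model only; handles [:type child ...] forms without properties.
(defn simple-parser?
  "True when parsing a value of this schema is predicted to return the
   input unchanged."
  [schema]
  (let [[type & args] (if (vector? schema) schema [schema])]
    (case type
      (:any :int :string :fn) true
      (:orn :andn)            false
      :vector                 (simple-parser? (first args))
      ;; :map is only transforming if one of its entry schemas is
      :map                    (every? (fn [[_k child]] (simple-parser? child)) args)
      ;; unknown types are conservatively treated as transforming
      false)))

(defn transforming-child-index
  "Picks the single transforming conjunct of an :and, or :none."
  [and-children]
  (let [idxs (keep-indexed (fn [i c] (when-not (simple-parser? c) i))
                           and-children)]
    (case (count idxs)
      0 :none
      1 (first idxs)
      (throw (ex-info "multiple transforming parsers"
                      {:indexes (vec idxs)})))))

(transforming-child-index [[:map [:x :int]] [:orn [:a :int]]])
;; => 1, because the :map has only simple children

(transforming-child-index [[:map [:x :int]] [:fn pos?]])
;; => :none
```

Treating every `:map` as transforming would make the first call ambiguous, which is the false positive that the improved detection avoids.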

I also had an eye on future robustness. I thought we should help users avoid accidentally masking their own parsers.

After writing #1182 (comment), I think a default handler should perhaps be provided so the user can handle 3rd-party schemas that need an explicit transforming child. This would be a tool to avoid dependency hell.

e.g.,

(m/parser S {::default-parser-info-handler (fn [s opts] (when (<question> s) {:parse/transforming-child <decision>})})

That sounds like a good workaround as well.

opqdonut · 2025-05-07T06:38:37Z

src/malli/core.cljc

-          (-set [this key value] (-set-assoc-children this key value)))))))
+          (-set [this key value] (-set-assoc-children this key value))
+          ParserInfo
+          (-parser-info [_] {:simple-parser (every? (comp :simple-parser -parser-info) children)}))))))


[nit] use -comp for perf & cljs bundle size reasons

opqdonut · 2025-05-07T06:43:05Z

src/malli/core.cljc

+      (if (-ref-schema? this)
+        (-parser-info (-deref this))
+        (when (-> this -parent -type-properties ::simple-parser)
+          {:simple-parser true}))))


Since all the schemas are supposed to have -parser-info, we feel like the method should be in Schema. Also, we feel like -type-properties ::simple-parser makes the feature harder to understand, so we'd prefer you drop that.

The default impl for ParserInfo is neat, but makes following the logic harder for future maintainers.

opqdonut · 2025-05-07T06:59:14Z

src/malli/core.cljc

-           (-set [this key value] (-set-entries this key value))))))))
+           (-set [this key value] (-set-entries this key value))
+           ParserInfo
+           (-parser-info [_] {:simple-parser (every? (comp :simple-parser -parser-info peek) (-entry-children entry-parser))})))))))


[nit] -comp here as well

opqdonut · 2025-05-07T06:59:34Z

src/malli/core.cljc

@@ -1170,18 +1307,20 @@
             form (delay (-simple-form parent properties children -form options))
             cache (-create-cache options)
             validate-limits (-validate-limits min max)
+             simple-parser (delay (every? (comp :simple-parser -parser-info) children))


[nit] -comp

opqdonut · 2025-05-07T06:59:49Z

src/malli/core.cljc

-           (-set [this key value] (-set-assoc-children this key value))))))))
+           (-set [this key value] (-set-assoc-children this key value))
+           ParserInfo
+           (-parser-info [_] {:simple-parser (every? (comp :simple-parser -parser-info) children)})))))))


[nit] -comp

frenchy64 added 10 commits March 28, 2025 06:07

add :andn, more flexible :and parsing, parse optimizations

2a4a08e

doc

9ef590c

wip

20036a1

wip

1775750

wip

9cde5b1

wip

0cd88fe

rm

b471e5e

add :andn

964d1cc

move

34b4409

wip

09cbe2f

frenchy64 mentioned this pull request Mar 28, 2025

Non-flowing :and parser #1167

Closed

frenchy64 added 10 commits March 28, 2025 21:59

add test

480ce51

wip

4155dd9

wip

9cde29b

wip

fa46503

wip

28cb8e1

wip

459aa94

move

050c442

wip

361eb5e

fix metosin#1173

d0d45cc

wip

ff56e32

frenchy64 marked this pull request as ready for review March 28, 2025 23:23

frenchy64 changed the title ~~WIP: Robust :and parser, add :andn~~ Robust :and parser, add :andn Mar 28, 2025

frenchy64 requested review from opqdonut and ikitommi March 28, 2025 23:23

opqdonut reviewed Apr 1, 2025

frenchy64 added 2 commits April 1, 2025 05:32

fix m/parse examples, better bb explanation

4a9f5ab

fix comment

5943136

opqdonut requested changes May 7, 2025
