Skip to content

Correct and optimize setof/3 and bagof/3#3211

Merged
mthom merged 12 commits intomasterfrom
setof_correction_and_opt
Jan 15, 2026
Merged

Correct and optimize setof/3 and bagof/3#3211
mthom merged 12 commits intomasterfrom
setof_correction_and_opt

Conversation

@mthom
Copy link
Owner

@mthom mthom commented Dec 14, 2025

Use variant hashing and equivalence checking in Rust to replace the special purpose constant variable keysort, which produced incorrect solutions for #3151 and #3187 as @jjtolton showed in #3176. Performance is now much faster, a factor of O(N) in the solution size (#3186).

A variable safety bug was exposed in the course of implementing these corrections, which this PR also corrects.

@triska
Copy link
Contributor

triska commented Dec 14, 2025

There is also #1526, is this related to setof/3 too?

// to h1 produces h2 (ISO Prolog standard section 7.1.6.1).
// return false on success and true on failure like eq_test.
#[inline(always)]
pub fn is_non_variant(&self, h1: HeapCellValue, h2: HeapCellValue) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://stackoverflow.com/questions/26900414/difference-between-two-variant-implementations, variant/2 is:

variant(X, Y) :-
   copy_term(Y, YC),
   subsumes_term(X, YC),
   subsumes_term(YC, X).

Maybe this can be used instead?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variant_disjoint(X, Y) :-
   term_variables(X, XVars), append(XVars, _, Dict),
   term_variables(Y, YVars), append(YVars, _, Dict),
   X == Y.

For bagof/3 and setof/3, the first goals are to be executed prior to any comparisons.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the key idea: That is, first all solutions get the very same variables (but this only works if these constraints are not copied around), and then == is good enough. No need for any hashing and the like.

Copy link

@UWN UWN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?- setof(t,(dif(X,a);dif(X,b)),_).
   dif:dif(X,a), dif:dif(X,b), unexpected.

dif#26

It does not make sense to retain constraints in setof/3

@UWN
Copy link

UWN commented Dec 14, 2025

A general comment on variant and the like. In this context here, there is no need for full variant functionality. In fact, all of this is not necessary. (==)/2 suffices. Note that after the findall/3, all elements (PairedSolutions) contain disjoint variables. And they should also not contain any attributed variables. Now, it is possible to set all variables equal, just according to the order from term_variables/2.
And thus, with a single sort, setof/3 eliminates all duplicates in one fell swoop whereas for bagof/3 one has to keep the duplicates.

@mthom
Copy link
Owner Author

mthom commented Dec 31, 2025

There is also #1526, is this related to setof/3 too?

no, it's because findall copies solutions from a failure-driven loop. The resulting backtracking clears the stack of environment frames so the cont-based mechanism of library(tabling) is left with nothing to latch onto.

@mthom
Copy link
Owner Author

mthom commented Dec 31, 2025

A general comment on variant and the like. In this context here, there is no need for full variant functionality. In fact, all of this is not necessary. (==)/2 suffices. Note that after the findall/3, all elements (PairedSolutions) contain disjoint variables. And they should also not contain any attributed variables. Now, it is possible to set all variables equal, just according to the order from term_variables/2. And thus, with a single sort, setof/3 eliminates all duplicates in one fell swoop whereas for bagof/3 one has to keep the duplicates.

It's unclear to me how bagof/3 should be revised in the light of the above comments. While setof/3 is bootstrapped from bagof/3, bagof/3 doesn't need to rely on variant predicates either, or should it instead use either variant/2 or variant_disjoint/2 above?

@UWN
Copy link

UWN commented Dec 31, 2025

In an ideal implementation, setof/3 does not use bagof/3 since bagof/3 does not remove duplicates early on. Duplicates should be removed as early as possible. That's one aspect.

@UWN
Copy link

UWN commented Dec 31, 2025

Otherwise bagof/3 could use the very same approach. That is, after the findall/3 unify all related variables, and only then sort for the alternate solutions for bagof/3.

@mthom
Copy link
Owner Author

mthom commented Jan 6, 2026

I can't find a way to replace is_non_variant in Prolog without making both predicates much, much slower. For performance's sake it seems unavoidable. Unfortunately variants don't correspond to any form of sort order. If they did it could be done in O(N) in pure Prolog.

@Skgland
Copy link
Contributor

Skgland commented Jan 6, 2026

I can't find a way to replace is_non_variant in Prolog without making both predicates much, much slower. For performance's sake it seems unavoidable. Unfortunately variants don't correspond to any form of sort order. If they did it could be done in O(N) in pure Prolog.

Would it work to map each variable to a term representing the position in the term that variable first occurs at, so that we then have a ground term and can use the sort order of ground terms? Or what am I missing that wouldn't make this work?

i.e. use a reserved functor (so that it can't occur in any of the terms e.g. $pos/1
and for a variable at

  • the root use []
  • the A-st argument of the root use [A]
  • the B-st argument of the A-st argument of the root term use [B, A]
  • the C-st argument of the B-st argument of the A-st argument of the root term use [C, B, A]
  • etc.

i.e. assuming the functor $pos doesn't already occur in the term

  • X -> $pos([])
  • [] -> []
  • t(X, y(Y)) -> t($pos([0]), y($pos[0,1]))
  • t([0], y([1])) -> t([0], y([1]))
  • t(A, A, B) -> t($pos([0]), $pos([0]), $pos([1]))
  • t(A, B, B) -> t($pos([0]), $pos([1]), $pos([1]))
  • t(X, Y, Y) -> t($pos([0]), $pos([1]), $pos([1]))

@UWN
Copy link

UWN commented Jan 7, 2026

Would it work to map each variable to a term representing the position in the term that variable first occurs at, so that we then have a ground term

This is overkill. It is only necessary that the terms contain the very same variables, then regular == and sorting works. As long as a special mechanism is now developed for the sole purpose of bagof/3, testing it will become practically impossible. Just for comparison, it took about a year to implement variant checking in SWI (there called misleadingly =@=) with many, many failed attempts. But at least, it was possible to test it. When such a complex mechanism is exclusively accessible via bagof/3, the chances of identifying such errors diminish rapidly.

@UWN
Copy link

UWN commented Jan 7, 2026

Hashing is also very challenging in the presence of rational trees (the default in Scryer). It means that even more cases of rational trees disclose their internal representation than those exposed by compare-ison.

ulrich@gupu:/opt/gupu/setof_correction_and_opt$ git status
On branch setof_correction_and_opt
Your branch is up to date with 'origin/setof_correction_and_opt'.

nothing to commit, working tree clean
ulrich@gupu:/opt/gupu/setof_correction_and_opt$ target/release/scryer-prolog -v
v0.10.0-60-gaa2d04cf
ulrich@gupu:/opt/gupu/setof_correction_and_opt$ target/release/scryer-prolog -f
?- setof(t,X^(-X=X,(Y = X ; Y = -X ; Y = - -X)),Ts).
   Y = -Y, Ts = "t"
;  Y = - - - - - - - - - - - - - - - - - - - - ..., Ts = "t", unexpected
;  Y = - - - - - - - - - - - - - - - - - - - - ..., Ts = "t", unexpected.
?- setof(Y,X^(-X=X,(Y = X ; Y = -X ; Y = - -X)),Ys).
   Ys = [- - - - - - - - - - - - - - - - - - - ...]. % fine, shows that comparison works

@UWN
Copy link

UWN commented Jan 7, 2026

Or more explicitly,

?- bagof(t,X^(-X=X,(Y = X ; Y = -X ; Y = - -X)),[t,t,t]).
   false, unexpected.
   sto,
   Y = -Y.

@mthom mthom force-pushed the setof_correction_and_opt branch from aa2d04c to 9221a6a Compare January 8, 2026 08:30
@UWN
Copy link

UWN commented Jan 8, 2026

sort_without_dedup/2: commonly called keysort/2

@UWN
Copy link

UWN commented Jan 8, 2026

And sort_without_dedup/2 is incorrect, which can be seen:

?- bagof(Y,(Y=2;Y=1),Ys).
   Ys = [1,2], unexpected.
   Ys = [2,1].

@UWN
Copy link

UWN commented Jan 8, 2026

Just one more remark on keysort/2: If an identical key is found, one may destructively point from one pair/key to the other (provided no cp is in between), such that the subsequent processing gets faster. Similarly for sort/2.

Cargo.toml Outdated
serde_json = "1.0.122"
serde = "1.0.204"
parking_lot = "0.12.4"
hashbrown = "0.16.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this now still needed?

Copy link

@UWN UWN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Progress!

What I am still very suspicious about are these two sort/2 s and this set_difference. We have term_variables and append for this! Witnesses0 is already unique, so why sort it? You will just introduce some evil implementation dependence.

Copy link

@UWN UWN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If might be a good idea to share more of the identical variable analysis in one common auxiliary predicate, but that is nit-picking)

@mthom mthom force-pushed the setof_correction_and_opt branch from 695c09e to c79fd74 Compare January 15, 2026 04:43
@mthom mthom merged commit 453a88f into master Jan 15, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants