@Nithurshen
Contributor

Which issue does this PR close?

Rationale for this change

Substrait roundtrip tests were failing for queries involving EXISTS and correlated subqueries (specifically those using outer references). The DataFusion Substrait producer did not support serializing Expr::Exists, and it returned a "feature not implemented" error for OuterReferenceColumn.

Supporting these features is essential for improving the reliability of DataFusion's Substrait integration and enabling complex join scenarios in distributed environments.

What changes are included in this PR?

  1. Producer Support for EXISTS:

    • Implemented from_exists in the producer to map Expr::Exists to the Substrait SetPredicate (using PredicateOp::Existence).
    • Added support for NOT EXISTS by wrapping the predicate in a "not" scalar function.
  2. Producer Support for OuterReferenceColumn:

    • Updated the producer to serialize OuterReferenceColumn as a custom scalar function named "outer_reference". This avoids schema validation errors during serialization since outer columns often cannot be resolved against the local subquery schema.
  3. Consumer Support for OuterReferenceColumn:

    • Updated the consumer to intercept the "outer_reference" scalar function and deserialize it back into a DataFusion OuterReferenceColumn, ensuring a successful roundtrip.
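
The three changes above can be illustrated with a small, self-contained toy model. This is only a sketch: the `Expr`, `Plan`, `SExpr`, and `SRel` types below are hypothetical stand-ins, not the real DataFusion or Substrait protobuf types, and the way the outer column is carried as an argument of "outer_reference" is an assumption made for the example. It shows EXISTS mapping to a set predicate, NOT EXISTS wrapped in a "not" function, and the "outer_reference" function intercepted on the way back.

```rust
// Toy model only: hypothetical stand-ins for DataFusion's Expr and the
// Substrait expression/relation messages. Not the code from this PR.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    OuterReferenceColumn(String),
    Eq(Box<Expr>, Box<Expr>),
    Exists { subquery: Box<Plan>, negated: bool },
}

// Stand-in for a subquery plan: SELECT * FROM <table> WHERE <filter>.
#[derive(Debug, Clone, PartialEq)]
struct Plan { table: String, filter: Expr }

#[derive(Debug, Clone, PartialEq)]
enum SExpr {
    FieldRef(String),
    ScalarFunction { name: String, args: Vec<SExpr> },
    // Models a SetPredicate with the existence predicate op.
    SetPredicateExists(Box<SRel>),
}

#[derive(Debug, Clone, PartialEq)]
struct SRel { table: String, filter: SExpr }

// Producer direction: toy Expr -> toy Substrait expression.
fn to_substrait(e: &Expr) -> SExpr {
    match e {
        Expr::Column(n) => SExpr::FieldRef(n.clone()),
        // Outer columns are wrapped in a custom function instead of being
        // resolved against the local subquery schema. Passing the column as
        // a field-reference argument is an assumption of this sketch.
        Expr::OuterReferenceColumn(n) => SExpr::ScalarFunction {
            name: "outer_reference".into(),
            args: vec![SExpr::FieldRef(n.clone())],
        },
        Expr::Eq(l, r) => SExpr::ScalarFunction {
            name: "equal".into(),
            args: vec![to_substrait(l), to_substrait(r)],
        },
        Expr::Exists { subquery, negated } => {
            let pred = SExpr::SetPredicateExists(Box::new(SRel {
                table: subquery.table.clone(),
                filter: to_substrait(&subquery.filter),
            }));
            // NOT EXISTS becomes not(<set predicate>).
            if *negated {
                SExpr::ScalarFunction { name: "not".into(), args: vec![pred] }
            } else {
                pred
            }
        }
    }
}

// Consumer direction: toy Substrait expression -> toy Expr.
fn from_substrait(e: &SExpr) -> Result<Expr, String> {
    match e {
        SExpr::FieldRef(n) => Ok(Expr::Column(n.clone())),
        SExpr::SetPredicateExists(rel) => Ok(Expr::Exists {
            subquery: Box::new(Plan {
                table: rel.table.clone(),
                filter: from_substrait(&rel.filter)?,
            }),
            negated: false,
        }),
        SExpr::ScalarFunction { name, args } => match (name.as_str(), args.as_slice()) {
            // Intercept the custom function and restore the outer reference.
            ("outer_reference", [SExpr::FieldRef(n)]) => Ok(Expr::OuterReferenceColumn(n.clone())),
            // This toy model only understands "not" around an EXISTS predicate.
            ("not", [inner]) => match from_substrait(inner)? {
                Expr::Exists { subquery, .. } => Ok(Expr::Exists { subquery, negated: true }),
                other => Err(format!("unsupported operand for not: {:?}", other)),
            },
            ("equal", [l, r]) => Ok(Expr::Eq(
                Box::new(from_substrait(l)?),
                Box::new(from_substrait(r)?),
            )),
            (other, _) => Err(format!("unsupported function: {}", other)),
        },
    }
}
```

In this model, a roundtrip is `from_substrait(&to_substrait(&expr))`. For a NOT EXISTS over a subquery whose filter compares a local column with an outer reference, the result should compare equal to the original expression, which is the property the roundtrip test checks for.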

Are these changes tested?

Yes.

  • Verified using the existing SQLLogicTest case that was previously failing: joins.slt.
  • Command run: cargo test --test sqllogictests -- --substrait-round-trip joins.slt:1233

Are there any user-facing changes?

No breaking API changes. This PR only expands the set of logical plans that can be converted to and from Substrait.

@github-actions bot added the substrait label (Changes to the substrait crate) on Nov 29, 2025

Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Cannot convert Exists { subquery: <subquery>, negated: true } to Substrait
