
[FEATURE] Evaluate semantic compatibility between PPL in Spark and OpenSearch #1208

Open
@dai-chen

Is your feature request related to a problem?

Yes. PPL queries in both Spark and OpenSearch will soon be handled through the shared PPL-Calcite frontend, but we lack validation that their semantics are consistent across the two backends. Because Calcite plans are translated to SparkSQL, we assume semantic parity for common SQL operators. However, differences in type systems, function behavior, or error handling may still lead to inconsistencies.

What solution would you like?

Design and execute a test suite to evaluate the semantic compatibility of PPL queries run in Spark through Legacy PPL Spark (the current implementation) and through the new Calcite-based PPL Spark.

Goal

The outcome of this task will be a documented set of compatibility findings that serves as input for the follow-up work to unify UDTs/UDFs/UDAFs (TODO: create GitHub issue).

Deliverable

The documented findings will include:

  • A list of standard PPL functions whose behavior differs in SparkSQL (e.g., semantics, return type, null handling).
  • Identification of missing functions in SparkSQL that are supported in OpenSearch.
  • Notes on whether enabling those functions requires user-defined type (UDT) support in SparkSQL.
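
One possible way to bucket the findings listed above is sketched below. It is only an illustration: the category names, the `MISSING` sentinel, and the `classify` helper are all hypothetical, and Python value types stand in for backend result types. A real comparison would more likely inspect the schema metadata returned by each backend.

```python
from enum import Enum
from typing import Any


class Finding(Enum):
    """Hypothetical categories mirroring the deliverable bullets."""
    CONSISTENT = "consistent"
    MISSING_FUNCTION = "missing function"
    NULL_HANDLING = "null handling"
    RETURN_TYPE = "return type"
    SEMANTICS = "semantics"


# Hypothetical sentinel meaning "this backend does not support the function".
MISSING = object()


def classify(reference: Any, candidate: Any) -> Finding:
    """Classify one function result from the candidate backend against the reference."""
    if candidate is MISSING:
        return Finding.MISSING_FUNCTION
    # Exactly one side returned null: a null-handling difference.
    if (reference is None) != (candidate is None):
        return Finding.NULL_HANDLING
    # Assumption: the Python type is a proxy for the backend's return type.
    if type(reference) is not type(candidate):
        return Finding.RETURN_TYPE
    if reference != candidate:
        return Finding.SEMANTICS
    return Finding.CONSISTENT
```

For example, `classify(1, 1.0)` is reported as a return-type difference even though the two values compare equal, which is the kind of discrepancy a plain equality check would hide.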

Tasks

  • Create a test plan outlining the goal, scope and expectations.
  • Leverage the test framework developed in piped-processing-language#32, or define a standalone test suite of representative PPL queries. Execute the queries against both Spark and OpenSearch backends and compare the results to identify any inconsistencies in output.
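
The execute-and-compare step could look roughly like the sketch below. It is not the framework from piped-processing-language#32: the `Runner` callables, the `Comparison` record, and `compare_query` are hypothetical names, and each runner is assumed to take a PPL query string and return rows as tuples. Rows are compared without regard to order, which would need to change for queries that sort their output.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple

Row = Tuple[Any, ...]
# Hypothetical backend adapter: takes a PPL query string, returns result rows.
Runner = Callable[[str], Sequence[Row]]


@dataclass
class Comparison:
    query: str
    matches: bool
    detail: str = ""


def _sort_key(row: Row) -> Tuple[str, ...]:
    # Sort on repr so rows containing None or mixed types remain orderable.
    return tuple(repr(value) for value in row)


def compare_query(query: str, run_a: Runner, run_b: Runner) -> Comparison:
    """Run one query on both backends and compare the rows, ignoring row order."""
    outcomes = []
    for run in (run_a, run_b):
        try:
            outcomes.append(("ok", sorted(run(query), key=_sort_key)))
        except Exception as exc:
            # Error handling is part of the semantics, so record it rather than abort.
            outcomes.append(("error", type(exc).__name__))
    (kind_a, value_a), (kind_b, value_b) = outcomes
    if kind_a != kind_b:
        return Comparison(query, False, f"A: {kind_a}, B: {kind_b}")
    if value_a != value_b:
        return Comparison(query, False, f"A={value_a!r} B={value_b!r}")
    return Comparison(query, True)
```

Python equality treats `1` and `1.0` as equal, so this check only catches value and error differences; return-type differences would need a separate schema comparison.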

What alternatives have you considered?

N/A

Do you have any additional context?

  • Functions dependent on OpenSearch-specific UDTs, such as:
    • IP-related functions
    • Geo-point functions
    • Other domain-specific types not supported in Spark

These will be evaluated in a future phase for OpenSearch-specific unification.

Labels: enhancement