Skip to content

Example for SqlExecutor::logical_optimizer to avoid unsupported arrow functions #129

@ahirner

Description

@ahirner

Sqlite doesn't support arrow_cast. Given a sqlite table with timestamp in milliseconds:

CREATE TABLE events (
  published INTEGER DEFAULT (CAST(unixepoch('subsec') * 1000 as INTEGER)) NOT NULL,
  sequence_id INTEGER PRIMARY KEY AUTOINCREMENT,
  message BLOB NOT NULL
) STRICT;

This query on the corresponding FederatedTableSource fails:

select arrow_cast(max(published), 'Timestamp(Millisecond, None)') from events limit 1;

The error is:

Error("Failed to convert query result to Arrow: Failed to extract row value: Invalid column type Integer at index: 0, name: arrow_cast(max(sqlite.events.published),Utf8("Timestamp(Millisecond, None)"))")

The sql in SQLExecutor::execute is:

SELECT CAST(max(`events`.`published`) AS TIMESTAMP) AS `arrow_cast(max(events.published),Utf8(\"Timestamp(Millisecond, None)\"))` FROM `events` LIMIT 1

Solution

#117 added functionality to correct too optimistic federation:

Extends SqlExecutor trait with logical_optimizer method, this can allow executor to hook into federation planning, allowing for rewriting LogicalPlan and even placement of FederationPlanNode. This is useful for avoiding federating nodes that are only part of datafusion eg. UDF, UDAF.. etc.

Maybe there are other ways to avoid the cast, but they seem less robust than rules specific to the sql dialect.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions