Filter pushdown on time-typed columns can run into problems #388

@smokeriu

Description

When the clickhouse-server timezone is UTC and the Spark session timezone is something else, such as Asia/Shanghai:

DDL example:

CREATE TABLE db.table
(
  name String,
  log_time DateTime64
)
ENGINE = MergeTree ORDER BY log_time;  -- engine assumed; only the column type matters here

Original data after reading:

code:

spark.sql("select name, log_time from clickhouse.db.table").show()

data:

+-----+-------------------+
| name|           log_time|
+-----+-------------------+
|animi|2025-03-21 09:44:21|
|animi|2025-03-21 09:02:40|
+-----+-------------------+

When applying a where condition on the time column:

code:

spark.sql("select log_time from clickhouse.db.table")
                .where("log_time > '2025-03-21 09:00:00'");

I also tested:

spark.sql("select log_time from clickhouse.db.table")
                .where("log_time > to_timestamp('2025-03-21 09:00:00')");

Both queries return empty results.

The log shows:

Pushed Filters: GreaterThan(log_time,2025-03-21 09:00:00.0)
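
That pushed filter corresponds to a Spark DataSource filter object along the lines of the sketch below (names here are illustrative; the trailing .0 in the log is java.sql.Timestamp.toString formatting, which suggests the value arrives as a Timestamp):

import java.sql.Timestamp
import org.apache.spark.sql.sources.GreaterThan

// Roughly what the connector receives from Spark:
val pushed = GreaterThan("log_time", Timestamp.valueOf("2025-03-21 09:00:00"))
// Timestamp.valueOf parses the string in the JVM default timezone
// (Asia/Shanghai here), so the Timestamp denotes a concrete instant;
// stringifying it later throws that timezone information away.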

I think the connector compiles the query to something like:

select log_time from clickhouse.db.table where log_time > '2025-03-21 09:00:00'

ClickHouse needs a toDateTime()-style wrapper when filtering on a datetime column. As compiled, the bare string literal is parsed in the server's timezone (UTC here), while Spark rendered the values above in the session timezone (Asia/Shanghai), so the cutoff is shifted by the timezone offset and matches nothing.
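
For example (a sketch against this report's values; toDateTime and toDateTime64 accept an explicit timezone argument, and scale 3 matches DateTime64's default precision):

-- current compiled form: the bare literal is parsed in the server timezone (UTC)
select log_time from db.table where log_time > '2025-03-21 09:00:00'

-- wrapped: parse the wall-clock string in the Spark session timezone
select log_time from db.table where log_time > toDateTime('2025-03-21 09:00:00', 'Asia/Shanghai')

-- equivalent: normalize the literal to UTC up front and say so explicitly
select log_time from db.table where log_time > toDateTime64('2025-03-21 01:00:00.000', 3, 'UTC')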

I think that when compiling the filter, the connector needs to wrap the pushdown filter's datetime values in a conversion function, instead of just rendering them as plain strings as it does now; see the sketch below.

But it's not clear to me whether hard-coding a function this way causes any other problems.
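
For illustration, a minimal sketch of such a compile step (hypothetical names, not the connector's actual code): render a java.sql.Timestamp as a UTC-normalized toDateTime64 literal, so the server's timezone setting stops mattering.

import java.sql.Timestamp
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

// Hypothetical literal compiler for pushed-down filter values.
def compileLiteral(value: Any): String = value match {
  case ts: Timestamp =>
    // Convert the instant to a UTC wall-clock string and tag it as UTC,
    // instead of emitting a bare string for the server to parse in its
    // own timezone.
    val utc = DateTimeFormatter
      .ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
      .format(ts.toInstant.atZone(ZoneOffset.UTC))
    s"toDateTime64('$utc', 3, 'UTC')"
  case s: String => s"'${s.replace("'", "\\'")}'"
  case other     => other.toString
}

// With the JVM in Asia/Shanghai, GreaterThan(log_time, 2025-03-21 09:00:00.0)
// would then compile to: log_time > toDateTime64('2025-03-21 01:00:00.000', 3, 'UTC')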
