
Add support for Tuples & Variants #368

Merged
merged 5 commits into from
May 12, 2024

Conversation

jirislav
Contributor

@jirislav jirislav commented Apr 8, 2024

Summary

Related issue: #286

I have implemented support for the Tuple & Variant data types by adding the describe_include_subcolumns = 1 setting to the DESCRIBE TABLE query.

As a result, I was able to get rid of the recursive Map parsing (since ClickHouse now returns separate rows for the nested types) and unify it with the way Tuples are parsed.

Essentially, the Column class is no longer responsible for parsing complex data types (except Array, since it is not expanded in the DESCRIBE TABLE output). To fully parse a complex data type, the context of the previous columns has to be present, which is why the Table class is responsible for parsing those.
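As a rough illustration of this approach (hypothetical, simplified code, not the connector's actual classes): with describe_include_subcolumns = 1, DESCRIBE TABLE emits one extra row per subcolumn (e.g. `t.a` right after its parent `t`), so a parser can fold each dotted name into the preceding top-level column instead of recursively parsing the Tuple/Map type string itself.

```java
import java.util.*;

// Hypothetical, simplified sketch of subcolumn grouping. Each DESCRIBE row is
// a {name, type} pair; dotted names are subcolumns of an earlier top-level row.
public class SubcolumnGrouping {
    public static Map<String, List<String>> group(List<String[]> rows) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (String[] row : rows) {
            String name = row[0], type = row[1];
            int dot = name.indexOf('.');
            if (dot < 0) {
                table.put(name, new ArrayList<>());     // top-level column row
            } else {
                String parent = name.substring(0, dot); // subcolumn row
                table.get(parent).add(name + " : " + type);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // Rows as DESCRIBE TABLE would emit them for a Tuple column "t"
        List<String[]> rows = List.of(
            new String[]{"t",   "Tuple(a String, b UInt8)"},
            new String[]{"t.a", "String"},
            new String[]{"t.b", "UInt8"});
        System.out.println(group(rows)); // {t=[t.a : String, t.b : UInt8]}
    }
}
```
The real parser also has to handle deeper nesting (`t.a.b`) and Array wrappers, but the ordering guarantee from DESCRIBE TABLE is what removes the need for recursive type-string parsing.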

I have added TableTest, feel free to inspect it first to get an idea of how it works.

Regarding the Variant type: I needed it to support Avro's union type. In the Kafka Connect ecosystem, a union is represented as a Struct whose fields are named after the lowercased member type names.
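A minimal sketch of that union-as-struct convention (using a plain Map in place of Kafka Connect's Struct, purely for illustration): exactly one of the lowercase-named fields is set, and picking the non-null field yields both the member type name and the value to write into a ClickHouse Variant column.

```java
import java.util.*;

// Hypothetical sketch: an Avro union value arrives as a struct keyed by the
// lowercased member type names ("string", "int64", ...), with exactly one
// field non-null. pick() returns that field's (typeName, value) pair.
public class UnionStruct {
    public static Map.Entry<String, Object> pick(Map<String, Object> struct) {
        for (Map.Entry<String, Object> e : struct.entrySet()) {
            if (e.getValue() != null) return e;
        }
        throw new IllegalArgumentException("empty union");
    }

    public static void main(String[] args) {
        // A union ["string", "long"] holding the long value 42
        Map<String, Object> union = new LinkedHashMap<>();
        union.put("string", null);
        union.put("int64", 42L);
        System.out.println(pick(union)); // int64=42
    }
}
```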

Please note that I've added the Lombok library: the telescoping-constructors problem had gotten out of hand in the Column class, so I refactored it to use the builder pattern.
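For context, Lombok's @Builder generates roughly the following hand-written builder (shown here for a trimmed, hypothetical Column with just a few fields; the real class has many more). The builder replaces a telescoping set of constructors like Column(name), Column(name, type), Column(name, type, nullable), and so on.

```java
// Hand-rolled equivalent of what Lombok's @Builder would generate for a
// trimmed, hypothetical Column class (illustration only).
public final class Column {
    private final String name;
    private final String type;
    private final boolean nullable;

    private Column(Builder b) {
        this.name = b.name;
        this.type = b.type;
        this.nullable = b.nullable;
    }

    public static Builder builder() { return new Builder(); }

    public static final class Builder {
        private String name;
        private String type;
        private boolean nullable;

        public Builder name(String name) { this.name = name; return this; }
        public Builder type(String type) { this.type = type; return this; }
        public Builder nullable(boolean nullable) { this.nullable = nullable; return this; }
        public Column build() { return new Column(this); }
    }

    @Override public String toString() {
        return name + ":" + type + (nullable ? " NULL" : "");
    }

    public static void main(String[] args) {
        Column c = Column.builder().name("off16").type("Int16").build();
        System.out.println(c); // off16:Int16
    }
}
```
With Lombok, all of the builder boilerplate collapses into a single @Builder annotation on the class.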


@CLAassistant

CLAassistant commented Apr 8, 2024

CLA assistant check
All committers have signed the CLA.

@jirislav
Contributor Author

jirislav commented Apr 8, 2024

For anyone interested, I've released the zip here.

@jirislav jirislav force-pushed the finish-complex-types-support branch from c7912e5 to 637ca42 Compare April 8, 2024 10:58
@jirislav
Contributor Author

jirislav commented Apr 8, 2024

The following tests are failing due to #367:

  • ClickHouseSinkTaskWithSchemaTest.schemaWithDefaultsTest
  • ClickHouseSinkTaskWithSchemaTest.schemaWithDefaultsAndNullableTest
  • ClickHouseSinkTaskTest.testDBTopicSplit

@mzitnik
Collaborator

mzitnik commented Apr 8, 2024

@jirislav Thanks for the contribution. I have disabled those tests in my latest merge to main. Can you align your branch with main and check again whether the tests are still failing?

@jirislav jirislav force-pushed the finish-complex-types-support branch 4 times, most recently from dd5f75f to ee92764 Compare April 8, 2024 19:37
@jirislav
Contributor Author

jirislav commented Apr 8, 2024

@mzitnik Thank you! I have rebased my changes and adjusted the GitHub workflow definition to run cloud tests only when the secrets are defined.

All tests are passing except this one:

ClickHouseSinkTaskWithSchemaProxyTest > schemaWithDefaultsTest() FAILED
    org.opentest4j.AssertionFailedError: expected: <1000> but was: <500>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
        at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
        at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
        at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
        at app//com.clickhouse.kafka.connect.sink.ClickHouseSinkTaskWithSchemaProxyTest.schemaWithDefaultsTest(ClickHouseSinkTaskWithSchemaProxyTest.java:283)

I have no idea what's wrong here, I haven't touched anything related (I think).

When I inspect the actual table contents, the off16 column contains only even numbers (0, 2, 4 …). Can you think of anything that could be the issue here? It fails both on my local machine and using GitHub workflows.

@jirislav jirislav force-pushed the finish-complex-types-support branch from ba966a0 to f2ba783 Compare April 8, 2024 20:46
@jirislav
Contributor Author

jirislav commented Apr 8, 2024

I have no idea what's wrong here, I haven't touched anything related (I think).

Ok, turns out it was actually my fault. During the refactor, the hasDefaults setting of the Table instance was lost. Now it's fixed and it's working again.

I'm glad you have this test.

All tests are passing now.

@jirislav jirislav force-pushed the finish-complex-types-support branch from f2ba783 to 8adcb9c Compare April 8, 2024 22:03
@jirislav
Contributor Author

jirislav commented Apr 8, 2024

It turns out that adding support for the Nested type is really easy, so I just did it and released it to the public here (in case somebody doesn't want to wait for the merge).

@jirislav jirislav force-pushed the finish-complex-types-support branch from dea37ab to ed98ef5 Compare April 12, 2024 10:28
Contributor

@Paultagoras Paultagoras left a comment


Thank you for contributing this! We're hoping to refactor serialization in the near-ish future, but this definitely helps cover gaps!

@Paultagoras Paultagoras merged commit cfab3ed into ClickHouse:main May 12, 2024
@jirislav jirislav deleted the finish-complex-types-support branch May 12, 2024 14:14
@Paultagoras Paultagoras linked an issue May 14, 2024 that may be closed by this pull request
@dermasmid

I'm seeing this error:

INFO 2024-05-29T14:39:51.163593458Z java.lang.IllegalArgumentException: DESCRIBE TABLE is never supposed to return Nested type. It should always yield its Array fields directly.
    at com.clickhouse.kafka.connect.sink.db.mapping.Column.extractColumn(Column.java:204)
    at com.clickhouse.kafka.connect.sink.db.mapping.Column.extractColumn(Column.java:159)
    at com.clickhouse.kafka.connect.sink.db.mapping.Column.extractColumn(Column.java:155)
    at com.clickhouse.kafka.connect.sink.db.helper.ClickHouseHelperClient.describeTable(ClickHouseHelperClient.java:220)
    at com.clickhouse.kafka.connect.sink.db.helper.ClickHouseHelperClient.extractTablesMapping(ClickHouseHelperClient.java:248)
    at com.clickhouse.kafka.connect.sink.db.ClickHouseWriter.updateMapping(ClickHouseWriter.java:123)
    at com.clickhouse.kafka.connect.sink.db.ClickHouseWriter.start(ClickHouseWriter.java:103)
    at com.clickhouse.kafka.connect.sink.ProxySinkTask.<init>(ProxySinkTask.java:61)
    at com.clickhouse.kafka.connect.sink.ClickHouseSinkTask.start(ClickHouseSinkTask.java:57)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:329)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:202)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:259)
    at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:237)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)

The logs don't include the table that caused it, so I don't have a repro yet.

@jirislav
Contributor Author

@dermasmid This has to be specific to the table you're using. I ran the schemaWithNestedTupleMapArrayAndVariant test on the latest ClickHouse version 24.4 without any error.

Note that we don't have the context of the table being parsed within the Column class, so the only way to find out which table is causing this is to enable the DEBUG log level. You should see a preceding DESCRIBE TABLE statement pointing to the problematic table. Please run that DESCRIBE TABLE statement yourself and share its output here, along with the version of your ClickHouse installation.

@dermasmid

Here's the debug log:

Extracting column business_units with type Nested(bu_id String, name Nullable(String), currency Nullable(String), status String, timezone Nullable(String), tw_account_id Nullable(String), integration_ids Array(Nullable(String)), created_at Nullable(DateTime), updated_at Nullable(DateTime)) (com.clickhouse.kafka.connect.sink.db.mapping.Column) [task-thread-sonic-clickhouse-0]

ClickHouse version 24.4.

@dermasmid

Note: this column is not from the table I intend to write to, but because it's in the database I configured, the connector seems to be fetching it anyway. If I move my table of interest to a clean database, it works without issue.

@dermasmid

I opened issue #399.

Development

Successfully merging this pull request may close these issues.

Nested type support
5 participants