Spark with Sqoop and Kite - Mismatch in Command? #490

@dovy

Description

@dovy

Trying to dig into this one. When Sqoop is used without Kite (i.e., no Parquet output) there are no issues. The moment the job tries to export to Parquet, everything blows up. It seems like Kite may be the offender, but if you have somewhere else to point me I will gladly work upstream.
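For context, a Sqoop invocation along these lines would exercise the Kite Parquet path; the connection string, table, and paths below are placeholders only, not the actual command from this job (which I do not have the full details of):

```shell
# Hypothetical Sqoop import illustrating the failing code path.
# All connection/table/path values are placeholders.
# --as-parquetfile is what routes the job through Kite's
# DatasetKeyOutputFormat, where the IndexedRecord check blows up.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser \
  --password-file /user/hadoop/.pw \
  --table my_table \
  --target-dir /data/my_table \
  --as-parquetfile
```

Without `--as-parquetfile` (plain text output) the same job completes, which is why Kite looks like the offender.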

System:

  • Debian 9
  • Hadoop 2.9
  • Spark 2.3

Installed Dependencies (JARs):

  • sqoop-1.4.7-hadoop260
  • kite-data-mapreduce-1.1.0
  • kite-hadoop-compatibility-1.1.0
  • kite-data-crunch-1.1.0
  • kite-data-core-1.1.0
  • avro-tools-1.8.2
  • mysql-connector-java-5.1.42
  • parquet-tools-1.8.3

Error:

19/07/09 17:55:28 INFO mapreduce.Job: Job job_1562682312457_0020 failed with state FAILED due to: Job setup failed : java.lang.IllegalArgumentException: Parquet only supports generic and specific data models, type parameter must implement IndexedRecord
	at org.kitesdk.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:96)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset.<init>(FileSystemDataset.java:128)
	at org.kitesdk.data.spi.filesystem.FileSystemDataset$Builder.build(FileSystemDataset.java:687)
	at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:199)
	at org.kitesdk.data.Datasets.load(Datasets.java:108)
	at org.kitesdk.data.Datasets.load(Datasets.java:165)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.load(DatasetKeyOutputFormat.java:542)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.loadOrCreateJobDataset(DatasetKeyOutputFormat.java:569)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat.access$300(DatasetKeyOutputFormat.java:67)
	at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.setupJob(DatasetKeyOutputFormat.java:369)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


19/07/09 17:55:28 INFO mapreduce.Job: Counters: 2

Again, it only fails on the final conversion to Parquet. I am not sure of the full command details since it is invoked inside a parallel process. Any direction would be appreciated.
