Open
Description
Search before asking
- I searched the issues and found no similar issues.
Version
spark-doris-connector: 25.0.1
doris: 3.0.0
spark: 3.0.1
What's Wrong?
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// feNodes, user, and password are defined elsewhere in the job (not shown here)
val dorisTableIdentifier = "doris_db.doris_table"
val hiveTableName = "hive_db.hive_table"
val timeColumn = "ctime"
val selectedColumnsStr = args(5).trim
// assumed: selectedColumnsStr is a comma-separated column list
val selectedColumns = selectedColumnsStr.split(",").map(_.trim)
val startTime = "2025-05-06 00:00:00"
val endTime = "2025-05-07 00:00:00"
val appName = s"doris-to-hive-$hiveTableName"
val spark = SparkSession.builder()
  .appName(appName)
  .enableHiveSupport()
  .getOrCreate()

// 1. read data from doris
val dorisDF = spark.read
  .format("doris")
  .option("doris.fenodes", feNodes)
  .option("doris.table.identifier", dorisTableIdentifier)
  .option("user", user)
  .option("password", password)
  .load()
  .filter(col(timeColumn) >= lit(startTime) && col(timeColumn) < lit(endTime)) // limit timespan
  .select(selectedColumns.map(col): _*) // select columns

// each count() below is a separate action on the same DataFrame
log.info("doris data count: {}", dorisDF.count())
Thread.sleep(1000)
log.info("doris data count: {}", dorisDF.count())
Thread.sleep(5000)
log.info("doris data count: {}", dorisDF.count())
dorisDF.createOrReplaceTempView("doris_data_detail")

// 2. write to hive
val insertSql =
  s"""
     |INSERT OVERWRITE TABLE $hiveTableName PARTITION (pt='20250410000000')
     |SELECT
     |$selectedColumnsStr
     |FROM doris_data_detail
     |""".stripMargin
log.info("insert hive sql: {}", insertSql)
spark.sql(insertSql)
spark.stop()
I use this code to copy data from Doris to Hive. The Hive table ended up with fewer rows than the Doris table, so I added logging to record the DataFrame's row count. The log is as follows:
25/05/07 19:43:56 INFO Doris2HiveTask$: doris data count: 68684
25/05/07 19:43:59 INFO Doris2HiveTask$: doris data count: 97918
25/05/07 19:44:05 INFO Doris2HiveTask$: doris data count: 99903
the amount in doris:
I am certain that the row count of the Doris table did not change during this period.
Why did this happen? Is this a bug?
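For anyone trying to narrow this down: the DataFrame is not cached, so Spark re-evaluates it for every action and each count() (and the later INSERT) issues its own scan against Doris. The following is a minimal diagnostic sketch, not a fix for the connector. The object and method names (CountDiagnostics, countsAgree, rowsPerPartition, snapshot) are hypothetical helpers; only the standard Spark calls persist, groupBy, and spark_partition_id are assumed.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.spark_partition_id
import org.apache.spark.storage.StorageLevel

// Hypothetical helpers for diagnosing unstable counts; not part of the connector.
object CountDiagnostics {

  /** True when every observed count is identical (or the sequence is empty). */
  def countsAgree(counts: Seq[Long]): Boolean = counts.distinct.size <= 1

  /** Rows per Spark partition for one evaluation of `df`.
    * Each call re-runs the underlying scan, so comparing two results
    * shows which partitions return a different number of rows. */
  def rowsPerPartition(df: DataFrame): Map[Int, Long] =
    df.groupBy(spark_partition_id().as("pid"))
      .count()
      .collect()
      .map(r => r.getInt(0) -> r.getLong(1))
      .toMap

  /** Materialize `df` once so later actions reuse the same rows
    * instead of reading from the source again. */
  def snapshot(df: DataFrame): DataFrame = {
    val cached = df.persist(StorageLevel.MEMORY_AND_DISK)
    cached.count() // an action is needed to actually populate the cache
    cached
  }
}
```

Calling CountDiagnostics.rowsPerPartition(dorisDF) twice and comparing the maps should show whether the difference comes from a few partitions or from all of them. Registering CountDiagnostics.snapshot(dorisDF) as the temp view makes the logged count and the Hive write see the same rows, but if the first scan is itself incomplete the snapshot will be incomplete too, so this only helps with consistency, not with the missing rows.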
What You Expected?
An explanation of why repeated count() calls on the same DataFrame return different values.
How to Reproduce?
No response
Anything Else?
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct