Open
Description
What went wrong?
I ran a SQL aggregate query with sampling several times and got a different result on each run.
How to reproduce?
Follow the tutorial from the documentation:
1. Code that triggered the bug, or steps to reproduce:
val ecommerce = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("ecommerce100K_2019_Oct.csv")
val qbeastTablePath = "table/qbeast-test-data/qtable"
(ecommerce.write.mode("overwrite").format("qbeast").option("columnsToIndex", "user_id,product_id").option("cubeSize", "500").save(qbeastTablePath))
val qbeastDf = (spark.read.format("qbeast").load(qbeastTablePath))
qbeastDf.sample(0.1).agg(avg("price")).show()
ecommerce.createOrReplaceTempView("ecommerce_october")
spark.sql("SELECT avg(price) FROM ecommerce_october TABLESAMPLE(10 PERCENT)").show()
2. Branch and commit id:
--packages io.qbeast:qbeast-spark_2.12:0.6.0
3. Spark version:
res0: String = 3.5.0
4. Hadoop version:
res1: String = 3.3.4
5. How are you running Spark?
Local computer using nix shell
6. Stack trace:
No error output
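One way to narrow this down is to repeat the two sampled queries from step 1 with a fixed random seed. This is a minimal sketch, not a confirmed fix. It assumes the same spark-shell session as step 1, and the seed value `42` is an arbitrary choice for illustration. `Dataset.sample(fraction, seed)` and the SQL `REPEATABLE(seed)` clause are standard Spark features. This report does not establish whether the Qbeast read path returns stable results with them.

```scala
// Continues the spark-shell session from step 1: qbeastDf and the
// ecommerce_october temp view are assumed to be defined already.
import org.apache.spark.sql.functions.avg

// Hypothetical fixed seed, chosen only for illustration.
val seed = 42L

// DataFrame API: pass an explicit seed instead of relying on a random one.
qbeastDf.sample(0.1, seed).agg(avg("price")).show()

// SQL: REPEATABLE(seed) is the TABLESAMPLE counterpart of the seed argument.
spark.sql(
  s"SELECT avg(price) FROM ecommerce_october TABLESAMPLE(10 PERCENT) REPEATABLE($seed)"
).show()
```

If the seeded runs agree with each other while the unseeded ones do not, the variation most likely comes from the default random seed rather than from the index.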