[GLUTEN-9801] Clean up Write Files commit Protocol logic #9844
Conversation
@ulysses-you Can you help to review this PR? Thanks.
Force-pushed 40f2dd8 to a538513
@RushabhK, could you try again?
@JkSelf Just curious, why can't we let Spark handle this for Gluten?
```diff
@@ -119,6 +122,8 @@ class VeloxColumnarWriteFilesRDD(
         val targetFileName = fileWriteInfo.targetFileName
         val outputPath = description.path

+        fileNames += targetFileName
```
It seems `writePath` does not contain the partitionFragment? We should collect partitionFragment + "/" + targetFileName.
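For reference, a minimal sketch of the suggested collection (illustrative only, not the committed code; the parameter names are placeholders standing in for values from the native write info):

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of the reviewer's suggestion: track the partition-relative
// path rather than the bare file name, so abort-time cleanup can locate files
// under partition directories.
def collectWrittenFile(fileNames: ArrayBuffer[String],
                       partitionFragment: String,
                       targetFileName: String): Unit = {
  fileNames += (partitionFragment + "/" + targetFileName)
}
```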
@ulysses-you Yes. Updated.
Force-pushed a538513 to 5a8d954
Resolved review thread (outdated): backends-velox/src/main/scala/org/apache/spark/sql/execution/VeloxColumnarWriteFilesExec.scala
@JkSelf I tested this change on my setup. It's still giving the same exception: `... is not a Parquet file. Expected magic number at tail, but found [2, 0, 0, 0]`. The file is ~250 MB. This is the complete stack trace:
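For context on the exception: a valid Parquet file ends with the 4-byte ASCII magic `PAR1`, and the error means the file tail holds other bytes, i.e. the file was truncated or only partially written. A small illustrative check (not part of this PR):

```scala
import java.io.RandomAccessFile

// Illustrative only: report whether a file ends with the Parquet magic "PAR1".
def hasParquetTailMagic(path: String): Boolean = {
  val raf = new RandomAccessFile(path, "r")
  try {
    if (raf.length() < 4) return false
    val tail = new Array[Byte](4)
    raf.seek(raf.length() - 4)
    raf.readFully(tail)
    tail.sameElements("PAR1".getBytes("US-ASCII"))
  } finally raf.close()
}
```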
@RushabhK OK. Can you help provide the reproduction code? Thanks.
@JkSelf I can elaborate on how I am testing this in the following steps:
@RushabhK I followed the steps you provided using the main branch code, and the test code is as follows:
Here is the Spark history UI for reference. However, I was unable to reproduce the issue you mentioned. Am I missing something?
@JkSelf I am running this on Ubuntu; what OS are you running it on? Can you share the setup for your Spark image?
Also, @JkSelf, I am writing the Parquet output in overwrite mode. I am using the following code for the write:
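The exact snippet is not preserved in the thread; a minimal sketch of a Parquet write in overwrite mode, with a placeholder DataFrame and output path:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for the reporter's write; the data and output path
// are placeholders, not the actual repro.
val spark = SparkSession.builder().appName("repro-sketch").getOrCreate()
val df = spark.range(0, 1000000).toDF("id")
df.write.mode("overwrite").parquet("/tmp/gluten_repro_output")
```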
@JkSelf I added some logs for better visibility into which files abortTask is deleting.
The log in the catch block suggests 0 files: https://github.com/RushabhK/incubator-gluten/blob/v1.3.0-fixes/backends-velox/src/main/scala/org/apache/spark/sql/execution/VeloxColumnarWriteFilesExec.scala#L244
I also added logs to check the fileNames state at every point; these suggest the fileNames size is 1 in all the logs: https://github.com/RushabhK/incubator-gluten/blob/v1.3.0-fixes/backends-velox/src/main/scala/org/apache/spark/sql/execution/VeloxColumnarWriteFilesExec.scala#L139
The problem is that 0 files are being collected when abortTask is called, which is why no files are deleted on abort. This is the issue that needs to be addressed.
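To make the failure mode concrete, here is an illustrative sketch of the abort-time cleanup being discussed (simplified; not the actual Gluten code, which lives in VeloxColumnarWriteFilesExec.scala). If the collected list is empty when abort runs, the loop is a no-op and the partial file is left behind:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Illustrative abort-time cleanup: delete every file the task recorded.
// An empty fileNames list makes this a no-op, leaving orphaned partial files.
def abortTaskCleanup(outputPath: String, fileNames: Seq[String]): Unit = {
  val base = new Path(outputPath)
  val fs = base.getFileSystem(new Configuration())
  fileNames.foreach { name =>
    fs.delete(new Path(base, name), false) // best-effort; ignores failures
  }
}
```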
@RushabhK My OS is Ubuntu 20.04.6 and my Spark version is 3.5.2.
Are you saying that
Yes, |
Force-pushed 5a8d954 to fb9ec0c
Force-pushed fb9ec0c to e14720b
Force-pushed e14720b to 663107b
@RushabhK Could you try the latest commit again? Thanks.
Force-pushed 8694073 to 7ce2fe2
…ark's FileCommitProtocol
What changes were proposed in this pull request?
Velox supports third-party integration for passing the fileName, eliminating the need to define SparkWriteFilesCommitProtocol in Gluten to manage the write-task commit process. Instead, Spark's FileCommitProtocol can be used directly.
With this PR, the files written by a failed task can be deleted in abortTask().
Fix #9801 (comment)
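For orientation, a rough sketch of the FileCommitProtocol lifecycle this PR leans on (the trait and its methods are Spark's; the driving sequence below is schematic, not Gluten's actual code):

```scala
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.internal.io.FileCommitProtocol

// Schematic per-task flow: a temp file is requested from the committer, the
// native writer fills it, and commit/abort finalizes or cleans up the output.
def writeTask(committer: FileCommitProtocol, ctx: TaskAttemptContext): Unit = {
  committer.setupTask(ctx)
  try {
    val tmpFile = committer.newTaskTempFile(ctx, None, ".parquet")
    // ... hand tmpFile to the native Velox writer ...
    committer.commitTask(ctx) // publish the task's output
  } catch {
    case t: Throwable =>
      committer.abortTask(ctx) // delete files written by the failed task
      throw t
  }
}
```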
How was this patch tested?
Existing unit tests