Spark rewrite_data_files failing with java.lang.IllegalStateException: Connection pool shut down #12046
Comments
I tried to trace where the connection pool is being closed. Aside from calls stemming from finalizers on thread shutdown (which seem perfectly legitimate), I see:
Where I would pick out the relevant line:
Line 69 in 7781360
My suspicion is that this IO object (created/obtained e.g. here, I believe:
Since we are using the Glue catalog, I believe this IO object will likely come all the way from GlueTableOperations. I am not completely familiar with the internals of Spark here, but it looks to me like this is basically trying to free up memory because it is possibly running up against some limits. As such, I could imagine this would only happen in very particular cases. For us, this could also explain why we saw this sometimes with Glue 4.0 and now more often with Glue 5.0, because the behavior with respect to memory could have changed between versions.
Ok, I can confirm that commenting out the code at line 69 in 7781360 allows the job to run to completion.
Just for documentation, something similar seems to have been discussed here when SerializableTableWithSize was made closeable.
After doing some further investigation, my initial conclusion is the following:
I am not sure what a good solution is here, but I suspect that the FileIO may need to be copied when creating the serializable table instead of what is done now:
Would love to get some input here!
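To make the suspected lifecycle problem above more concrete, here is a minimal, self-contained sketch. It does not use any Iceberg or AWS classes: `Pool`, `Io`, and `TableCopy` are hypothetical stand-ins for a connection pool, a FileIO, and a closeable serializable table, and the error message is only mimicked. It assumes that the copy and the original hold the same IO instance, which is my suspicion and not something I have confirmed.

```java
public class SharedIoSketch {

  /** Hypothetical stand-in for an HTTP connection pool. */
  static final class Pool {
    private boolean open = true;

    String fetch(String path) {
      if (!open) {
        // Mimics the message seen in the failing job.
        throw new IllegalStateException("Connection pool shut down");
      }
      return "bytes of " + path;
    }

    void shutdown() {
      open = false;
    }
  }

  /** Hypothetical stand-in for a FileIO that owns exactly one pool. */
  static final class Io implements AutoCloseable {
    private final Pool pool = new Pool();

    String read(String path) {
      return pool.fetch(path);
    }

    /** Returns an independent IO with its own pool. */
    Io copy() {
      return new Io();
    }

    @Override
    public void close() {
      pool.shutdown();
    }
  }

  /** Hypothetical stand-in for a closeable serializable table. */
  static final class TableCopy implements AutoCloseable {
    private final Io io;

    TableCopy(Io io) {
      this.io = io;
    }

    @Override
    public void close() {
      // Closing the table copy also closes whatever IO it holds.
      io.close();
    }
  }

  /** Returns true if the original IO still works after the table copy is closed. */
  static boolean originalSurvivesClose(boolean copyIo) {
    Io original = new Io();
    TableCopy copy = new TableCopy(copyIo ? original.copy() : original);
    copy.close();
    try {
      original.read("data/file.parquet");
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("shared IO survives close: " + originalSurvivesClose(false));
    System.out.println("copied IO survives close: " + originalSurvivesClose(true));
  }
}
```

In this sketch, closing the table copy breaks the original only when both hold the same IO instance. Giving the copy its own IO avoids that, at the cost of a second pool per copy, which may or may not be acceptable in the real code.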
Apache Iceberg version
1.7.1 (latest release)
Query engine
Spark
Please describe the bug 🐞
We are running a maintenance job to rewrite data files (in parallel) on AWS Glue, calling the rewrite_data_files procedure like the following:
We are getting errors like the following:
A few points:
1.7.x branch with the recent changes, but the error still remained.
suggesting to me the lifecycle of this pool connection is simply not working correctly.
I am happy to try and provide some additional information here and help for a fix, but I'd need some guidance how to do this.
Willingness to contribute