Release DataFusion 47.0.0
(April 2025)
#15072
Comments
@alamb, I'll also be in charge of this release. |
@XiangpengHao also offered to test with the parquet viewer prior to 47: #15102 (comment) |
That's great, added it to release steps |
I feel like this may be important enough to try to get into the release. Does anyone else have thoughts? |
Seems reasonable to me -- I have added it to the "good to get in " list |
Update: wrong ticket (I was looking for 46.0.0). |
The PR #15266 has significantly improved performance, so I added it to the blog section. |
@alamb I think we can start testing the 47.0.0 in the second week of April and begin the release process at the end of that week. What do you think? |
I think it sounds like a great idea -- thank you @xudong963. For your planning purposes, I will be away the week of April 21, so perhaps we can start testing a week earlier (the week of April 7, so we have time to complete / fix issues prior to April 14). |
Happy to test whenever! |
That sounds good! |
I would really appreciate it if you could add the following PR to the release as well: |
Sure, I added it. Given that there are two approvals, I think it'll be included in DF47. Thanks for your fix. |
Hey guys, happy new week, let's start testing the incoming DF47 this week! 🚀 |
Makes sense. Thanks @xudong963 |
We have started testing Comet with the latest DF from main. I added a link to the Comet PR in this PR's description. |
I have tested the Parquet viewer with the latest main and found no problems. But I hit a TPC-H panic when running on LiquidCache: XiangpengHao/liquid-cache#158. I'm working on digging into the root cause. Other breaking changes I observed:
|
Thank you @XiangpengHao , I added the breaking changes that you mentioned to the summary of the issue. |
I tested the latest DF against our tests - the Substrait consumer is broken when it comes to renaming Struct fields' insides, due to #15239 (comment). I'll try to get a fix up. Edit: fix here #15634 |
Running CI on it now: apache/datafusion-python#1104 |
I spoke too soon - I'm getting one error in our unit tests on |
I think @andygrove filed a ticket for this one. I didn't fully follow the discussion, but it seems like that issue has been closed. |
I've read the discussion now and I think I'm in agreement that it's not an actual regression since the aggregation has no deterministic outcome without ordering assigned. |
My only remaining question is whether we want to upgrade arrow in this release as well. The upgrade PR is here: Since it also upgrades object_store and pyo3, it is somewhat more disruptive. |
+1 for upgrading all the dependencies |
I am also +1 for upgrading the dependencies (for selfish reasons; we are waiting on an arrow feature to help with INT96 timestamps in Parquet) |
Thanks @jayzhan211 for the approval and for the discussion. I'll plan to merge #15466 tomorrow then unless we want to discuss it further. |
OK, I just merged #15466 / upgrade to dependencies (arrow/object_store/parquet) 47.0.0. I don't know of anything else we are now waiting on for this release. I suggest we make the release notes PR and generate a release candidate. BTW, I will be offline for about a week starting this Friday, April 18, or Saturday, so I likely won't be able to help with the release until I return. Hopefully another PMC member can do the final approval / release to crates.io if it isn't ready before I leave. |
|
Closes #1037

### Change list

- Bump `arrow` to `55` and `parquet` to `55`
- Temporarily deactivates the `datafusion` integration until datafusion publishes its version `47` (apache/datafusion#15072), so that we can progress with the `arrow` 55 upgrade now.
- Update JS and Python APIs for latest `parquet`.
- Means we no longer need an initial `HEAD` request for Parquet files before reading metadata.
Thanks for putting this together! If we could additionally get #14412 in, that would be awesome 🙏 |
@alamb It seems there are still one or two PRs that want to be included, so how about making the release notes tomorrow (UTC+8, after I get up)? |
Sorry @gabotechs -- I just merged that one!
Sounds like a good plan. I'll take a pass through the outstanding PRs again to see if there is anything else we can/should merge. Thank you all. |
I just merged the version + changelog PR from @xudong963. I also created a @xudong963, given that I think it is late in your timezone, I'll plan to make an RC in a few hours unless you let me know otherwise. |
I have made a release candidate and started a voting thread: https://lists.apache.org/thread/zrq9x9gf51r8b6m9qokf2q75kh251rm6 |
Note that I will be away starting April 18, and so likely cannot complete the vote / release process until April 26. @andygrove, would it be possible for you to complete the voting process / publish the release to crates.io for this release? |
Yes, I would be happy to do that. I will be offline on Saturday, but can take care of it on Sunday or Monday. |
Awesome -- we have a plan! I hope to work on the upgrade guide and maybe even a blog post about this release, but we'll see if I have the time |
Here is a draft upgrade guide: |
I filed the following ticket for the next release: |
Is your feature request related to a problem or challenge?
Tracking ticket for next release, also a place to track desired inclusions
Previous release will be https://crates.io/crates/datafusion/45.0.0 (likely Feb 1, 2025) December 31, 2024 so next major release would be around March 1, 2025
Steps:
48.0.0 (June 2025) #15771

Prior release tickets:

- 45.0.0: Release DataFusion 45.0.0 #14008
- 46.0.0: Release DataFusion 46.0.0 #14123

Changes to add to upgrade guide
These PRs made changes that deserve a mention in the upgrade guide

- `Int64` vs `UInt64`, etc) #15341
- `FileGroup` structure for `Vec<PartitionedFile>` #15379
- `downcast_to_source` method for `DataSourceExec` #15416
- `version <= 40` #15027

Features to mention in the blog (if they make it)

- `SQL EXPLAIN` Tree Rendering #14914
- `VARCHAR` from `Utf8` to `Utf8View` #15096
- `JoinSetTracer` trait for tracing context propagation in spawned tasks #14547
- `first_value` by implementing special `GroupsAccumulator` #15266

Bugs that would be good to fix

- `panic` when evaluating trivial WHERE with a CTE #15386
- `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed #15539
- `last_value` functionality #15676

Community Wishlist

- `Date32` to string given timestamp specifiers #15361
- `date` to `timestamp` with tz #14638
- `statistics_by_partition` API to `ExecutionPlan` #15495

@alamb's wishlist

- `SQL EXPLAIN` Tree Rendering #14914
- `tree` explain by default #15343