Release DataFusion 47.0.0 (April 2025) #15072

Open
30 of 39 tasks
Tracked by #3016
alamb opened this issue Mar 7, 2025 · 58 comments
@alamb
Contributor

alamb commented Mar 7, 2025

Is your feature request related to a problem or challenge?

Tracking ticket for next release, also a place to track desired inclusions

Previous release will be https://crates.io/crates/datafusion/45.0.0 (likely Feb 1, 2025) December 31, 2024 so next major release would be around March 1, 2025

Steps:

Prior release tickets:

Changes to add to upgrade guide

These PRs made changes that deserve a mention in the upgrade guide

Features to mention in the blog (if they make it)

Bugs that would be good to fix

Community Wishlist

@alamb 's wishlist

alamb added the enhancement label Mar 7, 2025

@xudong963
Member

@alamb, I'll also be in charge of this release.

@alamb
Contributor Author

alamb commented Mar 11, 2025

@XiangpengHao also offered to test with the parquet viewer prior to 47: #15102 (comment)

@xudong963
Member

> @XiangpengHao also offered to test with the parquet viewer prior to 47: #15102 (comment)

That's great, I added it to the release steps.

@shehabgamin
Contributor

shehabgamin commented Mar 19, 2025

I feel like this may be important enough to try to get into the release. Does anyone else have thoughts?

@alamb
Contributor Author

alamb commented Mar 19, 2025

> I feel like this may be important enough to try to get into the release. Does anyone else have thoughts?

Seems reasonable to me -- I have added it to the "good to get in" list

@alamb
Contributor Author

alamb commented Mar 19, 2025

I believe the voting deadline has passed and I will now promote / publish it.

Update: wrong ticket (I was looking for 46.0.0)

@xudong963
Member

The PR #15266 has significantly improved performance, so I added it to the blog section.

@xudong963
Member

@alamb I think we can start testing 47.0.0 in the second week of April and begin the release process at the end of that week. What do you think?

@alamb
Contributor Author

alamb commented Mar 28, 2025

> @alamb I think we can start testing 47.0.0 in the second week of April and begin the release process at the end of that week. What do you think?

I think it sounds like a great idea -- thank you @xudong963

For your planning purposes I will be away the week of April 21 -- so perhaps we can start testing a week earlier (week of April 7 so we have time to complete / fix issues prior to April 14)

@shehabgamin
Contributor

Happy to test whenever!

@xudong963
Member

> For your planning purposes I will be away the week of April 21 -- so perhaps we can start testing a week earlier (week of April 7 so we have time to complete / fix issues prior to April 14)

That sounds good!

@rluvaton
Contributor

rluvaton commented Apr 2, 2025

I would really appreciate it if you could add the following PR to the release as well:

@xudong963
Member

> I would really appreciate it if you could add the following PR to the release as well:

Sure, I added it. Given that there are two approvals, I think it'll be included in DF47. Thanks for your fix!

@xudong963
Member

Hey guys, happy new week, let's start testing the incoming DF47 this week! 🚀

@alamb
Contributor Author

alamb commented Apr 7, 2025

Makes sense. Thanks @xudong963

@andygrove
Member

andygrove commented Apr 7, 2025

We have started testing Comet with the latest DF from main. I added a link to the Comet PR in this PR's description.

apache/datafusion-comet#1563

@XiangpengHao
Contributor

I have tested the Parquet viewer with the latest main and found no problems.

But I hit a TPC-H panic when running on LiquidCache: XiangpengHao/liquid-cache#158

I'm digging into the root cause.

Other breaking changes I observed:

  1. Need to handle `DisplayFormatType::TreeRender` for execution plans.
  2. `map_partial_batch` is removed from the schema mapper.
  3. `page_pruning_predicate` is removed from the public API.
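
For readers hitting the first item, here is a minimal, self-contained sketch of what "handling" a new display variant can look like. It does not use the `datafusion` crate: the `DisplayFormatType` enum below is a local stand-in, and `MyScanExec`, `Render`, and `render` are hypothetical names invented for illustration. The real trait and enum may differ in detail, so treat this as a shape to adapt rather than the actual API.

```rust
use std::fmt;

// Local stand-in for DataFusion's `DisplayFormatType` (assumption: the real
// enum lives in the `datafusion` crate and may carry more detail).
#[derive(Clone, Copy, Debug)]
enum DisplayFormatType {
    Default,
    Verbose,
    TreeRender,
}

// Hypothetical execution plan node, used only for this illustration.
struct MyScanExec {
    table: String,
    projection: Vec<String>,
}

impl MyScanExec {
    // One match arm per variant and no catch-all `_` arm, so a variant added
    // upstream shows up as a compile error instead of being silently ignored.
    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match t {
            DisplayFormatType::Default | DisplayFormatType::Verbose => write!(
                f,
                "MyScanExec: table={}, projection=[{}]",
                self.table,
                self.projection.join(", ")
            ),
            // Tree-style output: one short `key=value` pair per line.
            DisplayFormatType::TreeRender => {
                writeln!(f, "table={}", self.table)?;
                write!(f, "projection={}", self.projection.join(", "))
            }
        }
    }
}

// Small adapter so the sketch can be rendered with `to_string()`.
struct Render<'a>(&'a MyScanExec, DisplayFormatType);

impl fmt::Display for Render<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        self.0.fmt_as(self.1, f)
    }
}

fn render(plan: &MyScanExec, t: DisplayFormatType) -> String {
    Render(plan, t).to_string()
}

fn main() {
    let plan = MyScanExec {
        table: "lineitem".to_string(),
        projection: vec!["l_orderkey".to_string(), "l_quantity".to_string()],
    };
    for t in [
        DisplayFormatType::Default,
        DisplayFormatType::Verbose,
        DisplayFormatType::TreeRender,
    ] {
        println!("{:?}:\n{}\n", t, render(&plan, t));
    }
}
```

The exhaustive `match` is the design choice worth keeping: it is why this kind of change surfaces as a build break during upgrade testing rather than as wrong output at runtime.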

@xudong963
Member

Thank you @XiangpengHao , I added the breaking changes that you mentioned to the summary of the issue.

@Blizzara
Contributor

Blizzara commented Apr 8, 2025

I tested the latest DF against our tests - the Substrait consumer is broken when it comes to renaming Struct fields' insides, due to #15239 (comment). I'll try to get a fix up.

Edit: fix here #15634

@timsaucer
Contributor

Running CI on it now: apache/datafusion-python#1104

@timsaucer
Contributor

I spoke too soon - I'm getting one error in our unit tests on last_value. I'm trying to investigate this morning.

@timsaucer
Contributor

I did a little investigating, but I don't have time for a couple of days to dive in deeper. This appears to be related to #15542. @UBarney, do you know why the SQL unit tests needed to be changed to pass? It seems like we have a regression in datafusion-python related to this change.

@alamb
Contributor Author

alamb commented Apr 12, 2025

> I did a little investigating, but I don't have time for a couple of days to dive in deeper. This appears to be related to #15542. @UBarney, do you know why the SQL unit tests needed to be changed to pass? It seems like we have a regression in datafusion-python related to this change.

I think @andygrove filed a ticket for this one

I didn't fully follow the discussion -- but it seems like that issue has been closed

@timsaucer
Contributor

> I did a little investigating, but I don't have time for a couple of days to dive in deeper. This appears to be related to #15542. @UBarney, do you know why the SQL unit tests needed to be changed to pass? It seems like we have a regression in datafusion-python related to this change.
>
> I think @andygrove filed a ticket for this one
>
> * [Regression in `last_value` functionality #15676](https://github.com/apache/datafusion/issues/15676)
>
> I didn't fully follow the discussion -- but it seems like that issue has been closed

I've read the discussion now and I think I'm in agreement that it's not an actual regression since the aggregation has no deterministic outcome without ordering assigned.
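
To make the point about ordering concrete, here is a small standalone sketch (plain Rust, not DataFusion code; the function names and the `(group, ts, value)` row layout are made up for illustration). It shows that a "last value per group" aggregate gives different answers for the same rows depending on arrival order, and becomes stable once an ordering is imposed. It assumes `ts` values are unique within a group.

```rust
use std::collections::BTreeMap;

// Hypothetical row layout for this sketch: (group key, timestamp, value).
type Row<'a> = (&'a str, i64, i64);

// "Last value" with no ordering: whichever row for a group arrives last wins.
fn last_value_unordered(rows: &[Row<'_>]) -> BTreeMap<String, i64> {
    let mut out = BTreeMap::new();
    for (group, _ts, value) in rows {
        out.insert(group.to_string(), *value);
    }
    out
}

// "Last value" with an explicit ordering: sort by timestamp first, so the
// result no longer depends on the order in which rows happened to arrive.
fn last_value_ordered(rows: &[Row<'_>]) -> BTreeMap<String, i64> {
    let mut sorted = rows.to_vec();
    sorted.sort_by_key(|(_, ts, _)| *ts);
    last_value_unordered(&sorted)
}

fn main() {
    // The same two rows, delivered in two different orders.
    let arrival_a = [("a", 1, 10), ("a", 2, 20)];
    let arrival_b = [("a", 2, 20), ("a", 1, 10)];

    println!("unordered, arrival A: {:?}", last_value_unordered(&arrival_a));
    println!("unordered, arrival B: {:?}", last_value_unordered(&arrival_b));
    println!("ordered,   arrival A: {:?}", last_value_ordered(&arrival_a));
    println!("ordered,   arrival B: {:?}", last_value_ordered(&arrival_b));
}
```

Under this reading, a test that asserts a specific result from the unordered form is relying on incidental row order, which an internal change is free to alter.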

@alamb
Contributor Author

alamb commented Apr 12, 2025

Do we want to hold DF 47 release for the arrow upgrade too?
I think it is possible (arrow will hopefully be released at the end of this week -- and we could make the DF release candidate next week...)

This would be great for the Comet project.

My only remaining question is if we want to upgrade arrow in this release as well

The upgrade PR is here:

Since it also upgrades object_store and pyo3 it is somewhat more disruptive.

@jayzhan211
Contributor

> My only remaining question is if we want to upgrade arrow in this release as well

+1 for upgrading all the dependencies

@andygrove
Member

I am also +1 for upgrading the dependencies (for selfish reasons; we are waiting on an arrow feature to help with INT96 timestamps in Parquet)

@alamb
Contributor Author

alamb commented Apr 13, 2025

Thanks @jayzhan211 for the approval and for the discussion. I'll plan to merge #15466 tomorrow then unless we want to discuss it further.

@alamb
Contributor Author

alamb commented Apr 14, 2025

Ok, I just merged #15466 / upgrade to dependencies (arrow/object_store/parquet) 47.0.0

I don't know of anything else we are now waiting on for this release. I suggest we make the release notes PR and generate a release candidate

BTW I will be offline for about a week starting this Friday April 18 or Saturday so I likely won't be able to help with the release until I return. Hopefully another PMC member can do the final approval / release to crates.io if it isn't ready before I leave.

@alamb
Contributor Author

alamb commented Apr 14, 2025

kylebarron added a commit to geoarrow/geoarrow-rs that referenced this issue Apr 14, 2025

Closes #1037

### Change list

- Bump `arrow` to `55` and `parquet` to `55`
- Temporarily deactivates the `datafusion` integration until datafusion publishes its version `47` (apache/datafusion#15072), so that we can progress with the `arrow` 55 upgrade now.
- Update JS and Python APIs for latest `parquet`.
- Means we no longer need an initial `HEAD` request for Parquet files before reading metadata.

@andygrove
Member

@alamb I am hoping that we can merge #15537 for this release. It was just rebased now that the arrow-rs upgrade is merged.

@gabotechs
Contributor

Thanks for putting this together! If we could additionally get #14412 in, that would be awesome 🙏

@xudong963
Member

xudong963 commented Apr 15, 2025

> I don't know of anything else we are now waiting on for this release. I suggest we make the release notes PR and generate a release candidate

@alamb It seems there are still one or two PRs that people want included, so how about making the release notes tomorrow (UTC+8, after I get up)?

@alamb
Contributor Author

alamb commented Apr 15, 2025

> @alamb I am hoping that we can merge #15537 for this release. It was just rebased now that the arrow-rs upgrade is merged.

> Thanks for putting this together! If we could additionally get #14412 in, that would be awesome 🙏

Sorry @gabotechs -- I just merged that one!

> @alamb It seems there are still one or two PRs that people want included, so how about making the release notes tomorrow (UTC+8, after I get up)?

Sounds like a good plan. I'll take a pass through the outstanding PRs again to see if there is anything else we can/should merge

Thank you all

@alamb
Contributor Author

alamb commented Apr 16, 2025

I just merged the version + changelog PR from @xudong963

I also created a branch-47 here for the release:

@xudong963, given I think it is late in your timezone, I'll plan to make an RC in a few hours unless you let me know otherwise.

@alamb
Contributor Author

alamb commented Apr 16, 2025

I have made a release candidate and started a voting thread: https://lists.apache.org/thread/zrq9x9gf51r8b6m9qokf2q75kh251rm6

@alamb
Contributor Author

alamb commented Apr 16, 2025

Note that I will be away starting April 18, and so likely cannot complete the vote / release process until April 26. @andygrove would it be possible for you to complete the voting process / publish the release to crates.io for this release?

@andygrove
Member

> Note that I will be away starting April 18, and so likely cannot complete the vote / release process until April 26. @andygrove would it be possible for you to complete the voting process / publish the release to crates.io for this release?

Yes, I would be happy to do that. I will be offline on Saturday, but can take care of it on Sunday or Monday.

@alamb
Contributor Author

alamb commented Apr 16, 2025

Awesome -- we have a plan! I hope to work on the upgrade guide and maybe even a blog post about this release, but we'll see if I have the time

@alamb
Contributor Author

alamb commented Apr 17, 2025

Here is a draft upgrade guide:

@alamb
Contributor Author

alamb commented Apr 19, 2025

I filed the following ticket for the next release:
