Replies: 2 comments
-
First of all, I really like having an articulated policy for this 👍. One major question for me would be: will we, in a kernelized world, still try to maintain a version that does not need datafusion? If so, aligning our versioning with datafusion might be problematic, as it becomes somewhat meaningless... Then again, the difference in features between what we can do without datafusion and what kernel can do with the default engine might not be significant. So I would be open to always requiring datafusion in 1.0. Just a thought dump though 🙂
-
Mostly as a consumer, it would not bother me either way. We pull in multiple arrow and datafusion crates, and update the triad of dependencies when deltalake releases. For simpler use cases, +1 to pushing for using the re-exported symbols. As for the minor version tracking the datafusion version: the trade-off is tying your minor-level releases to a df upgrade as well, so it's only really semver when datafusion upgrades. If folks are pushed to use the re-exported symbols, then it doesn't really matter to them. Maybe for the advanced users some version matrix is sufficient? (That is easier at-a-glance than parsing the Cargo.toml 😄)
-
I was thinking about how we might handle semantic versioning with a really-soon-now-I-promise `deltalake` 1.0 Rust crate release.

As you may know, we have fairly rapid major version dependencies changing underneath us from both the `datafusion` and `arrow` crates. Technically `object_store` can have, and has recently had, some "major" (0.x) API changes. I personally don't want to run like a hamster on a release treadmill where arrow releases a major version change, then datafusion releases a major version change, and then delta-rs needs a major version change.

The theory behind this treadmill is that since we expose `RecordBatch` and other datafusion/arrow symbols in our APIs, it would be semantically breaking for us not to rev major versions when those dependencies change.

I believe `deltalake` 1.0 should not have major version changes due to upgrades of arrow or datafusion.

I have written before about our policy on re-exporting some symbols which are quite helpful for downstream users of the crate(s). I believe this pattern of symbol re-export can help insulate our users from major semver churn due to underlying changes in arrow and datafusion. Meaning we should strongly encourage users to use our re-exported symbols, which will always be the "right" `RecordBatch`, for example.

The only potential hiccup is when a downstream consumer is pulling arrow in directly because they are negotiating the dependency graphs between `deltalake` and another crate which requires arrow. For those users, I think we can commit to clearly documenting the minor versions where we would see these changes, and encourage them to pin their dependencies to `1.0.x` or `1.1.x`.
If we wanted to go that extra mile, I might suggest that our minor versions track datafusion, and then we release under patch releases, i.e.:

- `1.47.0`: first release with df 47
- `1.47.1`: patch release with our changes
- `1.47.2`: etc.
- `1.48.0`: first release with df 48
- `2.49.0`: major breaking API changes in `deltalake` (and df 49)

(and so on.)
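Under a scheme like this, a downstream consumer who pulls arrow in directly could pin to a single df series with a tilde requirement (the version numbers here are illustrative, matching the hypothetical examples above):

```toml
[dependencies]
# Stay on the df-47 series: a "~1.47" requirement accepts 1.47.x patch
# releases but never an automatic jump to 1.48 (i.e. df 48).
deltalake = "~1.47"
```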
I'm curious what everybody thinks; I'm hoping not to 🏃♂️ too much after our upstream dependencies 😄