Releases: Eventual-Inc/Daft
v0.5.5
What's Changed 🚀
💥 Breaking Changes
- chore!: upgrade AWS sdk and remove vendored OpenSSL @kevinzwang (#4508)
✨ Features
- feat: add new "dashboard" metrics subscriber @universalmind303 (#4527)
- feat: add some code style check cmd @Jay-ju (#4505)
- feat: HTTPConfig settings for timeouts and retries @srilman (#4537)
- feat(flotilla): Add cpu profiling / tracing @colin-ho (#4542)
- feat: Url parse function @jordandelbar (#4533)
🐛 Bug Fixes
- fix: Resolve mismatch with Thrift compact protocol @desmondcheongzx (#4545)
- fix: Enable chrome trace to flush on native executor cancel @colin-ho (#4534)
📖 Documentation
- docs: Fix docstring for DataFrame.sort() @desmondcheongzx (#4551)
🔧 Maintenance
- chore: add @publicapi decorators to all public expr methods @universalmind303 (#4473)
- chore!: upgrade AWS sdk and remove vendored OpenSSL @kevinzwang (#4508)
- chore(flotilla): PipelineNode `.start` accept `Arc<Self>` to simplify cloning @colin-ho (#4543)
- chore: change unity volume prefix to "vol+dbfs:/" @kevinzwang (#4519)
- chore: Vendor parquet-format-safe dependency @desmondcheongzx (#4544)
- chore: Refactor storage backends for native writers @desmondcheongzx (#4536)
Full Changelog: v0.5.4...v0.5.5
v0.5.4
What's Changed 🚀
✨ Features
- feat(flotilla): Use max sources config to determine files per partition @colin-ho (#4535)
- feat(flotilla): Plan explain @colin-ho (#4517)
- feat: Add a native remote parquet writer @desmondcheongzx (#4493)
- feat(flotilla): Actor pool project @colin-ho (#4497)
- feat: add clean for target files @Jay-ju (#4507)
- feat(connect): add support for spark when/otherwise @universalmind303 (#4524)
- feat(sql): support count(1) @universalmind303 (#4512)
- feat: regexp replace posix groups and single regex speedup @SparkApplicationMaster (#4504)
- feat: make min_cpu of ray runner configurable @Jay-ju (#4506)
- feat: adds .serialize('json') to expressions @rchowell (#4516)
- feat: `f""` string style string concatenation @Farzan-Hashmi (#4500)
- feat: add id and plan_id to otel subscribers, and add method for getting tree hierarchy for a pipeline @universalmind303 (#4485)
- feat: adds .jq expression with complex use-case examples @rchowell (#4470)
🐛 Bug Fixes
- fix: Fix chrome trace @colin-ho (#4531)
- fix: Use PYTHON_VERSION instead of python3 in Makefile @petern48 (#4499)
🚀 Performance
- perf: add harness for benchmarking parquet writes @desmondcheongzx (#4523)
📖 Documentation
- docs: reorganize integrations @ccmao1130 (#4488)
- docs: fix bad markdown links @rchowell (#4528)
👷 CI
- ci: Add a dbg check in precommit @jordandelbar (#4509)
- ci: Fix style checks @colin-ho (#4511)
🔧 Maintenance
- chore: add additional context to the runtime stats events @universalmind303 (#4520)
- chore: add better messaging for missing spark functions @universalmind303 (#4521)
- chore: update dashboard backend and disable query viz temporarily @universalmind303 (#4475)
- chore: add test case over #4180 @flaneur2020 (#4484)
Full Changelog: v0.5.3...v0.5.4
v0.5.3
What's Changed 🚀
✨ Features
- feat: url download for unity catalog volumes @kevinzwang (#4476)
- feat(flotilla): Flotilla runner @colin-ho (#4459)
♻️ Refactor
- refactor(exprs): move uri functions into own crate @universalmind303 (#4455)
- refactor(exprs): refactor tokenize, coalesce, struct @universalmind303 (#4454)
- refactor(exprs): refactor hash,minhash, and cosine_distance @universalmind303 (#4452)
🔧 Maintenance
- chore: rename block_on method to convey intent @rohitkulshreshtha (#4474)
Full Changelog: v0.5.2...v0.5.3
v0.5.2
What's Changed 🚀
🚀 Performance
♻️ Refactor
- refactor(exprs): temporals @universalmind303 (#4446)
- refactor(exprs): move over binary functions @universalmind303 (#4398)
📖 Documentation
- docs: Dynamic execution docs @colin-ho (#4471)
- docs: cleanup part 2 @ccmao1130 (#4468)
👷 CI
- ci: Update base url in broken link checker @colin-ho (#4469)
- ci: remove py runner from nightly test matrix and pin aiohttp @kevinzwang (#4466)
Full Changelog: v0.5.1...v0.5.2
v0.5.1
What's Changed 🚀
📖 Documentation
- docs: cleanup docs @ccmao1130 (#4434)
- docs: Update sessions.md to reflect correct import behavior as described in session class docstring @everettVT (#4381)
Full Changelog: v0.5.0...v0.5.1
v0.5.0
‼️ v0.4 -> v0.5 Migration Guide ‼️
General
- The getdaft Python package was deprecated in v0.4 and is now unsupported. Please import the daft package instead
- The PyRunner was deprecated in v0.4 and is now removed. Please use the native runner instead, which is now the default runner for local execution:
  - Change the `DAFT_RUNNER` env var from `py` to `native`
  - Use `daft.context.set_runner_native()` instead of `daft.context.set_runner_py()`
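The environment-variable change above can be sketched as a small helper. This is a minimal illustration, assuming the runner is selected through `DAFT_RUNNER` as the note describes; the function name `migrate_runner_env` is hypothetical and is not part of Daft.

```python
def migrate_runner_env(env):
    """Return a copy of `env` with the removed `py` runner replaced by `native`.

    `env` is any mapping of environment variable names to values
    (for example `os.environ`). Hypothetical helper, not a Daft API.
    """
    updated = dict(env)
    # Only rewrite the value that the migration guide says is no longer valid.
    if updated.get("DAFT_RUNNER", "").strip().lower() == "py":
        updated["DAFT_RUNNER"] = "native"
    return updated
```

For example, `migrate_runner_env({"DAFT_RUNNER": "py"})` returns `{"DAFT_RUNNER": "native"}`, while other values such as `ray` are left unchanged. If the variable is unset, nothing needs to change, since the native runner is the default for local execution.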
Python API
- `daft.Series`
  - Passing a Series into `daft.lit` or `daft.Expression` methods is no longer supported.
- `daft.sql.SQLCatalog`
  - This class is marked as deprecated and will be removed in v0.6. You can now use keyword arguments with `daft.sql(...)` to add DataFrames to the query:

        # before
        catalog = daft.sql.SQLCatalog({ "my_df": df })
        daft.sql("SELECT * FROM my_df", catalog=catalog)

        # after
        daft.sql("SELECT * FROM my_df", my_df=df)

- These functions in `daft.catalog` were deprecated in v0.4 and have now been removed.
  - `daft.catalog.read_table` - use `daft.read_table` instead
  - `daft.catalog.register_table` - use `daft.attach_table` instead
  - `daft.catalog.register_python_catalog` - use `daft.attach_catalog` instead
  - `daft.catalog.unregister_catalog` - use `daft.detach_catalog` instead
- These functions in `daft.io.catalog` are deprecated and marked for removal in v0.6. Please use our unified catalog API instead.
  - `daft.io.catalog.DataCatalog`
  - `daft.io.catalog.DataCatalogTable`
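To locate call sites affected by the removed `daft.catalog` functions, a simple text scan may help. The sketch below uses only the Python standard library; the old-to-new mapping is copied from the list above, while the names `REMOVED_CATALOG_FUNCTIONS` and `find_removed_calls` are hypothetical. It only catches usages written with the full dotted path, so aliased imports such as `from daft.catalog import read_table` would need a separate check.

```python
import re

# Old name -> replacement, copied from the migration notes above.
REMOVED_CATALOG_FUNCTIONS = {
    "daft.catalog.read_table": "daft.read_table",
    "daft.catalog.register_table": "daft.attach_table",
    "daft.catalog.register_python_catalog": "daft.attach_catalog",
    "daft.catalog.unregister_catalog": "daft.detach_catalog",
}


def find_removed_calls(source):
    """Return (line_number, old_name, new_name) for each removed API in `source`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in REMOVED_CATALOG_FUNCTIONS.items():
            # The lookbehind and \b keep us from matching longer identifiers
            # that merely contain the old name as a prefix or suffix.
            pattern = r"(?<![\w.])" + re.escape(old) + r"\b"
            if re.search(pattern, line):
                hits.append((lineno, old, new))
    return hits
```

Calling `find_removed_calls(open("pipeline.py").read())` on a script of your own (the file name here is only a placeholder) yields one tuple per affected line, which can then be rewritten by hand to the suggested replacement.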
What's Changed 🚀
✨ Features
- feat(catalogs): Rust-based in-memory catalog and table @kevinzwang (#4445)
- feat(flotilla): Materialize pipeline @colin-ho (#4457)
- feat: adds python partition fields to the DataSource API @rchowell (#4449)
- feat: adds deserialize and try_deserialize with support for json @rchowell (#4438)
- feat(flotilla): LogicalPlan to PipelineNode translation @srilman (#4442)
- feat: Flotilla scheduler and dispatcher actors @colin-ho (#4375)
- feat: enables scalar function lowering e.g. function overloads @rchowell (#4431)
- feat(flotilla): Additional PipelineNodes for map pipelines @srilman (#4439)
- feat: Flotilla default scheduler @colin-ho (#4376)
🐛 Bug Fixes
♻️ Refactor
- refactor(ordinals): move binding step above micropartition + local execution @kevinzwang (#4425)
👷 CI
- ci: remove publish with name 'getdaft' @kevinzwang (#4456)
🔧 Maintenance
- chore: Remove the PyRunner @srilman (#4458)
- chore: adds CTE map to daft.sql and removes deprecated catalog APIs @rchowell (#4460)
Full Changelog: v0.4.18...v0.5.0
v0.4.18
What's Changed 🚀
🐛 Bug Fixes
- fix: Implement dedicated map growable @colin-ho (#4435)
- fix: update broken-link-checker.yml @ccmao1130 (#4440)
- fix: casting from list(list(T)) to list(tensor(T, shape)) @universalmind303 (#4437)
- fix: Ensure additional columns are passed through granular projection splitting @srilman (#4423)
- fix: Improve UDF errors @colin-ho (#4424)
🔧 Maintenance
Full Changelog: v0.4.17...v0.4.18
v0.4.17
What's Changed 🚀
- build: make dashboard self contained again, except when running in ci @universalmind303 (#4417)
✨ Features
- feat(catalogs): enable Rust usage of Python catalogs and tables @kevinzwang (#4394)
- feat: adds functions prelude and clean up contributing notes. @rchowell (#4416)
- feat: Add duration expressions @rohitkulshreshtha (#4391)
🐛 Bug Fixes
- fix(dashboard): broadcast url @universalmind303 (#4404)
- fix: substr with null args @universalmind303 (#4415)
- fix: Enable strict mode for mypy in pre-commit @colin-ho (#4422)
- fix: Fix iceberg docker compose entrypoint command @colin-ho (#4428)
- fix: Progress bar no longer panics @desmondcheongzx (#4421)
- fix(spark-connect): withColumnRenamed now preserves non-renamed columns @jwills (#4418)
- fix(mypy): mypy fixes for daft.runners @srilman (#4392)
- fix: Add s3n to parse_url @desmondcheongzx (#4405)
- fix(tests): add retry to pyarrow parquet read @kevinzwang (#4408)
- fix(ci): import matplotlib for window function tutorial @kevinzwang (#4407)
♻️ Refactor
- refactor(exprs): literal support for FunctionArgs proc macro @kevinzwang (#4401)
📖 Documentation
- docs: Fix docs generation for `Expression.embedding.cosine_distance` @srilman (#4419)
- docs: fix readthedocs version dropdown @ccmao1130 (#4406)
👷 CI
- ci: Add retries to requirements installation @colin-ho (#4430)
- ci: Up broken link checker retries @colin-ho (#4429)
- ci: Cancel ongoing PR tests on push @desmondcheongzx (#4410)
- ci: Remove `py` runner tests from PR CI @srilman (#4403)
Full Changelog: v0.4.16...v0.4.17
v0.4.16
What's Changed 🚀
✨ Features
- feat: Add put_multipart to s3_like @rohitkulshreshtha (#4360)
- feat: adds partitioning classes for python @rchowell (#4366)
- feat: Flotilla scheduler @colin-ho (#4349)
- feat: Add a native local parquet writer @desmondcheongzx (#4260)
- feat: Flotilla plan result @colin-ho (#4275)
- feat: Add optional spark dependency for pyspark connector @desmondcheongzx (#4368)
- feat: add a `repr_json` for logical plans. @universalmind303 (#4354)
- feat: Flotilla utils @colin-ho (#4345)
🐛 Bug Fixes
- fix: More mypy fixes @colin-ho (#4388)
- fix: skip flaky actor pool GPU test @kevinzwang (#4397)
- fix: Remove botocore dependency when working with deltalake @desmondcheongzx (#4369)
- fix: mypy catalog fixes @rchowell (#4384)
- fix(mypy): part of the mypy strict mode errors @kevinzwang (#4386)
- fix: Add Rust Testing Import @srilman (#4382)
- fix: adds HTTP and HF retry logic based upon exist GCS retry logic @rchowell (#4371)
- fix: Remove dbg! in url download @colin-ho (#4374)
- fix: ignores tests/integration for make test target @rchowell (#4365)
- fix: mirror raw.githubusercontent via S3 to avoid CI throttling @rchowell (#4367)
- fix: CSV read with disjoint predicate pushdown @kevinzwang (#4363)
🚀 Performance
- perf: Split projections with expressions that need granular batching @srilman (#4329)
- perf: Use url_download max_connections for projection batch size @srilman (#4328)
- perf: Morsel size ranges for project and filter operators @srilman (#4344)
♻️ Refactor
- refactor(exprs): move all `list` exprs to new expr @universalmind303 (#4340)
- refactor(exprs): 6 of 6 move all utf8 exprs to own crate @universalmind303 (#4312)
- refactor(exprs): proc macro for parsing arguments into structs @kevinzwang (#4348)
- refactor(exprs): remove unused sql json module @universalmind303 (#4364)
- refactor(exprs): url upload/download to new pattern @universalmind303 (#4352)
- refactor(ordinals): bound expressions in table statistics @kevinzwang (#4342)
📖 Documentation
- docs: add roadmap @ccmao1130 (#4377)
- docs: add readthedocs version selector for future versions @ccmao1130 (#4383)
- docs: rework window functions demo with added context @ccmao1130 (#4355)
- docs: rearrange integrations section into I/O @ccmao1130 (#4358)
🔧 Maintenance
- chore: add repr json to dashboard broadcast & make url settable @universalmind303 (#4395)
- chore: Fix some strict mypy errors @desmondcheongzx (#4385)
- chore(exprs): add helper methods for extracting scalars from function args @universalmind303 (#4351)
- chore: cleanup pushdowns @rchowell (#4343)
Full Changelog: v0.4.15...v0.4.16
v0.4.15
What's Changed 🚀
✨ Features
- feat: Add `Expr.skew` @srilman (#4346)
- feat: Add generic interface for custom data sinks @desmondcheongzx (#4244)
- feat: adds user-defined dataframe source apis @rchowell (#4254)
- feat: adds expression visitor @rchowell (#4278)
- feat(window): order by-only ranking @f4t4nt (#4336)
- feat(window): order by with no partition by row number impl @f4t4nt (#4324)
- feat(window): range between for partition-by windows @f4t4nt (#4235)
- feat(window): tpc-ds queries @f4t4nt (#4283)
- feat: add support for OTEL metrics and tracing @ohbh (#4322)
- feat(window): fix window copy issue @f4t4nt (#4316)
- feat: Make file writers async @desmondcheongzx (#4320)
🐛 Bug Fixes
🚀 Performance
- perf: TopN Operator and Optimization @srilman (#4307)
- perf: Perform Local Distinct list_agg in count_distinct aggregations @srilman (#4325)
♻️ Refactor
- refactor: 5 of N expression refactor (json functions) @universalmind303 (#4302)
- refactor: 4 of N move all image exprs to new way of writing expressions @universalmind303 (#4294)
- refactor(ordinals): use indices in RecordBatch::get_column(s) @kevinzwang (#4318)
- refactor: 3 of N expr refactor @universalmind303 (#4286)
📖 Documentation
- docs: fix readthedocs timeouts @rchowell (#4350)
- docs: add algolia search, separate primary & secondary nav, stylistic changes @ccmao1130 (#4330)
- docs(window): window function tutorial / demo @f4t4nt (#4331)
🔧 Maintenance
- chore: Add daft-dashboard frontend build step to Makefile @srilman (#4337)
- chore: updates mypy to 1.15.0 @rchowell (#4334)
- chore: removes requirements-docs.txt @rchowell (#4335)
Full Changelog: v0.4.14...v0.4.15