Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datasets): Created table_args to pass to create_table, create_view, and table methods #909

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
9 changes: 5 additions & 4 deletions kedro-datasets/RELEASE.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,17 @@

## Major features and improvements

- Added functionality to save Pandas DataFrame directly to Snowflake, facilitating seemless `.csv` ingestion
- Added Python 3.9, 3.10 and 3.11 support for SnowflakeTableDataset
- Added functionality to save Pandas DataFrame directly to Snowflake, facilitating seemless `.csv` ingestion.
- Added Python 3.9, 3.10 and 3.11 support for SnowflakeTableDataset.
- Changed `ibis.TableDataset` to support passing arguments to `create_table`, `create_view`, and `table` via a `table_args` parameter which allows users to set catalog/database in a data cataog entry.
- Added the following new **experimental** datasets:

| Type | Description | Location |
| --------------------------------- | ------------------------------------------------------ | ---------------------------------------- |
| `databricks.ExternalTableDataset` | A dataset for accessing external tables in Databricks. | `kedro_datasets_experimental.databricks` |


## Bug fixes and other changes
- Implemented Snowflake's (local testing framework)[https://docs.snowflake.com/en/developer-guide/snowpark/python/testing-locally] for testing purposes
- Implemented Snowflake's (local testing framework)[https://docs.snowflake.com/en/developer-guide/snowpark/python/testing-locally] for testing purposes.

## Breaking Changes
- Demoted `video.VideoDataset` from core to experimental dataset.
Expand All @@ -23,6 +23,7 @@ Many thanks to the following Kedroids for contributing PRs to this release:

- [Thomas d'Hooghe](https://github.com/tdhooghe)
- [Minura Punchihewa](https://github.com/MinuraPunchihewa)
- [Mark Druffel](https://github.com/mark-druffel)

# Release 5.1.0

Expand Down
13 changes: 12 additions & 1 deletion kedro-datasets/kedro_datasets/ibis/table_dataset.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""Provide data loading and saving functionality for Ibis's backends."""

from __future__ import annotations

import warnings
Expand Down Expand Up @@ -69,6 +70,7 @@ class TableDataset(AbstractDataset[ir.Table, ir.Table]):
"materialized": "view",
"overwrite": True,
}
DEFAULT_TABLE_ARGS: ClassVar[dict[str, Any]] = {}

_connections: ClassVar[dict[tuple[tuple[str, str]], BaseBackend]] = {}

Expand All @@ -79,6 +81,7 @@ def __init__( # noqa: PLR0913
file_format: str | None = None,
table_name: str | None = None,
connection: dict[str, Any] | None = None,
table_args: dict[str, Any] | None = None,
load_args: dict[str, Any] | None = None,
save_args: dict[str, Any] | None = None,
metadata: dict[str, Any] | None = None,
Expand All @@ -104,6 +107,8 @@ def __init__( # noqa: PLR0913
table_name: The name of the table or view to read or create.
connection: Configuration for connecting to an Ibis backend.
If not provided, connect to DuckDB in in-memory mode.
table_args: Additional arguments passed to the Ibis backend's
`create_{materialized}` method and `table` method.
load_args: Additional arguments passed to the Ibis backend's
`read_{file_format}` method.
save_args: Additional arguments passed to the Ibis backend's
Expand Down Expand Up @@ -141,8 +146,13 @@ def __init__( # noqa: PLR0913

self._save_args = deepcopy(self.DEFAULT_SAVE_ARGS)
if save_args is not None:
if table_args is not None:
save_args["database"] = table_args.get("database", None)
Comment on lines +149 to +150
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit magical to me. It's not really consistent with the docstring, either, which says that arguments will be passed to create_{materialized}; in reality, the user needs to know that just database will be passed.

I personally would recommend one of two approaches. One is to not do anything special here; the user can pass database in save_args and database in table_args, and, while it may feel duplicative, at least it's explicit. The other approach to make an explicit database keyword for the dataset, and likely raise an error if database is specified in save_args and/or table_args if also passed explicitly.

@mark-druffel does this make sense, and do you have a preference?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deepyaman As discussed yesterday, I've moved database to the top-level as discussed. I'm trying to push the changes, but I'm getting blocked by pre-commit now that I have it setup properly.

When it ran, it changed a bunch of files I never touched. I staged those as well (not sure if I should've), but my commit still failed because of Black. I've run black manually on the file I changed too to try to lint the file. Any suggestions how I can get this working properly? 😬

image

self._save_args.update(save_args)

self._table_args = deepcopy(self.DEFAULT_TABLE_ARGS)
if table_args is not None:
self._table_args.update(table_args)
self._materialized = self._save_args.pop("materialized")

@property
Expand Down Expand Up @@ -176,7 +186,7 @@ def load(self) -> ir.Table:
reader = getattr(self.connection, f"read_{self._file_format}")
return reader(self._filepath, self._table_name, **self._load_args)
else:
return self.connection.table(self._table_name)
return self.connection.table(self._table_name, **self._table_args)

def save(self, data: ir.Table) -> None:
if self._table_name is None:
Expand All @@ -191,6 +201,7 @@ def _describe(self) -> dict[str, Any]:
"file_format": self._file_format,
"table_name": self._table_name,
"backend": self._connection_config["backend"],
"table_args": self._table_args,
"load_args": self._load_args,
"save_args": self._save_args,
"materialized": self._materialized,
Expand Down
Loading