test: Add ReadLocalNode tests #1794

Open · wants to merge 5 commits into main

Conversation

TrevorBergeron (Contributor) commented:
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jun 5, 2025
@TrevorBergeron TrevorBergeron force-pushed the local_node_tests branch 2 times, most recently from 205b9c5 to f43f938 Compare June 5, 2025 19:46
@TrevorBergeron TrevorBergeron marked this pull request as ready for review June 5, 2025 20:11
@TrevorBergeron TrevorBergeron requested review from a team as code owners June 5, 2025 20:11
@TrevorBergeron TrevorBergeron requested a review from tswast June 5, 2025 20:11
tswast (Collaborator) previously approved these changes Jun 6, 2025 and left a comment:


So cool!

# Used only in testing right now; BigQueryCachingExecutor is the fully featured engine.
# Simplified: does not handle large (>10 GB) result queries, error handling,
# respecting global config, or recording metrics.
class DirectGbqExecutor(semi_executor.SemiExecutor):
tswast (Collaborator):

I assume the purpose is to make sure caching and such don't cause any regressions / differences in behavior? Might be good to include that in the comment / docstring if so.

TrevorBergeron (Contributor, Author):

Yeah, mostly just to isolate the simplest, fastest version of BQ execution and avoid the slow/complicated machinery only needed at >10 GB scale or for stateful interactive flows. Added a comment in the new revision.

Comment on lines +39 to +40
# This will error out if polars is not installed
from bigframes.core.compile.polars import PolarsCompiler
tswast (Collaborator):

Might be good to add a little helper to make sure folks know what package (and possibly versions) to install. See pandas helpers like this:

https://github.com/pandas-dev/pandas/blob/085e18fff3ba9e6b16f4d5fbdea1156c4c6aa195/pandas/compat/_optional.py#L151-L192

Or closer to home, the versions helpers in some of our client libraries:

https://github.com/googleapis/python-bigquery/blob/bd5aba8ba40c2f35fb672a68eed11d6baedb304f/google/cloud/bigquery/_versions_helpers.py
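To make the suggestion concrete, here is a minimal sketch of the pandas-style pattern, assuming a hypothetical helper name (`import_optional_dependency`) and an illustrative minimum version; none of these names are from the PR or the bigframes codebase:

```python
import importlib

# Assumed minimum version, for illustration only.
MIN_VERSIONS = {"polars": "1.0.0"}


def import_optional_dependency(name: str):
    """Import ``name``; raise an actionable ImportError if missing or too old."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        raise ImportError(
            f"Missing optional dependency '{name}'. "
            f"Install it with `pip install {name}`."
        ) from None
    minimum = MIN_VERSIONS.get(name)
    version = getattr(module, "__version__", None)
    if minimum is not None and version is not None:
        # Naive numeric version comparison, for illustration only.
        if tuple(int(p) for p in version.split(".")[:3]) < tuple(
            int(p) for p in minimum.split(".")
        ):
            raise ImportError(
                f"'{name}' >= {minimum} is required, but {version} is installed."
            )
    return module
```

The payoff is that a bare `ImportError: No module named 'polars'` becomes a message telling the user exactly what to install and which version is supported.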

TrevorBergeron (Contributor, Author):

Hmm, that's a good idea. I am going to expose polars execution in another PR soon, so I'll figure out the messaging experience in that one.

from bigframes.testing import polars_session

session = polars_session.TestSession()
with bigframes.core.global_session._GlobalSessionContext(session):
tswast (Collaborator):

Neat! Potentially something we want to consider exposing to users? I guess after someone asks for it so as not to pollute our public surface.

TrevorBergeron (Contributor, Author):

Maybe if we get some requests, we can productionize it. I'm worried it's not robust to all the thread-local stuff, however.
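One generic way to address the thread-safety concern raised here is to back the "current session" with `contextvars` instead of a plain module global, since context variables are isolated per thread and per async task. This is a standalone sketch of that pattern, not bigframes code; all names here are hypothetical:

```python
import contextvars
from contextlib import contextmanager

# Per-thread/per-task holder for the active session; None means "no override".
_current_session = contextvars.ContextVar("current_session", default=None)


@contextmanager
def session_context(session):
    """Temporarily install `session` as the current session, then restore."""
    token = _current_session.set(session)
    try:
        yield session
    finally:
        _current_session.reset(token)


def get_current_session():
    """Return the session installed by the innermost session_context, if any."""
    return _current_session.get()
```

Because `ContextVar.set` returns a token that `reset` consumes, nested or concurrent overrides unwind correctly, which a single module-level global cannot guarantee.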


pytest.importorskip("polars")

REFERENCE_ENGINE = polars_executor.PolarsExecutor()
tswast (Collaborator):

I worry about this REFERENCE_ENGINE name. Might be worth a comment that the BigQuery engine is the source of truth, but we use this as the reference for faster testing?

TrevorBergeron (Contributor, Author):

Comment added.
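The testing strategy discussed in this thread boils down to differential testing: run the same logical plan on the fast local (polars) engine and on the BigQuery engine, and assert the results agree, with BigQuery remaining the semantic source of truth. A toy sketch of that shape, with hypothetical names and trivial stand-in "engines":

```python
def differential_check(plan, reference_engine, candidate_engine):
    """Run `plan` on both engines and assert their results match.

    `reference_engine` is the fast engine used for testing; the candidate
    (in this PR's context, the BigQuery path) defines the true semantics.
    """
    expected = reference_engine(plan)
    actual = candidate_engine(plan)
    assert expected == actual, f"engines disagree: {expected!r} != {actual!r}"
    return actual


# Toy "engines": each takes a plan (here, just a list of rows) and returns rows.
fast_local = lambda rows: sorted(rows)
slow_remote = lambda rows: sorted(rows)
```

The value is that a disagreement pinpoints exactly which plan diverges, which is far cheaper to debug than comparing end-to-end DataFrame outputs.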
