
Add INSERT INTO and DELETE FROM write methods#682

Merged
mgdenno merged 5 commits into v0.6-dev from copilot/add-insert-and-delete-methods
Mar 15, 2026
Conversation

Contributor

Copilot AI commented Mar 13, 2026

Previously, the Write class supported only MERGE INTO-based writes, with no fast bulk-insert path and no way to delete rows.

Changes

New insert write mode in to_warehouse()

Uses INSERT INTO ... SELECT * FROM ..., with no duplicate checking, so it is faster than append (which uses MERGE INTO). The caller is responsible for ensuring no unwanted duplicates are introduced.

ev.write.to_warehouse(
    source_data=df,
    table_name="primary_timeseries",
    write_mode="insert",
)

New delete_from() method on the Write class

Executes DELETE FROM with optional filter conditions. Supports SQL strings, dicts, and TableFilter objects.

  • dry_run=True: returns a Spark DataFrame of the matching rows without deleting them
  • dry_run=False (default): deletes the matching rows and returns the deleted row count
# Preview what would be deleted
sdf = ev.write.delete_from(
    table_name="primary_timeseries",
    filters=["location_id = 'usgs-01234567'"],
    dry_run=True,
)
print(f"Would delete {sdf.count()} rows")

# Execute deletion
count = ev.write.delete_from(
    table_name="primary_timeseries",
    filters={"column": "location_id", "operator": "=", "value": "usgs-01234567"},
)
print(f"Deleted {count} rows")
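The TableFilter form is not shown above. A minimal sketch of what it might look like, with field names assumed from the dict form (not confirmed against the teehr source):

```python
from dataclasses import dataclass


# Stand-in for teehr's TableFilter model; the class shape and field
# names here are assumptions inferred from the dict filter form above.
@dataclass
class TableFilter:
    column: str
    operator: str
    value: str


flt = TableFilter(column="location_id", operator="=", value="usgs-01234567")

# Hypothetical usage, mirroring the dict example:
# count = ev.write.delete_from(
#     table_name="primary_timeseries",
#     filters=[flt],
# )
```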

New delete() method on BaseTable

Enables the ev.table().delete() and ev.primary_timeseries.delete() patterns as a convenient alternative to ev.write.delete_from(). Supports the same filter formats and dry_run flag.

# Via named table property
count = ev.primary_timeseries.delete(
    filters=["location_id = 'usgs-01234567'"],
)

# Via ev.table()
sdf = ev.table("primary_timeseries").delete(
    filters=["location_id = 'usgs-01234567'"],
    dry_run=True,
)

# Delete all rows
count = ev.primary_timeseries.delete()

TableWriteEnum

Added insert value.
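For context, a rough sketch of what the updated enum might look like; every member besides insert is an assumption, so check the actual TableWriteEnum definition:

```python
from enum import Enum


class TableWriteEnum(str, Enum):
    """Illustrative write modes; members other than 'insert' are assumed."""

    append = "append"                        # MERGE INTO, skips duplicates
    upsert = "upsert"                        # MERGE INTO, updates matches
    create_or_replace = "create_or_replace"  # replaces the table contents
    insert = "insert"                        # new: plain INSERT INTO ... SELECT
```

Inheriting from str means a plain write_mode="insert" argument can be coerced onto the enum member.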

Documentation

  • New Deleting Data section in tables.rst documenting both ev.table().delete() and ev.write.delete_from() with dry-run and filter examples
  • Expanded Writing Results section covering all write modes including insert
  • Updated write_mode parameter description in Loading Parameters to list insert

Merged from v0.6-dev

Includes BaseTable.drop(), BaseTable.is_core_table, and BaseEvaluation.drop_table() from the v0.6-dev branch.

Original prompt

This section details the original issue you should resolve

<issue_title>Add INSERT and DELETE to write methods</issue_title>
<issue_description>- Insert (INSERT INTO) should be a separate method that appends without checking for duplicates (different from the current "append", which is MERGE INTO) and would therefore be faster

  • Maybe we should rename append to something more specific so users know no duplicates are possible
  • Delete (DELETE FROM) could take filters and have a "dry run" flag
    • dry-run could have a flag to return an sdf so the user can see what would be deleted, or just tell the user the number of rows that would be/were deleted.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Add insert and delete methods to write operations" to "Add INSERT INTO and DELETE FROM write methods" Mar 13, 2026
Copilot AI requested a review from mgdenno March 13, 2026 22:17
@mgdenno
Contributor

mgdenno commented Mar 13, 2026

@copilot merge v0.6-dev into this branch and update docs and add tests.

Contributor Author

Copilot AI commented Mar 13, 2026

@copilot merge v0.6-dev into this branch and update docs and add tests.

Done in f8c9191. Merged v0.6-dev (adds drop_table, is_core_table, and related tests), resolved the test file conflict, and updated docs:

  • New Deleting Data section in tables.rst documenting delete_from() with dry-run and filter examples
  • Expanded Writing Results section covering all write modes including insert
  • Updated write_mode parameter description in Loading Parameters to list insert

All 10 writer tests pass.

Warning

Firewall rules blocked me from connecting to the following addresses:

  • repos.spark-packages.org (DNS block, triggered by the PySpark JVM launch command)
  • sedona.gateway.scarf.sh (DNS block, triggered by the PySpark JVM launch command)

If you need me to access, download, or install something from one of these locations, you can either:

@mgdenno mgdenno marked this pull request as ready for review March 13, 2026 22:33
@mgdenno
Contributor

mgdenno commented Mar 15, 2026

@copilot We now have ev.delete_from(), but I think we also want ev.table().delete() pattern to work.

Contributor Author

Copilot AI commented Mar 15, 2026

@copilot We now have ev.delete_from(), but I think we also want ev.table().delete() pattern to work.

Added delete() to BaseTable in bf9c990. Both patterns now work:

# Via table instance (new)
ev.table("primary_timeseries").delete(filters=["location_id = 'usgs-01234567'"])
ev.primary_timeseries.delete(dry_run=True)  # returns SDF

# Via write interface (existing)
ev.write.delete_from(table_name="primary_timeseries", filters=[...])

The method delegates to ev.write.delete_from() and supports SQL strings, dicts, and TableFilter objects. 4 new tests added (SQL filter, dict filter, dry_run, no-filter/delete-all).
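The delegation described above can be sketched roughly as follows; attribute names such as self.ev and self.name are assumptions for illustration, not the actual teehr implementation:

```python
class BaseTable:
    """Simplified stand-in; the real class carries much more state."""

    def __init__(self, ev, name):
        self.ev = ev      # parent Evaluation (assumed attribute name)
        self.name = name  # warehouse table name, e.g. "primary_timeseries"

    def delete(self, filters=None, dry_run=False):
        # Forward to the write interface, filling in this table's name
        # so callers don't have to repeat it.
        return self.ev.write.delete_from(
            table_name=self.name,
            filters=filters,
            dry_run=dry_run,
        )
```

The upside of this design is that both entry points share one code path, so filter validation and dry-run behavior cannot drift apart.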

Warning: the same firewall blocks as above (repos.spark-packages.org and sedona.gateway.scarf.sh) occurred again during this run.

@mgdenno
Contributor

mgdenno commented Mar 15, 2026

@samlamont FYI

@mgdenno mgdenno merged commit 57b9869 into v0.6-dev Mar 15, 2026
@mgdenno mgdenno deleted the copilot/add-insert-and-delete-methods branch March 15, 2026 17:51
samlamont added a commit that referenced this pull request Mar 19, 2026
* 506 re evaluate class structure (#534)

* evaluation migration playground

* 0.6.0dev0 version

* version file error messages

* clone iceberg template

* remove global var

* upgrade existing evaluation

* iceberg io wip

* iceberg writes and read

* initial iceberg IO

* fixing tests

* convert evaluation updates

* add null-safe equality for append and upsert; test updates

* updates

* sedona snapshot integration

* add jar to lfs

* config tests

* optional s3 read access

* fix test

* update ev.sql to require views list

* cleanup

* writer class wip

* remove playground test file

* set s3 access to True by default for tests

* update migrate

* writer class wip

* writer wip

* rename class

* remove writes from base table

* fix location table writes

* fix fetching usgs daily dv (#530)

* fix fetching usgs daily dv

* version bump

* implement extract class

* extract files in parallel

* option to extract timeseries concurrently

* get output filenames when parallel=True

* fix create_or_replace writes

* fix fetch writes

* cleanup

* validator class

* use validator when extracting to cache

* list tables and views; write to_view()

* update spark config to official sedona and iceberg releases, include geotools for raster ops

* config update

* update remove spark jar

* write parquet to cache using pyarrow schema

* fix geopandas case

* use write.to_cache() for fetched data

* fix fetching tests

* fix generate

* fix test

* fix clone from s3 writes

* workflow class; base table cleanup

* fix read from warehouse

* cleanup

* add joined_timeseries to initial schema and conversion script

* fix convert_to_iceberg to run as script

* script updates

* only trigger docs and tests on pr to main

* rename validation methods

* fix test

* updates percentileeventdetection() and relevant tests (#533)

* updates percentileeventdetection() and relevant tests

* fix github actions branch, increment dev version

---------

Co-authored-by: samlamont <[email protected]>

* 535 fix datatype issue in thresholdvalueexceeded (#536)

* cast to float; collect test results to pandas

* Adds ThresholdValueNotExceeded

* increment dev version

* remove test comment

* fix doc strings

* fix test; trigger tests

* reset workflow

* 523 add flow duration curve (#541)

* adds exceedance_probability to TCFs and FDC_slope to signature metrics

* adds tests and fixes sorting errors

* corrects assumption when obtaining index of ranked flows corresponding with quantiles

* addressing feedback and enabling pytest natively in VScode

* remove WIP changes from test

* updates exceedance_probability() to improve performance and correct memory error

* adds options to return FDC metrics/TCF's as percentile instead of probability

* increment version to 0.5.1dev3 from 0.5.1dev2

* update workflow

* 496 create initial data warehouse (#552)

* initial warehouse setup and e4 ingest

* bump dev version

* wip, configure and read from remote and local catalogs simultaneously

* wip

* read class WIP

* read from cache defaults

* spark session in utc, formatting

* add methods to the read class

* some cleanup

* cleanup

* clone from s3, etc WIP

* generic table wip

* Load class wip

* domain table updates

* loading and table updates

* test updates

* updates and cleanup

* creates method to add_epsilon for signature/deterministic metrics (#544)

* creates method to add_epsilon for signature/deterministic metrics

* adds epsilon to new metrics from new PRs

* Updates add_epsilon attribute to be a bool, applies EPSILON as a global variable

* increments version to 0.5.1dev5 from 0.5.1dev4

* refactor for a more targeted application of epsilon

* Update workflows to point to main in teehr-hub

---------

Co-authored-by: samlamont <[email protected]>

* cleanup etc

* test update

* datatype transforms, etc wip

* field_enums, test paths

* load from cache when fetching

* test updates

* field enums to table properties dict

* calculate metrics on any table

* optimize update applied schema version

* fix subsetting secondary from remote

* fix accessor test and nwm test example setup

* fix write from pd dataframe

* update convert to iceberg

* update initialize warehouse

* update clone_template

* remove file_format table property

* update filter

* remove get_table_instance()

* remove table properties dict; move filter validation to validate class

* bring back table properties dict, assign to base table

* fix test

* remove warehouse dir

* update doc string

---------

Co-authored-by: Sam Landsteiner <[email protected]>

* 554 update GitHub action workflows (#559)

* update actions

* workflow cleanup

* update default tags

* add missing executor builds

* cherry-pick metrics doc list update

* adds BelowPercentileEventDetection method (#551)

* adds BelowPercentileEventDetection method

* fix error w metrics_tansforms test

* increment dev version to 0.5.1dev8 from 0.5.1dev7

* workflow cleanup

* 553 update spark session configuration (#564)

* testing spark session management WIP

* minimal session configuration updates

* remove test file

* pass spark session to convert_evaluation

* test update

* update spark session configuration

* allow for user-defined aws credentials

* credentials update

* update create_local_dir logic

* removing evaluation init args; aws credentials update

* temp minio configuration for testing

* update dev version

* check in_cluster env variable for remote catalog minio config

* tracking catalog metadata via spark session config

* fix dir_path error

* require dir_path and remove from create_spark_session

* comment update

* downgrade hadoop-aws package version; session fix

* switch to conf for spark session

* remove default partitioning

* fix migrations path; remove joned timeseries from shcema

* update anonymous remote catalog creds

* cleanup

* fix app_name bug

* update const vars, update apply_schema_migration()

* fix clone template

* cleanup test

* fix logger path

* remove action push trigger

* updates pearson and r_squared to consider epsilon (#562)

* updates pearson and r_squared to consider epsilon

* addressed PR feedback

* increment dev version to 0.5.1dev9 from 0.5.1dev8

* update naive spearman method to consider ties

* increment dev version in pyproject.toml

* 565 update signaturemetrics class to signatures (#566)

* updates src references to signature_metrics as signatures

* update documentation references to signatures instead of signaturemetrics

* update reference in example

* increment dev version in __init__.py and .toml

---------

Co-authored-by: samlamont <[email protected]>

* 537 improve sphinx api docs (#575)

* api update wip

* api docs wip

* api docs updates

* remove text

* remove references to local_warehouse_dir

* fix references to create_local_dir

* fix write mode on test nwm data setup

* update default catalog names in catalog models

* update create_dir logic

* update remote catalog vars

* tests update

* test updates

* change log file location

* joined_timeseries cleanup

* joined_timeseries and write doc strings

* location_attribute doc strings

* location_crosswalk doc strings

* location_table doc strings

* Timeseries table doc strings

* update evaluation doc strings

* load doc strings

* read class doc strings

* Update write class doc strings

* Update validate doc strings

* remove partition_by for write

* Update base table examples

* Update metrics class doc strings

* splitting up metrics tests, more metrics doc string examples

* rename classes to validate and extract

* clean up; fix tests

* update test

* remove table filter; add query with metrics to base_table; update tests

* remove list_s3_evaluations method

* fix accessor import; fix create_dir logic; test cleanup

* update evaluation conversion script; user guide notebooks

* update user guide notebook; e0_2 example evaluation in s3

* fix doc references to signaturemetrics

* fix references to BaseTable

* signaturemetrics fix

* update metrics doc strings

* calculated fields doc strings

* generate timeseries doc strings

* joined_timeseries inherits Table

* remove validate_and_apply_filters func

* cleanup

* check for aws credentials with boto

* fix bootstrapping for signatures

* fix gumboot test

* update query doc string

* fix validate() doc string example

* update s3 path style and endpoint to use env vars in spark_session

* fix boto cred check

* override query() for domain tables

* test updates

* increment dev version

* remove botocore credential setting

* temp log message

* remove anonymous provider in default

* re-implement boto creds; temp set default s3 endpoint as minio

* aws update

* cleanup log messages

* update spark session iceberg catalog config

* 580 update create spark session method (#581)

* add logging messages for debuggging

* update aws credentials logic

* remove empty credentials for anonymouse case

* 576 ability to generate and write partial joined timeseries (#593)

* add filters to create joined ts

* add h5netcdf

* add docs

* 591 possible bug in foreign key enforcement using load dataframe (#594)

* update load_dataframe() so field order matches schema order

* add missing service arg to fetch usgs

* fix bug with drop overlapping assim flag (#597)

* 549 implement additional deterministic and probabilistic metrics (#571)

* stash changes

* stashing changes again

* Implement conditional deterministic metrics

* addresses PR feedback, adds documentation for cat. metrics

* update the getting_started docs (#586)

* update the getting_started docs

* Addresses additional errors in the docs

* 599 update spark session script (#600)

* reorganize functions

* update session script, add test

* rename arg

* undo fetch and load edits

* corrects error with unpack_results argument (#589)

* corrects error with unpack_results argument

* update bootstrap pydantic models

---------

Co-authored-by: samlamont <[email protected]>

* adds rcf ForecastLeadTimeBins, corrects misc. formatting (#601)

* adds rcf ForecastLeadTimeBins, corrects misc. formatting

* update docs and API formatting w.r.t ForecastLeadTimeBins

* adds staticmethod decorators to class methods

* refactors ForecastLeadTimeBins

* adds new tests for ForecastLeadTimeBins

* update tests

* adds additional input type support, support for mixed input types

---------

Co-authored-by: samlamont <[email protected]>

* update v0_4 ensemble test data script

* increment dev version

* update clone from s3

* Update pyproject.toml (#605)

* remove h5netcdf as test

* change protobuf version and add h5netcdf back

* force pyspark to 4.0.x

* fix pyspark at 4.0.0 and protobuf at 5.28.3

* add some logging

* try changing the __call__ method to return a new object,

* revert creating a new class and remove default value

* add default back but change the call in evaluation

* update tests but they don't pass due to hard coded path.

* rollback setup_v0_3_study to start from scratch

* update pyspark to 4.0.1

* revert metrics initialization for testing

* update tests

* update lock file

* bump dev version

---------

Co-authored-by: samlamont <[email protected]>

* Update brier score in v0.6dev (#608)

* setup v03 study update

* update test

* test updates

* update tests

* test updates

* more test updates

---------

Co-authored-by: Matt Denno <[email protected]>

* update gitignore

* 604 table name is not used in metrics (#609)

* let metrics class inherit base table

* cleanup

* more cleanup

* Add generic table interface

* update metrics interface

* update metrics class and cleanup

* test cleanup

* clean up to_geopandas()

* set default table name for joined_timeseries write() method

* update test

* 612 update v06 conversion workflow (#614)

* update evaluation upgrade process

* handle zone.idenfier files

* 615 update doc strings (#616)

* doc string updates

* update table class call doc strings

* more doc string updates

* fix load doc string

* doc string updates

* doc string and api doc updats

* update changelog

* cleanup unused imports

* add support for aws profile (#628)

* add support for aws profile

* clean up some logging

* fix priority in spark session utils

* adds spark decomissioning to session (#629)

* 611 revisit the test suite (#633)

* testing local jdbc catalog

* test re-writing metadata

* add rewrite_table_paths method

* update test data

* define spark session fixture for all tests

* add session fixture

* updating tests

* update test data

* update tests wip

* remove register warehouse test

* update local catalog type

* remove setup func

* cleanup

* remove jdbc test file

* cleanup

* upgrade to iceberg 1.10.1 bug fix; use newSession()

* fixture updates wip

* test updates

* clean up test data

* more updates

* test data cleanup

* test data updates

* more updates

* cache before validating domain vars

* wip

* update tests wip

* updates

* update_metadata_paths utility

* test updates

* setup large ensemble test warehouse

* rename test ensemble tar

* small ensemble test warehouse

* two locations test warehouse

* rename fixtures

* update metrics tests

* update fixture scopes

* scope updates

* move example data

* move hefs example data

* test data update

* update test

* skip test due to data link

* comment logger message

* remove copilot guide

* add tests readme

* readme update

* fix docstring

* fix aws auth priority (#639)

* 630 the constant field values arg is missing from load spatial (#640)

* adds constant_field_values arg to load_spatial method

* Revert "adds constant_field_values arg to load_spatial method"

* adds constant_field_values to load.py::load_spatial()

* 643 remove pandas dataframe accessor (#644)

* remove accessor; cleanup playground example

* remove references to accessor in docs

* add remote read only evaluation (#637)

* add remote read only evaluation

* add note to doc string

---------

Co-authored-by: samlamont <[email protected]>

* 635 table filter does not work (#641)

* remove field enums; remove table-specific filter models; adds __getattr__ to base table to delegate dataframe attrs

* clean up; tests pass

* fix load spark dataframe bug

* capture uniqueness_fields is None error

* revert check_load_table conditional

* capture user errors

* add documentation

* avoid loading sdf in __dir__

* remove __dir__ method for now

* undo __getattr__

* remove __getattr__

* deprecate metrics class

* remove note about delegated attrs from docs

* remove unused imports

* Update tests/load/test_import_timeseries.py

Co-authored-by: Copilot <[email protected]>

* Update src/teehr/models/filters.py

Co-authored-by: Copilot <[email protected]>

* Update src/teehr/evaluation/write.py

Co-authored-by: Copilot <[email protected]>

* address filter example doc string

* Update tests/query/test_filters.py

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>

* Mdenno/645 update spark config for spot instance executors (#646)

* update spark config for spot instances

* add comment

* pin netcdf4 (#651)

* Mgdenno/620 add load functions to base table (#654)

* adds teehr api support

* allow the api url and port to be set

* download wip

* more wip

* udpate test

* adds evaluation subset; module level template fixture; download tests

* include bbox; remove unused fixture

* cleanup; remove workflow class

* address copilot feedback

---------

Co-authored-by: Matt Denno <[email protected]>

* 657 add dir path to remotereadonlyevaluation class (#660)

* add dir path

* update wip

* rename dir path

* add logger message; cleanup

* add readwrite evaluation with read only flag

* Catch remote evaluation config errors

* update temp_dir_path

* ignore cleanup errors for temp dirs

* remove ignore errors

* Mdenno/655 odd chaning behavior (#662)

* update join logic

* remove unused imports

* add spark session setting for local

* update join to handel inst data better

* add tests for new join

* split out validate filters from table read

* minor format

* adds apply filters to basetable

* initial table refector to remove call

* fix filter to return sdf if no filters

* add logic to only apply filters if provided

* refactor tables and views

* upgrade pyogrio to latest to remove warning

* remove df.attrs and redundant to_pandas methods

* update table imports

* update distinct_values func and tests

* add geo to df base

* update tables to contain props

* update validate and write

* fix doc, remove to_view

* a little clean up on extract

* few changes to vscode settings

* fix usgs test

* update writer

* fix schema args

* update table props and load calls

* update doc stings and return types in loading

* remove TableWriteEnum for str

* clean up flake8 issues and delete some test outputs.

* Update src/teehr/evaluation/read.py

Co-authored-by: Copilot <[email protected]>

* cleanup unused imports

* update vscode settings

* removed unneeded test outputs

* Add ev.sql() wrapper that auto-sets active catalog and namespace (#663)

* Initial plan

* Add ev.sql() thin wrapper that sets active catalog and namespace

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>

* Fix: don't drop caller-provided temp view in `to_warehouse()` (#664)

* Initial plan

* Fix: only drop temp view in to_warehouse() if created internally

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>

* clean up

* update sql calls

* clean unused import

* update validation method names

* fix secondary timeseries add_geo to support chaining

* fix attrs view to follow conventions

* fix formatting

* fix bad method

* use module_scope_test_warehouse in test_views.py

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: mgdenno <[email protected]>
Co-authored-by: samlamont <[email protected]>

* 661 revisit evaluation initialization (#665)

* remove clone_from_s3 and sql methods from evaluation class

* remove template

* add baseevaluation

* Evaluation classes wip

* cleanup apply_migrations; rename local catalog db

* create version file; cleanup

* update ev directory logic and cleanup

* cleanup

* fixing merge conflicts

* add example

* update doc string

* cleanup

* Update tests/conftest.py

Co-authored-by: Copilot <[email protected]>

* update logic in check_evaluation_version

* add dir_path and cache_dir

* Add LocalReadWriteEvaluation

* remove test

* update tests to use local ev class

* update references to evaluation class to local ev

* remove examples dir

* cleanup test

* conditionally set nodegroup name based on environ variable

---------

Co-authored-by: Copilot <[email protected]>

* add jupyter use to spark exec  pod prefix (#669)

* Mdenno/add id list to download locations (#670)

* use cached property

* make location download take ids

* add logger msg

* optimize teehr_api_timeseries_to_dataframe

* pagination test

* expose page_size arg

---------

Co-authored-by: samlamont <[email protected]>

* add sort order to migrations apply (#675)

* add sort order to migrations apply

* only apply *.sql files

* Strip whitespace and filter out empty statements

* Mdenno/update docs with copilot (#667)

* first pass update api docs

* refactored base dataframe name.  Update some docs.

* first pass at the user docs update

* some manual updates to docs

* a bit more review of docs

* doc string updates

* api doc updates wip

* api docs wip

* update metric model doc strings

* update class template

* update on this page section panel

* add example notebooks page

* update loading local data notebook

* update notebooks

* update ensemble notebook

* notebooks wip

* edit load from_cache; update notebooks

* notebooks etc

* update setup nwm grid example; revert load from_cache changes

* test version switcher

* add links to evaluation and tables pages

* add links to views page

* add links to fetching and metrics pages

* additional links

* make .show consistent in notebooks

* updates metrics table

* readme updates

* table edit

* image tweak

* fix ciroh logo

* readme update

* update Evaluation to LocalReadWriteEvaluation

---------

Co-authored-by: samlamont <[email protected]>
Co-authored-by: samland1116 <[email protected]>

* Add drop table method for user-created tables (#680)

* Initial plan

* Add drop table method to tables - resolves issue

Co-authored-by: mgdenno <[email protected]>

* Merge v0.6-dev and add docs for drop() and drop_table() methods

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>

* Add `name` field to geometry join (#681)

* Initial plan

* Add name to join_geometry function

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>

* Add INSERT INTO and DELETE FROM write methods (#682)

* Initial plan

* Add INSERT write mode and delete_from method to Write class

Co-authored-by: mgdenno <[email protected]>

* Merge v0.6-dev, update docs for INSERT and DELETE methods

Co-authored-by: mgdenno <[email protected]>

* Add delete() method to BaseTable for ev.table().delete() pattern

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>
Co-authored-by: Matt Denno <[email protected]>
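
The `insert` write mode and `delete_from()` described in the PR summary reduce to plain SQL: `INSERT INTO ... SELECT` with no duplicate checking, and `DELETE FROM` with a dry-run preview via `SELECT`. A minimal illustration against stdlib SQLite (table and column names are made-up stand-ins; TEEHR issues the real statements through Spark SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE primary_timeseries (location_id TEXT, value REAL)")
conn.execute("CREATE TABLE staging (location_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO staging VALUES (?, ?)",
    [("usgs-01234567", 1.0), ("usgs-07654321", 2.0)],
)

# write_mode="insert": bulk copy with no duplicate checking,
# unlike the MERGE INTO-based append mode.
conn.execute("INSERT INTO primary_timeseries SELECT * FROM staging")

# dry_run=True: SELECT the matching rows instead of deleting them.
preview = conn.execute(
    "SELECT * FROM primary_timeseries WHERE location_id = ?",
    ("usgs-01234567",),
).fetchall()
print(f"Would delete {len(preview)} rows")

# dry_run=False: execute the DELETE FROM and report the row count.
cur = conn.execute(
    "DELETE FROM primary_timeseries WHERE location_id = ?",
    ("usgs-01234567",),
)
print(f"Deleted {cur.rowcount} rows")
```

Because the insert path skips duplicate checks, the caller is responsible for ensuring the staged data contains no unwanted duplicates, as noted in the PR description.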

* Add catalog_name and namespace_name parameters to view methods (#684)

* Initial plan

* Add catalog_name and namespace_name parameters to view methods

Co-authored-by: mgdenno <[email protected]>

* Always create temp views for variables/location_crosswalks in view computation

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>

* Add generic `add_attributes` method to `TeehrDataFrameBase` (#688)

* Initial plan

* Add generic add_attributes method to TeehrDataFrameBase

Co-authored-by: mgdenno <[email protected]>

* Update add_attributes docstring with post-query use case example

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>
Co-authored-by: Matt Denno <[email protected]>

* Remove unused S3Path class (#693)

* Initial plan

* Remove S3Path class and related utilities

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>

* Add GenericSQL row-level calculated field (#697)

* Initial plan

* Add GenericSQL row-level calculated field

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>
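
The GenericSQL row-level calculated field from #697 adds a column defined by an arbitrary SQL expression evaluated per row. A rough SQLite illustration of that pattern (the expression and column names here are invented; TEEHR evaluates such expressions through Spark):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ts (primary_value REAL, secondary_value REAL)")
conn.executemany("INSERT INTO ts VALUES (?, ?)", [(2.0, 1.0), (4.0, 3.0)])

# A row-level calculated field: any SQL expression projected as a new column.
expr = "secondary_value - primary_value"
rows = conn.execute(
    f"SELECT primary_value, secondary_value, {expr} AS bias FROM ts"
).fetchall()
print(rows)
```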

* Implement automatic pagination for all download methods (#699)

* Initial plan

* Implement offset and limit parameters for all download methods

Co-authored-by: mgdenno <[email protected]>

* Refactor download methods: use page_size with automatic pagination instead of limit/offset

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>
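
The refactor in #699 replaces explicit limit/offset arguments with a `page_size` parameter and automatic pagination. A hypothetical sketch of that loop (the `fetch_page` callable and its offset/limit protocol are assumptions for illustration, not TEEHR's actual internals):

```python
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield rows page by page until a short page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # last page reached
            break
        offset += page_size


# Fake backend holding 25 rows, served in pages of 10.
rows = [{"id": i} for i in range(25)]
fetched = list(paginate(lambda off, lim: rows[off:off + lim], page_size=10))
print(len(fetched))  # 25
```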

* Mdenno/689 data validation when adding domain data doesn't work (#700)

* fix bug in validation where invalid values not caught

* fix tests

* remove description comma constraint

* cleanup commented code

* remove commented code

* Add `primary_location_id_prefix` and `secondary_location_id_prefix` to `download.location_crosswalks()` (#702)

* Initial plan

* Add primary_location_id_prefix and secondary_location_id_prefix to download.location_crosswalks()

Co-authored-by: mgdenno <[email protected]>

* Pass prefix params as API query parameters, not to _load.dataframe()

Co-authored-by: mgdenno <[email protected]>

* Update sphinx docs to use correct prefix parameter names

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>
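
Per the commits in #702, the prefix arguments are passed through as API query parameters rather than to `_load.dataframe()`. A stdlib sketch of building such a request URL (the parameter names come from the commit messages; the endpoint path and values are hypothetical):

```python
from urllib.parse import urlencode

# Parameter names from the commits; the endpoint and values are stand-ins.
params = {
    "primary_location_id_prefix": "usgs",
    "secondary_location_id_prefix": "nwm30",
}
url = "https://example.com/api/location_crosswalks?" + urlencode(params)
print(url)
```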

* Add user-configurable timeout to all ev.download methods (default 60s) (#705)

* Initial plan

* Add timeout parameter to all ev.download methods (default 60s)

Co-authored-by: mgdenno <[email protected]>

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: mgdenno <[email protected]>
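
The change in #705 threads a user-configurable timeout (default 60s) through every `ev.download` method to the HTTP layer. A minimal sketch of that forwarding pattern, with a stubbed fetch function standing in for the real HTTP client (all names here are illustrative, not TEEHR's API):

```python
from typing import Callable


def download(fetch: Callable[..., bytes], url: str, timeout: float = 60.0) -> bytes:
    """Forward a user-configurable timeout (default 60s) to the HTTP layer."""
    return fetch(url, timeout=timeout)


seen = {}

def fake_fetch(url, timeout):
    # Record the timeout the HTTP layer would receive.
    seen["timeout"] = timeout
    return b"ok"

download(fake_fetch, "https://example.com/data")
print(seen["timeout"])  # 60.0
```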

* fix pagination logic and a little cleanup (#707)

* fix pagination logic and a little cleanup

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

* cleanup error handling

---------

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

* refactor evaluation class (#668)

* Small metrics refactor (#694)

* fix metrics relative paths

* metrics refactor reduce duplicate code

* clean up bootstrap and probabilistic models; small doc updates

* remove duplicate drop tables section

---------

Co-authored-by: samlamont <[email protected]>

* adds success_ratio and frequency_bias_index (#710)

* fix broken download test (#711)

* 695 reevaluate the query method (#712)

* query to aggregate

* update all query tests

* more test updates

* update references to query method

* update query references in user guide

* notebook updates

* fix notebook

* fix query references

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

---------

Co-authored-by: Copilot Autofix powered by AI <[email protected]>

* remove s3 dir under loading

* delete unneeded gh workflow

* fix example data paths

* set version to v0.6.0

* fix some merge issues

* remove duplicate code

* changelog updates

* fix benchmark forecast bug

* fix changelog

* minor update

---------

Co-authored-by: Sam Landsteiner <[email protected]>
Co-authored-by: Matt Denno <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: mgdenno <[email protected]>
Co-authored-by: samland1116 <[email protected]>


Development

Successfully merging this pull request may close these issues.

Add INSERT and DELETE to write methods
