Description
Minimal Code To Reproduce
Describe the bug
I have a set of unit tests that check code using the fugue_sql API with a DuckDB backend. When I run these tests locally, they all pass. However, when I run them as part of a GitHub Actions workflow, I frequently encounter a segmentation fault at the following location:
Current thread 0x00007f4e61554740 (most recent call first):
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/dataframe.py", line 101 in as_arrow
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/dataframe.py", line 110 in as_local_bounded
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/dataframe/dataframe.py", line 90 in as_local
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/execution_engine.py", line 521 in convert_yield_dataframe
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_tasks.py", line 147 in set_result
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_tasks.py", line 293 in execute
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 683 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 171 in run_single
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 155 in run_tasks
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 129 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 270 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_workflow_context.py", line 54 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/workflow.py", line 1584 in run
File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/sql/api.py", line 107 in fugue_sql
The function that fails has the following form:
import pandas as pd
from fugue import api as fa


def filter_df(
    df: pd.DataFrame,
    outlets: pd.DataFrame,
    adjustments: pd.DataFrame,
) -> pd.DataFrame:
    # Two Fugue SQL statements: build the key table, then join it to df.
    query = """keys = SELECT DateId, ProductId, LocationId, AdjustmentFactor, AdjustmentType, id
    FROM adjustments INNER JOIN outlets USING (LocationId)
    fdt = SELECT * FROM keys INNER JOIN df USING (DateId, ProductId, LocationId)"""
    result = fa.fugue_sql(
        query,
        df=df,
        outlets=outlets,
        adjustments=adjustments,
        engine='duckdb',
        as_fugue=True,
    )
    return result.as_pandas()
Multiple unit tests call this function. The problem is hard to isolate because I can't reproduce it locally.
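One way an intermittent crash like this could be narrowed down is to run the suspect call repeatedly in a fresh child interpreter with `faulthandler` enabled. A segfault then kills only the child, and the parent can record the exit status. The sketch below uses only the standard library. `run_isolated` and the snippet passed to it are hypothetical names for illustration, not part of Fugue or DuckDB, and the approach is an assumption about what might help rather than a confirmed fix.

```python
import subprocess
import sys


def run_isolated(snippet: str, repeats: int = 3) -> list[int]:
    """Run `snippet` in fresh child interpreters and collect their exit codes.

    `-X faulthandler` makes the child dump a Python traceback on a fatal
    signal, in the "Current thread ... (most recent call first)" format.
    """
    codes = []
    for _ in range(repeats):
        proc = subprocess.run(
            [sys.executable, "-X", "faulthandler", "-c", snippet],
            capture_output=True,
            text=True,
        )
        codes.append(proc.returncode)
        if proc.returncode != 0:
            # Forward the child's stderr so the traceback shows up in CI logs.
            print(proc.stderr, file=sys.stderr)
    return codes
```

In practice the snippet would import the test module and call `filter_df` with the failing inputs. On Linux, a child killed by a signal reports a negative return code, so a segfault would appear as a non-zero entry in the returned list.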
In this instance, I was able to refactor my function to use the Fugue API instead, but it would be good to be able to use the fugue_sql API for more complex queries where SQL syntax is more suitable.
from fugue import api as fa
df = fa.join(...)
df = fa.filter(...)
Expected behavior
I would expect these unit tests to run successfully.
Environment (please complete the following information):
- Backend: pandas (duckdb)
- Backend version: 0.8.2
- Python version: 3.10
- OS: linux