[BUG] fugue_sql intermittently throwing segmentation fault errors #462

@jstammers

Minimal Code To Reproduce

Describe the bug
I have a set of unit tests that check the functionality of code that uses the fugue_sql API with a DuckDB backend. When I run these tests locally, they all pass without any issue. However, when I run them as part of a GitHub Actions workflow, I frequently encounter a segmentation fault at the following location:

Current thread 0x00007f4e61554740 (most recent call first):
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/dataframe.py", line 101 in as_arrow
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/dataframe.py", line 110 in as_local_bounded
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/dataframe/dataframe.py", line 90 in as_local
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue_duckdb/execution_engine.py", line 521 in convert_yield_dataframe
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_tasks.py", line 147 in set_result
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_tasks.py", line 293 in execute
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 683 in run
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 171 in run_single
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 155 in run_tasks
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 129 in run
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/adagio/instances.py", line 270 in run
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/_workflow_context.py", line 54 in run
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/workflow/workflow.py", line 1584 in run
  File "/home/runner/.cache/pypoetry/virtualenvs/****-CeyU5fzd-py3.10/lib/python3.10/site-packages/fugue/sql/api.py", line 107 in fugue_sql
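
For anyone trying to capture a similar dump: the trace above has the format that Python's built-in faulthandler module prints when the interpreter receives a fatal signal. Below is a minimal sketch of enabling it explicitly so that the dump is also written to a file. The helper name `enable_crash_dumps` and the file name `segfault_trace.log` are made up for illustration and are not part of Fugue or of my test suite.

```python
import faulthandler
import tempfile
from pathlib import Path


def enable_crash_dumps(log_path: Path):
    """Enable faulthandler and write fatal-signal tracebacks to log_path.

    The returned file object must stay open for as long as faulthandler
    is enabled, so the caller should keep a reference to it.
    """
    log_file = log_path.open("w")
    # all_threads=True dumps the stack of every thread, not only the crashing one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical location for the dump file.
log_path = Path(tempfile.gettempdir()) / "segfault_trace.log"
log_file = enable_crash_dumps(log_path)
```

Calling this once at the start of a test session (for example from a conftest.py) should be enough; the file only receives content if the process actually crashes.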

The function that fails has the following form:

import pandas as pd
from fugue import api as fa


def filter_df(
    df: pd.DataFrame,
    outlets: pd.DataFrame,
    adjustments: pd.DataFrame,
) -> pd.DataFrame:
    # Join adjustments to outlets, then join the resulting keys to df.
    query = """keys = SELECT DateId, ProductId, LocationId, AdjustmentFactor, AdjustmentType, id
    FROM adjustments INNER JOIN outlets USING (LocationId)
    fdt = SELECT * FROM keys INNER JOIN df USING (DateId, ProductId, LocationId)"""
    result = fa.fugue_sql(
        query,
        df=df,
        outlets=outlets,
        adjustments=adjustments,
        engine='duckdb',
        as_fugue=True,
    )
    return result.as_pandas()

I have multiple unit tests that call this function. It is difficult to isolate the problem because I cannot reproduce it locally.
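
One way to narrow down an intermittent crash like this is to run a candidate snippet many times, each in a fresh interpreter, and count how often the process is killed by a signal. The following is a rough sketch using only the standard library; `count_signal_crashes` is a hypothetical helper, and the check on a negative return code assumes a POSIX system such as the Linux runners used here.

```python
import subprocess
import sys


def count_signal_crashes(snippet: str, runs: int = 20) -> int:
    """Run `snippet` in `runs` fresh interpreters and count signal deaths.

    On POSIX, subprocess reports a negative return code when the child
    was terminated by a signal (e.g. -11 for SIGSEGV).
    """
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            # -X faulthandler makes the child print a traceback on a fatal signal.
            [sys.executable, "-X", "faulthandler", "-c", snippet],
            capture_output=True,
        )
        if proc.returncode < 0:
            crashes += 1
    return crashes
```

The snippet passed in would be a small script that builds the input frames and calls filter_df; an ordinary test failure or non-zero exit is not counted, only termination by a signal.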

In this instance, I was able to refactor my function to use the Fugue API instead, but it would be good to be able to use the fugue_sql API for more complex queries where SQL syntax is more suitable.

from fugue import api as fa

df = fa.join(...)
df = fa.filter(...)

Expected behavior
I would expect these unit tests to run successfully.

Environment (please complete the following information):

  • Backend: pandas (duckdb)
  • Backend version: 0.8.2
  • Python version: 3.10
  • OS: linux
