Skip to content

[Bug] py_extract_from_source syntax errors for a seemingly correct dbt model sql #103

@kansuke-at-trimble

Description

@kansuke-at-trimble

Is this a new bug in dbt-extractor?

  • I believe this is a new bug in dbt-extractor
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I am executing py_extract_from_source(), it yields dbt_extractor.ExtractionError: Source Error: Syntax error in source which I have no clue where it's pointing to. The following dbt sql was fed into the function (note, some parts are obfuscated for security reason):

{{
    config(
        materialized="incremental",
        unique_key=['partition_number','partition_offset'],
        incremental_strategy="merge"
    )
}}

-- comment 
WITH final AS (
    SELECT
        produced_at,
        key AS message_key,
        -- comment
        JSON_EXTRACT_SCALAR(message, '$.json.field1') AS error_type,
        JSON_EXTRACT_SCALAR(message, '$.json.field2') AS error_code,
        partition_number,
        partition_offset,
    FROM 
        {{ ref("table1") }}
    WHERE 
        -- comment
        JSON_EXTRACT(message, '$.field3') IS NOT NULL
        {% if is_incremental() %}
        -- comment
        AND produced_at > (SELECT TIMESTAMP_SUB(MAX(produced_at), INTERVAL 12 HOUR) FROM {{ this }})
        {% endif %}
)

SELECT 
    * 
FROM 
    final

Expected Behavior

It should yield a dictionary containing all the related information.

Steps To Reproduce

import dbt_extractor

c = """
{{
    config(
        materialized="incremental",
        unique_key=['partition_number','partition_offset'],
        incremental_strategy="merge"
    )
}}

-- comment 
WITH final AS (
    SELECT
        produced_at,
        key AS message_key,
        -- comment
        JSON_EXTRACT_SCALAR(message, '$.json.field1') AS error_type,
        JSON_EXTRACT_SCALAR(message, '$.json.field2') AS error_code,
        partition_number,
        partition_offset,
    FROM 
        {{ ref("table1") }}
    WHERE 
        -- comment
        JSON_EXTRACT(message, '$.field3') IS NOT NULL
        {% if is_incremental() %}
        -- comment
        AND produced_at > (SELECT TIMESTAMP_SUB(MAX(produced_at), INTERVAL 12 HOUR) FROM {{ this }})
        {% endif %}
)

SELECT 
    * 
FROM 
    final

"""
dbt_extractor.py_extract_from_source(c)

Relevant log output

No response

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions