Skip to content

Support Dataset from Spider 2 evaluation framework #1813

Open
@dcvtruong

Description

@dcvtruong

To create the dataset in Langsmith preferably by a tool instead of assembling manually.

jsonl :
{"instance_id": "bq011", "db": "ga4", "question": "How many distinct pseudo users had positive engagement time in the 7-day period ending on January 7, 2021 at 23:59:59, but had no positive engagement time in the 2-day period ending on the same date (January 7, 2021 at 23:59:59) ?", "external_knowledge": "ga4_obfuscated_sample_ecommerce.events.md"}

json eval:
{"instance_id": "bq011", "condition_cols": [], "ignore_order": false, "toks": "132"}

sql:
SELECT
COUNT(DISTINCT MDaysUsers.user_pseudo_id) AS n_day_inactive_users_count
FROM
(
SELECT
user_pseudo_id
FROM
bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_* AS T
CROSS JOIN
UNNEST(T.event_params) AS event_params
WHERE
event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
AND event_timestamp > UNIX_MICROS(TIMESTAMP_SUB(TIMESTAMP('2021-01-07 23:59:59'), INTERVAL 7 DAY))
AND _TABLE_SUFFIX BETWEEN '20210101' AND '20210107'
) AS MDaysUsers
LEFT JOIN
(
SELECT
user_pseudo_id
FROM
bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_* AS T
CROSS JOIN
UNNEST(T.event_params) AS event_params
WHERE
event_params.key = 'engagement_time_msec' AND event_params.value.int_value > 0
AND event_timestamp > UNIX_MICROS(TIMESTAMP_SUB(TIMESTAMP('2021-01-07 23:59:59'), INTERVAL 2 DAY))
AND _TABLE_SUFFIX BETWEEN '20210105' AND '20210107'
) AS NDaysUsers
ON MDaysUsers.user_pseudo_id = NDaysUsers.user_pseudo_id
WHERE
NDaysUsers.user_pseudo_id IS NULL;

exec_result:
n_day_inactive_users_count
12212

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions