Skip to content
This repository has been archived by the owner on Jul 14, 2023. It is now read-only.

Épico: Implementar partição e cluster em tabelas materializadas #34

Open
4 tasks
lucascr91 opened this issue Oct 13, 2022 · 0 comments
Open
4 tasks
Assignees

Comments

@lucascr91
Copy link
Collaborator

lucascr91 commented Oct 13, 2022

Tabelas particionadas e/ou clusterizadas têm menor custo de consulta e melhor performance (Ver https://www.youtube.com/watch?v=wapi0aR4BZE). É possível implementar tanto o particionamento quanto a clusterização na hora da materialização usando DBT.

Para um exemplo, veja o modelo da tabela twitter_metrics:

{{
    config(
        materialized='incremental',
        partition_by={
            "field": "upload_day",
            "data_type": "date",
            "granularity": "month",
        }
    )
}}


SELECT *
FROM
(SELECT
SAFE_CAST(upload_ts AS INT64) upload_ts,
EXTRACT(DATE FROM TIMESTAMP_MILLIS(upload_ts*1000)) AS upload_day,
SAFE_CAST(id AS STRING) id,
SAFE_CAST(text AS STRING) text,
SAFE_CAST(created_at AS STRING) created_at,
SAFE_CAST(retweet_count AS INT64) retweet_count,
SAFE_CAST(reply_count AS INT64) reply_count,
SAFE_CAST(like_count AS INT64) like_count,
SAFE_CAST(quote_count AS INT64) quote_count,
SAFE_CAST(impression_count AS FLOAT64) impression_count,
SAFE_CAST(user_profile_clicks AS FLOAT64) user_profile_clicks,
SAFE_CAST(url_link_clicks AS FLOAT64) url_link_clicks,
SAFE_CAST(following_count AS INT64) following_count,
SAFE_CAST(followers_count AS INT64) followers_count,
SAFE_CAST(tweet_count AS INT64) tweet_count,
SAFE_CAST(listed_count AS INT64) listed_count
FROM `basedosdados-dev.br_bd_indicadores_staging.twitter_metrics`)
WHERE
    upload_day <= CURRENT_DATE('America/Sao_Paulo')

{% if is_incremental() %}

{% set max_partition = run_query("SELECT gr FROM (SELECT IF(max(upload_day) > CURRENT_DATE('America/Sao_Paulo'), CURRENT_DATE('America/Sao_Paulo'), max(upload_day)) as gr FROM " ~ this ~ ")").columns[0].values()[0] %}

AND
    upload_day > ("{{ max_partition }}")

{% endif %}

Tabelas a serem particionadas:

@crislanealves crislanealves reopened this Dec 11, 2022
@guialvesp1 guialvesp1 changed the title [infra] Implementar partição e cluster em tabelas materializadas Épico: Implementar partição e cluster em tabelas materializadas Jan 19, 2023
@guialvesp1 guialvesp1 moved this to 🌐 Épicos in Databasers Jan 19, 2023
@lucascr91 lucascr91 removed this from Databasers Apr 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants