This repository was archived by the owner on Jul 14, 2023. It is now read-only.
This repository was archived by the owner on Jul 14, 2023. It is now read-only.
Épico: Implementar partição e cluster em tabelas materializadas #34
Open
Description
Tabelas particionadas e/ou clusterizadas têm menor custo de consulta e melhor performance (Ver https://www.youtube.com/watch?v=wapi0aR4BZE). É possível implementar tanto o particionamento quanto a clusterização na hora da materialização usando DBT.
Para um exemplo, veja o modelo da tabela twitter_metrics
:
{{
config(
materialized='incremental',
partition_by={
"field": "upload_day",
"data_type": "date",
"granularity": "month",
}
)
}}
SELECT *
FROM
(SELECT
SAFE_CAST(upload_ts AS INT64) upload_ts,
EXTRACT(DATE FROM TIMESTAMP_MILLIS(upload_ts*1000)) AS upload_day,
SAFE_CAST(id AS STRING) id,
SAFE_CAST(text AS STRING) text,
SAFE_CAST(created_at AS STRING) created_at,
SAFE_CAST(retweet_count AS INT64) retweet_count,
SAFE_CAST(reply_count AS INT64) reply_count,
SAFE_CAST(like_count AS INT64) like_count,
SAFE_CAST(quote_count AS INT64) quote_count,
SAFE_CAST(impression_count AS FLOAT64) impression_count,
SAFE_CAST(user_profile_clicks AS FLOAT64) user_profile_clicks,
SAFE_CAST(url_link_clicks AS FLOAT64) url_link_clicks,
SAFE_CAST(following_count AS INT64) following_count,
SAFE_CAST(followers_count AS INT64) followers_count,
SAFE_CAST(tweet_count AS INT64) tweet_count,
SAFE_CAST(listed_count AS INT64) listed_count
FROM `basedosdados-dev.br_bd_indicadores_staging.twitter_metrics`)
WHERE
upload_day <= CURRENT_DATE('America/Sao_Paulo')
{% if is_incremental() %}
{% set max_partition = run_query("SELECT gr FROM (SELECT IF(max(upload_day) > CURRENT_DATE('America/Sao_Paulo'), CURRENT_DATE('America/Sao_Paulo'), max(upload_day)) as gr FROM " ~ this ~ ")").columns[0].values()[0] %}
AND
upload_day > ("{{ max_partition }}")
{% endif %}
Tabelas a serem particionadas:
- [partition/cluster]
br_ibge_pnad.microdados
#37 - tse/candidatos
- tse/receita_candidatos
- tse/despesa_candidatos