add event_time page #6383

Open: mirnawong1 wants to merge 35 commits into current from add-event-time. The diff below shows changes from 27 of the 35 commits.
Commits (all by mirnawong1):

- 46231b0: add event_time page (Oct 30, 2024)
- 33a66a8: Merge branch 'current' into add-event-time (Oct 30, 2024)
- 4f2c6dc: add img and rn (Oct 30, 2024)
- 57ee608: fix link (Oct 30, 2024)
- 3354c9d: Update website/docs/reference/resource-configs/event-time.md (Oct 30, 2024)
- 603c21c: fix link again (Oct 30, 2024)
- 488460c: Update event-time.md (Oct 30, 2024)
- 2fb62c5: Update release-notes.md (Oct 30, 2024)
- 69ba339: Update event-time.md (Oct 30, 2024)
- 1ebbbdb: Update advanced-ci.md (Oct 30, 2024)
- 2b713ee: Update advanced-ci.md (Oct 30, 2024)
- c789601: Update advanced-ci.md (Oct 30, 2024)
- 5708119: Update website/docs/docs/deploy/advanced-ci.md (Nov 4, 2024)
- 903c5d1: Merge branch 'current' into add-event-time (Nov 4, 2024)
- 2dd873a: Update website/docs/docs/deploy/advanced-ci.md (Nov 4, 2024)
- b7a07be: Update website/docs/reference/resource-configs/event-time.md (Nov 4, 2024)
- 016c555: Update event-time.md (Nov 4, 2024)
- 2910914: Merge branch 'current' into add-event-time (Nov 4, 2024)
- 79128fe: Merge branch 'current' into add-event-time (Nov 4, 2024)
- 551821d: update img (Nov 6, 2024)
- 735ae38: fix img size (Nov 6, 2024)
- ac7616b: Merge branch 'current' into add-event-time (Nov 6, 2024)
- 0363051: Update website/docs/reference/resource-configs/event-time.md (Nov 6, 2024)
- d693c9b: Update website/docs/reference/resource-configs/event-time.md (Nov 6, 2024)
- 809f2a7: Update website/docs/reference/resource-configs/event-time.md (Nov 6, 2024)
- 81e2318: Update website/docs/reference/resource-configs/event-time.md (Nov 6, 2024)
- 14632b3: Merge branch 'current' into add-event-time (Nov 6, 2024)
- 2b98454: Merge branch 'current' into add-event-time (Nov 11, 2024)
- 3da521f: Update website/docs/docs/deploy/advanced-ci.md (Nov 11, 2024)
- edd1123: add scenarios (Nov 11, 2024)
- 52c0db9: add scenarios (Nov 11, 2024)
- f461ffa: fold in grace's feedback (Nov 11, 2024)
- a4f3b23: Merge branch 'current' into add-event-time (Nov 11, 2024)
- a1c8166: Merge branch 'add-event-time' of github.com:dbt-labs/docs.getdbt.com … (Nov 11, 2024)
- f1969f4: remove redundant (Nov 11, 2024)
4 changes: 2 additions & 2 deletions website/docs/docs/build/incremental-microbatch.md
@@ -20,7 +20,7 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s

Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.

- Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.
+ Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches separately — in the future, concurrently — and to retry them independently.

@@ -162,7 +162,7 @@ Several configurations are relevant to microbatch models, and some are required:

| Config | Type | Description | Default |
|----------|------|---------------|---------|
- | `event_time` | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
+ | [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A |
| `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A |
| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` |
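To show how these configs fit together, here's a minimal sketch of a microbatch model; the model, column, and ref names are illustrative and not part of this PR:

```sql
-- Illustrative microbatch model (hypothetical names).
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='session_start_time',  -- "at what time did the row occur"
    begin='2023-10-01',               -- starting point for initial or full-refresh builds
    batch_size='day',                 -- one batch per day
    lookback=1                        -- also reprocess the previous batch for late-arriving rows
) }}

select
    session_id,
    user_id,
    session_start_time
from {{ ref('stg_sessions') }}
```

On each run, dbt splits the work into day-sized batches bounded by `session_start_time`, so backfills and retries only touch the affected days.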
3 changes: 3 additions & 0 deletions website/docs/docs/dbt-versions/release-notes.md
@@ -26,6 +26,9 @@
- Better error messaging for queries that can't be parsed correctly.

## October 2024

- **New**: Use the `event_time` configuration to specify when an event occurred. This configuration is required for [Incremental microbatch](/docs/build/incremental-microbatch) and can be added to ensure you're comparing overlapping times in [Advanced CI's compare changes](/docs/deploy/advanced-ci). Available in dbt Cloud Versionless and dbt Core v1.9 and higher. Refer to [event_time](/reference/resource-configs/event-time) for more information.

(GitHub Actions vale check: two warnings flagged on line 30 of release-notes.md, questioning 'event_time' and 'v1.9' as possible typos; both terms are intentional.)

<Expandable alt_header="Coalesce 2024 announcements">

Documentation for new features and functionality announced at Coalesce 2024:
8 changes: 8 additions & 0 deletions website/docs/docs/deploy/advanced-ci.md
@@ -31,6 +31,14 @@ dbt reports the comparison differences in:

<Lightbox src="/img/docs/dbt-cloud/example-ci-compare-changes-tab.png" width="85%" title="Example of the Compare tab" />

### Speeding up comparisons

It's common for CI jobs to only [build a subset of data](/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development), for example, only the last 7 days of data. When an [`event_time`](/reference/resource-configs/event-time) column is specified on your model, compare changes can:

- Compare data in CI against production for only the overlapping times, avoiding false positives and returning results faster.
- Handle scenarios where CI contains fresher data than production by using only the overlapping timeframe, which avoids incorrect row-count changes.

<Lightbox src="/img/docs/deploy/apples_to_apples.png" width="90%" title="event_time ensures the same time-slice of data is accurately compared between your CI and production environments." />

Review thread on this section (resolved):

A collaborator commented: I think both of these bullets have the same benefit of "using only the overlapping timeframe, which avoids incorrect row-count changes and returns results faster." I would distinguish the two scenarios as:

  • scenarios where your CI job only builds a subset of data
  • scenarios where your CI job contains fresher data than production

Rather than nesting the second scenario within the first; let me know if that makes sense!

mirnawong1 (Contributor, Author) replied on Nov 11, 2024 that the wording was changed to this:

  It's common for CI jobs to only build a subset of data (for example, only the last 7 days of data).

  When an event_time column is specified on your model, compare changes can optimize comparisons by using only the overlapping timeframe (meaning the timeframe that exists in both the CI and production environments), helping you avoid incorrect row-count changes and return results faster.

  This is useful in scenarios like:

  • Subset of data in CI — When CI builds only a subset of data (like the most recent 7 days), compare changes might interpret the excluded data as "deleted rows." Configuring event_time allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.
  • Fresher data in CI than in production — When your CI job includes fresher data than production, compare changes might flag the additional rows as "new" data, even though they're just fresher data in CI. With event_time configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.

## About the cached data

After [comparing changes](#compare-changes), dbt Cloud stores a cache of no more than 100 records for each modified model for preview purposes. By caching this data, you can view the examples of changed data without rerunning the comparison against the data warehouse every time (optimizing for lower compute costs). To display the changes, dbt Cloud uses a cached version of a sample of the data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on) that's set in the CI job's environment.
258 changes: 258 additions & 0 deletions website/docs/reference/resource-configs/event-time.md
@@ -0,0 +1,258 @@
---
title: "event_time"
id: "event-time"
sidebar_label: "event_time"
resource_types: [models, seeds, snapshots, sources]
description: "dbt uses event_time to understand when an event occurred. When defined, event_time enables microbatch incremental models and more refined comparison of datasets during Advanced CI."
datatype: string
---

Available in dbt Cloud Versionless and dbt Core v1.9 and higher.

(GitHub Actions vale check: a warning flagged on line 10 of event-time.md, questioning 'v1.9' as a possible typo; the term is intentional.)

<Tabs>
<TabItem value="model" label="Models">

<File name='dbt_project.yml'>

```yml
models:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>


<File name='models/properties.yml'>

```yml
models:
  - name: model_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```
</File>

<File name="models/modelname.sql">

```sql
{{ config(
    event_time='my_time_field'
) }}
```

</File>

</TabItem>

<TabItem value="seeds" label="Seeds">

<File name='dbt_project.yml'>

```yml
seeds:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<File name='seeds/properties.yml'>

```yml
seeds:
  - name: seed_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```

</File>
</TabItem>

<TabItem value="snapshot" label="Snapshots">

<File name='dbt_project.yml'>

```yml
snapshots:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<VersionBlock firstVersion="1.9">
<File name='snapshots/properties.yml'>

```yml
snapshots:
  - name: snapshot_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```
</File>
</VersionBlock>

<VersionBlock lastVersion="1.8">

<File name="models/modlename.sql">

```sql

{{ config(
    event_time='my_time_field'
) }}
```

</File>


import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md';

<SnapshotYaml/>
</VersionBlock>



</TabItem>

<TabItem value="sources" label="Sources">

<File name='dbt_project.yml'>

```yml
sources:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<File name='models/properties.yml'>

```yml
sources:
  - name: source_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```

</File>
</TabItem>
</Tabs>

## Definition

Set the `event_time` to the name of the field that represents the actual timestamp of the event, as opposed to a date the data was loaded or processed. You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), [snapshot](/docs/build/snapshots), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block.

Here are some examples of good and bad `event_time` columns:
✅ Good:

- `account_created_at` &mdash; This represents the specific time when an account was created, making it a fixed event in time.
- `session_began_at` &mdash; This captures the exact timestamp when a user session started, which won’t change and directly ties to the event.

❌ Bad:

- `_fivetran_synced` &mdash; This isn't the time that the event happened; it's the time that the record was ingested.
- `last_updated_at` &mdash; This isn't a good choice because it keeps changing over time.

`event_time` is required for [Incremental microbatch](/docs/build/incremental-microbatch) and [Advanced CI's compare changes](/docs/deploy/advanced-ci#speeding-up-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.

When you configure `event_time`, it enables compare changes to:

- Compare data in CI versus production for overlapping times only, reducing false discrepancies.
- Handle scenarios where CI has "fresher" data than production by using only the overlapping timeframe, which avoids incorrect row-count changes.
- Account for subset data builds in CI without flagging filtered-out rows as "deleted" when compared with production.

## Examples

<Tabs>

<TabItem value="model" label="Models">

Here's an example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
models:
  my_project:
    user_sessions:
      +event_time: session_start_time
```
</File>

Example in a properties YAML file:

<File name='models/properties.yml'>

```yml
models:
  - name: user_sessions
    config:
      event_time: session_start_time
```

</File>

Example in a SQL model config block:

<File name="models/user_sessions.sql">

```sql
{{ config(
    event_time='session_start_time'
) }}
```

</File>

This sets `session_start_time` as the `event_time` for the `user_sessions` model, ensuring that compare changes and incremental microbatch runs use this timestamp for time-slice comparisons.
</TabItem>

<TabItem value="seeds" label="Seeds">

Here's an example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
seeds:
  my_project:
    my_seed:
      +event_time: record_timestamp
```

</File>

Example in a seed properties YAML:

<File name='seeds/properties.yml'>

```yml
seeds:
  - name: my_seed
    config:
      event_time: record_timestamp
```
</File>

This setup sets `record_timestamp` as the `event_time` for `my_seed`. It ensures that `record_timestamp` is used consistently in [Advanced CI's compare changes](/docs/deploy/advanced-ci#speeding-up-comparisons) and [incremental microbatching](/docs/build/incremental-microbatch).

</TabItem>
<TabItem value="sources" label="Sources">

Here's an example in a source properties YAML file:

<File name='models/properties.yml'>

```yml
sources:
  - name: source_name
    tables:
      - name: table_name
        config:
          event_time: event_timestamp
```
</File>

This setup sets `event_timestamp` as the `event_time` for the specified source table.
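As a usage note (illustrative, not from this PR): once the source table has `event_time` configured, a downstream microbatch model that selects from it can rely on dbt to filter the source to each batch's time window. Reusing the names above:

```sql
-- models/events_daily.sql (hypothetical downstream model)
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_timestamp',
    begin='2024-01-01',   -- assumed start date for this example
    batch_size='day'
) }}

-- Because the source also has event_time configured, dbt filters it to each
-- batch's time window rather than scanning the whole table on every batch.
select *
from {{ source('source_name', 'table_name') }}
```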

</TabItem>
</Tabs>
1 change: 1 addition & 0 deletions website/sidebars.js
@@ -926,6 +926,7 @@ const sidebarSettings = {
"reference/resource-configs/alias",
"reference/resource-configs/database",
"reference/resource-configs/enabled",
"reference/resource-configs/event-time",
"reference/resource-configs/full_refresh",
"reference/resource-configs/contract",
"reference/resource-configs/grants",