add event_time page #6383

Open
wants to merge 35 commits into
base: current
Changes from 12 commits
46231b0
add event_time page
mirnawong1 Oct 30, 2024
33a66a8
Merge branch 'current' into add-event-time
mirnawong1 Oct 30, 2024
4f2c6dc
add img and rn
mirnawong1 Oct 30, 2024
57ee608
fix link
mirnawong1 Oct 30, 2024
3354c9d
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Oct 30, 2024
603c21c
fix link again
mirnawong1 Oct 30, 2024
488460c
Update event-time.md
mirnawong1 Oct 30, 2024
2fb62c5
Update release-notes.md
mirnawong1 Oct 30, 2024
69ba339
Update event-time.md
mirnawong1 Oct 30, 2024
1ebbbdb
Update advanced-ci.md
mirnawong1 Oct 30, 2024
2b713ee
Update advanced-ci.md
mirnawong1 Oct 30, 2024
c789601
Update advanced-ci.md
mirnawong1 Oct 30, 2024
5708119
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 4, 2024
903c5d1
Merge branch 'current' into add-event-time
mirnawong1 Nov 4, 2024
2dd873a
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 4, 2024
b7a07be
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 4, 2024
016c555
Update event-time.md
mirnawong1 Nov 4, 2024
2910914
Merge branch 'current' into add-event-time
mirnawong1 Nov 4, 2024
79128fe
Merge branch 'current' into add-event-time
mirnawong1 Nov 4, 2024
551821d
update img
mirnawong1 Nov 6, 2024
735ae38
fix img size
mirnawong1 Nov 6, 2024
ac7616b
Merge branch 'current' into add-event-time
mirnawong1 Nov 6, 2024
0363051
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
d693c9b
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
809f2a7
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
81e2318
Update website/docs/reference/resource-configs/event-time.md
mirnawong1 Nov 6, 2024
14632b3
Merge branch 'current' into add-event-time
mirnawong1 Nov 6, 2024
2b98454
Merge branch 'current' into add-event-time
mirnawong1 Nov 11, 2024
3da521f
Update website/docs/docs/deploy/advanced-ci.md
mirnawong1 Nov 11, 2024
edd1123
add scenarios
mirnawong1 Nov 11, 2024
52c0db9
add scenarios
mirnawong1 Nov 11, 2024
f461ffa
fold in grace's feedback
mirnawong1 Nov 11, 2024
a4f3b23
Merge branch 'current' into add-event-time
mirnawong1 Nov 11, 2024
a1c8166
Merge branch 'add-event-time' of github.com:dbt-labs/docs.getdbt.com …
mirnawong1 Nov 11, 2024
f1969f4
remove redundant
mirnawong1 Nov 11, 2024
4 changes: 2 additions & 2 deletions website/docs/docs/build/incremental-microbatch.md
@@ -20,7 +20,7 @@ Refer to [Supported incremental strategies by adapter](/docs/build/incremental-s

Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations.

Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure.
Microbatch incremental models make it possible to process transformations on very large time-series datasets with efficiency and resiliency. When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the [`event_time`](/reference/resource-configs/event-time) and `batch_size` you configure.

Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and <Term id="idempotent" />. This is a powerful abstraction that makes it possible for dbt to run batches separately — in the future, concurrently — and to retry them independently.

@@ -162,7 +162,7 @@ Several configurations are relevant to microbatch models, and some are required:

| Config | Type | Description | Default |
|----------|------|---------------|---------|
| `event_time` | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| [`event_time`](/reference/resource-configs/event-time) | Column (required) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A |
| `begin` | Date (required) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01'` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A |
| `batch_size` | String (required) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year`. | N/A |
| `lookback` | Integer (optional) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` |
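
The batch arithmetic in the `begin` example can be checked with a small sketch. This is only an illustration in plain Python, not how dbt builds batches internally; the `daily_batches` helper is a hypothetical name, and it assumes `batch_size = 'day'` with half-open `[start, end)` windows.

```python
from datetime import date, timedelta


def daily_batches(begin: date, run_date: date) -> list[tuple[date, date]]:
    """Return one [start, end) window per day, from `begin` through `run_date`."""
    batches = []
    start = begin
    while start <= run_date:
        end = start + timedelta(days=1)
        batches.append((start, end))
        start = end
    return batches


# Mirrors the table's example: begin = '2023-10-01', run on 2024-10-01.
batches = daily_batches(date(2023, 10, 1), date(2024, 10, 1))
print(len(batches))  # 367: 366 historical days plus the batch for "today"
```

Each window stands in for one independent batch that could be built or retried on its own.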
1 change: 1 addition & 0 deletions website/docs/docs/dbt-versions/release-notes.md
@@ -20,6 +20,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo

## October 2024

- **New**: Use the `event_time` configuration to specify when an event occurred. This configuration is required for [Incremental microbatch](/docs/build/incremental-microbatch) and can be added to ensure you're comparing overlapping times in [Advanced CI's compare changes](/docs/deploy/advanced-ci). Available in dbt Cloud Versionless and dbt Core v1.9 and higher. Refer to [event_time](/reference/resource-configs/event-time) for more information.
- **Fix:** Previously, POST requests to the Jobs API with invalid `cron` strings would return HTTP response status code 500s but would update the underlying entity. Now, POST requests to the Jobs API with invalid `cron` strings will result in status code 400s, without the underlying entity being updated.
- **Fix:** Fixed an issue where the `Source` view page in dbt Explorer did not correctly display source freshness status if older than 30 days.
- **Fix:** The UI now indicates when the description of a model is inherited from a catalog comment.
9 changes: 9 additions & 0 deletions website/docs/docs/deploy/advanced-ci.md
@@ -31,6 +31,15 @@ dbt reports the comparison differences in:

<Lightbox src="/img/docs/dbt-cloud/example-ci-compare-changes-tab.png" width="85%" title="Example of the Compare tab" />

### Considerations
It's common for CI jobs to only [build a subset of data](/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development), for example, only the last 7 days of data. When an [`event_time`](/reference/resource-configs/event-time) column is specified on your model, compare changes can:

- Compare data in CI against production for only the overlapping times, avoiding false positives and returning results faster.
Collaborator comment:

I think both of these bullets have the same benefit of "using only the overlapping timeframe, which avoids incorrect row-count changes and returns results faster"

I would distinguish the 2 scenarios as:

  • scenarios where your CI job only builds a subset of data
  • scenarios where your CI job contains fresher data than production

Rather than nesting the second scenario within the first - lmk if that makes sense!

mirnawong1 (Contributor, Author), Nov 11, 2024:

Changed it to this:

It's common for CI jobs to only build a subset of data (for example, only the last 7 days of data).

When an event_time column is specified on your model, compare changes can optimize comparisons by using only the overlapping timeframe (meaning the timeframe exists in both the CI and production environments), helping you avoid incorrect row-count changes and return results faster.

This is useful in scenarios like:

  • Subset of data in CI — When CI builds only a subset of data (like the most recent 7 days), compare changes might interpret the excluded data as "deleted rows." Configuring event_time allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.
  • Fresher data in CI than in production — When your CI job includes fresher data than production, compare changes might flag the additional rows as "new" data, even though they’re just fresher data in CI. With event_time configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.

- Handle scenarios where CI contains fresher data than production by using only the overlapping timeframe, which avoids incorrect row-count changes.
- Coming soon, you'll be able to add a flag to the command list allowing you to select the specific time slice to compare.

<Lightbox src="/img/docs/deploy/apples_to_apples.png" title="event_time ensures the same time-slice of data is accurately compared between your CI and production environments." />

## About the cached data

After [comparing changes](#compare-changes), dbt Cloud stores a cache of no more than 100 records for each modified model for preview purposes. By caching this data, you can view the examples of changed data without rerunning the comparison against the data warehouse every time (optimizing for lower compute costs). To display the changes, dbt Cloud uses a cached version of a sample of the data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on) that's set in the CI job's environment.
247 changes: 247 additions & 0 deletions website/docs/reference/resource-configs/event-time.md
@@ -0,0 +1,247 @@
---
title: "event_time"
id: "event-time"
sidebar_label: "event_time"
resource_types: [models, seeds, source]
description: "dbt uses event_time to understand when an event occurred. When defined, event_time enables microbatch incremental models and more refined comparison of datasets during Advanced CI."
datatype: string
---

Available in dbt Cloud Versionless and dbt Core v1.9 and higher.

<Tabs>
<TabItem value="model" label="Models">

<File name='dbt_project.yml'>

```yml
models:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>


<File name='models/properties.yml'>

```yml
models:
  - name: model_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```
</File>

<File name="models/modelname.sql">

```sql
{{ config(
    event_time='my_time_field'
) }}
```

</File>

</TabItem>

<TabItem value="seeds" label="Seeds">

<File name='dbt_project.yml'>

```yml
seeds:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<File name='seeds/properties.yml'>

```yml
seeds:
  - name: seed_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```

</File>
</TabItem>

<TabItem value="snapshot" label="Snapshots">

<File name='dbt_project.yml'>

```yml
snapshots:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<VersionBlock firstVersion="1.9">
<File name='snapshots/properties.yml'>

```yml
snapshots:
  - name: snapshot_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```
</File>
</VersionBlock>

<VersionBlock lastVersion="1.8">

<File name="models/modelname.sql">

```sql

{{ config(
    event_time='my_time_field'
) }}
```

</File>


import SnapshotYaml from '/snippets/_snapshot-yaml-spec.md';

<SnapshotYaml/>
</VersionBlock>



</TabItem>

<TabItem value="sources" label="Sources">

<File name='dbt_project.yml'>

```yml
sources:
  [resource-path:](/reference/resource-configs/resource-path)
    +event_time: my_time_field
```
</File>

<File name='models/properties.yml'>

```yml
sources:
  - name: source_name
    [config](/reference/resource-properties/config):
      event_time: my_time_field
```

</File>
</TabItem>
</Tabs>

## Definition

Set `event_time` to the name of the field that represents when the event actually occurred, as opposed to a processing date such as the data loading date. You can configure `event_time` for a [model](/docs/build/models), [seed](/docs/build/seeds), or [source](/docs/build/sources) in your `dbt_project.yml` file, property YAML file, or config block.

`event_time` is required for [Incremental microbatch](/docs/build/incremental-microbatch). It's optional for [Advanced CI's compare changes](/docs/deploy/advanced-ci#considerations) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.

When you configure `event_time`, it enables compare changes to:

- Compare data in CI versus production for overlapping times only, reducing false discrepancies.
- Handle scenarios where CI has "fresher" data than production by using only the overlapping timeframe, allowing you to avoid incorrect row-count changes.
- Account for subset data builds in CI without flagging filtered-out rows as "deleted" when compared with production.
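
The effect of limiting a comparison to the overlapping timeframe can be sketched in plain Python. This is an illustration only and not dbt Cloud's actual implementation: the `overlapping_window` and `restrict` helpers, the sample rows, and the dates are all hypothetical. It assumes production holds 10 days of data while CI holds a later, partly fresher 9-day subset.

```python
from datetime import date


def overlapping_window(ci_rows, prod_rows, event_time):
    """Return the (start, end) range present in both datasets, or None."""
    ci_times = [row[event_time] for row in ci_rows]
    prod_times = [row[event_time] for row in prod_rows]
    start = max(min(ci_times), min(prod_times))
    end = min(max(ci_times), max(prod_times))
    return (start, end) if start <= end else None


def restrict(rows, window, event_time):
    """Keep only the rows whose event time falls inside the window."""
    start, end = window
    return [row for row in rows if start <= row[event_time] <= end]


# Hypothetical data: production covers Nov 1-10, CI covers Nov 4-12.
prod = [{"id": d, "session_start_time": date(2024, 11, d)} for d in range(1, 11)]
ci = [{"id": d, "session_start_time": date(2024, 11, d)} for d in range(4, 13)]

# A naive comparison reports filtered-out rows as deleted and fresher rows as new.
naive_removed = {r["id"] for r in prod} - {r["id"] for r in ci}
naive_added = {r["id"] for r in ci} - {r["id"] for r in prod}
print(sorted(naive_removed), sorted(naive_added))  # [1, 2, 3] [11, 12]

# Restricting both sides to the shared window removes those false differences.
window = overlapping_window(ci, prod, "session_start_time")
prod_ids = {r["id"] for r in restrict(prod, window, "session_start_time")}
ci_ids = {r["id"] for r in restrict(ci, window, "session_start_time")}
print(prod_ids == ci_ids)  # True
```

In this sketch, the rows for Nov 1-3 and Nov 11-12 fall outside the shared window, so they no longer appear as deleted or new.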

## Examples

<Tabs>

<TabItem value="model" label="Models">

Here's an example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
models:
  my_project:
    user_sessions:
      +event_time: session_start_time
```
</File>

Example in a properties YAML file:

<File name='models/properties.yml'>

```yml
models:
  - name: user_sessions
    config:
      event_time: session_start_time
```

</File>

Example in a SQL model config block:

<File name="models/user_sessions.sql">

```sql
{{ config(
    event_time='session_start_time'
) }}
```

</File>

This configuration sets `session_start_time` as the `event_time` for the `user_sessions` model, so dbt uses this timestamp for time-slice comparisons in compare changes and for incremental microbatching.
</TabItem>

<TabItem value="seeds" label="Seeds">

Here's an example in the `dbt_project.yml` file:

<File name='dbt_project.yml'>

```yml
seeds:
  my_project:
    my_seed:
      +event_time: record_timestamp
```

</File>

Example in a seed properties YAML:

<File name='seeds/properties.yml'>

```yml
seeds:
  - name: my_seed
    config:
      event_time: record_timestamp
```
</File>

This configuration sets `record_timestamp` as the `event_time` for `my_seed`, so dbt uses it consistently for compare changes and for incremental microbatching.

</TabItem>
<TabItem value="sources" label="Sources">

Here's an example in a source properties YAML file:

<File name='models/properties.yml'>

```yml
sources:
  - name: source_name
    tables:
      - name: table_name
        config:
          event_time: event_timestamp
```
</File>

This setup sets `event_timestamp` as the `event_time` for the specified source table.

</TabItem>
</Tabs>
1 change: 1 addition & 0 deletions website/sidebars.js
@@ -926,6 +926,7 @@ const sidebarSettings = {
"reference/resource-configs/alias",
"reference/resource-configs/database",
"reference/resource-configs/enabled",
"reference/resource-configs/event-time",
"reference/resource-configs/full_refresh",
"reference/resource-configs/contract",
"reference/resource-configs/grants",