Alex-Bair commented Nov 24, 2025

Description:

Updates to calculated properties in HubSpot do not change a record's lastmodifieddate or updatedAt timestamp, so we cannot reliably capture updates to these properties incrementally. Instead, we can perform a "calculated properties refresh": periodically refreshing these calculated properties on a schedule and merging the refreshed values into previously captured data. This strategy is very similar to the "formula field refresh" used in source-salesforce-native, described in #2519.

In a calculated properties refresh, the connector backfills all existing records according to each binding's schedule. During these scheduled backfills, only calculated properties are captured, so partial documents are emitted to the appropriate collection. The collections use merge reduction strategies, so data from the partial documents is merged into previously captured, complete documents. As a result, when a partial document is captured, materializations write every property to the user's destination, not just the refreshed calculated ones.
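To make the merge step concrete, here is a minimal Python sketch of how a partial document's properties could be folded into a previously captured document. It only approximates a `merge` reduction: nested objects are merged key by key, and any other value is replaced by the newer one. Flow performs the real reduction from the collection schema's annotations, and its rules for arrays and other cases are not modeled here. `merge_documents` and the sample records are hypothetical.

```python
from typing import Any


def merge_documents(existing: dict[str, Any], partial: dict[str, Any]) -> dict[str, Any]:
    """Return `existing` with `partial` merged into it, without mutating either.

    Nested objects are merged key by key; any other value from `partial`
    replaces the existing value. This approximates, but is not, Flow's
    `merge` reduction.
    """
    result = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_documents(result[key], value)
        else:
            result[key] = value
    return result


# A complete document from an incremental capture...
complete = {
    "id": "101",
    "properties": {"email": "ada@example.com", "firstname": "Ada", "hs_email_domain": None},
}
# ...and a partial document from a scheduled refresh carrying only a calculated property.
partial = {"id": "101", "properties": {"hs_email_domain": "example.com"}}

merged = merge_documents(complete, partial)
print(merged["properties"])
# {'email': 'ada@example.com', 'firstname': 'Ada', 'hs_email_domain': 'example.com'}
```

The non-calculated properties survive because the partial document simply does not mention them.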

Snapshot changes are expected:

  • Capture snapshot now includes calculated properties.
  • Discover snapshot now includes the merge reduction strategies and the schedules.
  • Spec snapshot now includes the schedule field in resource configs.

Workflow steps:

My release plan for this change is to let auto-discovers roll out the merge reduction strategies to the appropriate existing collections. Existing captures will not automatically get a schedule for any of their bindings; they will have to opt in to calculated properties refreshes by setting a schedule on their CRM object bindings. All new captures will automatically have the default schedule 55 23 * * * set.
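A rough sketch of the opt-in behavior described above, assuming a resource config where a missing schedule means no calculated properties refresh. `ResourceConfigWithSchedule`, `config_for_new_capture`, and the five-field regex are illustrative stand-ins, not the connector's `HubspotResourceConfigWithSchedule` or the cron validation pattern in its spec.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Default schedule this PR sets for bindings of new captures.
DEFAULT_SCHEDULE = "55 23 * * *"

# Illustrative check that a schedule has five whitespace-separated cron fields
# (minute, hour, day of month, month, day of week). Not the connector's pattern.
_FIVE_FIELDS = re.compile(r"^\S+(\s+\S+){4}$")


@dataclass
class ResourceConfigWithSchedule:
    """Hypothetical, simplified resource config with an optional refresh schedule."""

    name: str
    # None means the binding has not opted in to calculated properties refreshes.
    schedule: Optional[str] = None

    def __post_init__(self) -> None:
        if self.schedule is not None and not _FIVE_FIELDS.match(self.schedule.strip()):
            raise ValueError(f"invalid cron schedule: {self.schedule!r}")


def config_for_new_capture(name: str) -> ResourceConfigWithSchedule:
    """New captures get the default schedule; existing configs are left untouched."""
    return ResourceConfigWithSchedule(name=name, schedule=DEFAULT_SCHEDULE)


existing_binding = ResourceConfigWithSchedule(name="contacts")  # no refresh until opted in
new_binding = config_for_new_capture("contacts")
```

Keeping `schedule` optional is what lets auto-discover roll out the schema changes without silently starting refreshes on existing captures.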

Documentation links affected:

The connector's documentation should be updated to reflect:

  • The new schedule field for bindings.
  • The calculated properties refresh strategy.

Notes for reviewers:

Tested on a local stack. Confirmed:

  • Calculated properties are included in captured documents.
  • Connectors initiate backfills according to each binding's schedule.
  • During connector-initiated backfills, only calculated properties are captured.
  • The merge reduction strategies cause data from partial documents to be merged into data from previously captured documents when materializing into a destination.

I recommend reviewing commit-by-commit.

Change collections for CRM objects to use a merge reduction strategy
instead of the default last-write-wins (LWW) strategy. The merge reduction strategy is used
at the top level, for the `properties` location, and the `propertiesWithHistory`
location. This ensures that property values and history captured in
calculated property refreshes are merged into the existing values from
previously captured documents. This is part of the "calculated properties
refresh" strategy for keeping calculated properties up to date.
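The commit above places merge reductions at three locations. As a hedged sketch, assuming Flow's `reduce: {strategy: merge}` JSON schema annotation, a hypothetical helper `with_merge_reductions` could add them to a CRM object schema like this. Note the double use of `properties`: it is both the JSON Schema keyword that lists fields and the name of a HubSpot field.

```python
import copy
from typing import Any

# Flow reduction annotation requesting a deep merge at a schema location.
MERGE = {"reduce": {"strategy": "merge"}}


def with_merge_reductions(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` with merge reductions at the document root
    and at the `properties` and `propertiesWithHistory` fields."""
    out = copy.deepcopy(schema)
    out.update(copy.deepcopy(MERGE))
    # `out["properties"]` is the JSON Schema keyword listing fields; the
    # HubSpot field that is also named `properties` lives inside it.
    fields = out.get("properties", {})
    for field in ("properties", "propertiesWithHistory"):
        if field in fields:
            fields[field].update(copy.deepcopy(MERGE))
    return out


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "properties": {"type": "object"},
        "propertiesWithHistory": {"type": "object"},
    },
}
annotated = with_merge_reductions(schema)
```

Fields such as `id` keep the default reduction; only the three locations named in the commit are switched to merge.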

The `schedule` in the resource config controls when bindings are
automatically backfilled.

Calculated properties do not cause the record's `lastmodifieddate` or
`updatedAt` timestamp to change, meaning we can't incrementally replicate
changes to calculated properties.

Instead, we'll rely on a "calculated properties refresh" that recaptures
only calculated properties per the resource's configured `schedule`.
Calculated properties in these partial documents will then be merged
into the complete documents captured outside of the "calculated properties"
refresh, and the entire document with all properties will be materialized
into the destination.
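One way to decide when a scheduled refresh is due, sketched under the assumption of a daily `M H * * *` cron expression evaluated in UTC. The connector presumably relies on a general cron implementation; `next_daily_fire` and `refresh_due` are hypothetical helpers that only handle the daily case.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_daily_fire(schedule: str, after: datetime) -> datetime:
    """Next fire time strictly after `after` for a daily 'M H * * *' schedule.

    Only fixed minute/hour fields with wildcards elsewhere are supported.
    """
    minute, hour, day_of_month, month, day_of_week = schedule.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily schedules")
    candidate = after.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


def refresh_due(schedule: Optional[str], last_refresh: datetime, now: datetime) -> bool:
    """True if a calculated properties refresh (a scheduled backfill) should start."""
    if schedule is None:
        return False  # binding has not opted in
    return next_daily_fire(schedule, last_refresh) <= now


last = datetime(2025, 11, 25, 23, 55, tzinfo=timezone.utc)
print(next_daily_fire("55 23 * * *", last))  # 2025-11-26 23:55:00+00:00
```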
…ector initiated backfills

Instead of capturing all properties during scheduled backfills, only
capture calculated properties. This is part of the "calculated properties
refresh" strategy for capturing updates to calculated properties.
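A minimal sketch of the property selection this commit describes. It assumes each property definition is a dict with a `name` and a `calculated` flag, as HubSpot's properties API returns; `properties_to_request` is a hypothetical helper, and only the `is_connector_initiated` name comes from this PR's `fetch_page_with_associations` change.

```python
from typing import Any


def properties_to_request(
    property_defs: list[dict[str, Any]], is_connector_initiated: bool
) -> list[str]:
    """Pick the property names to fetch for a page of CRM records.

    Connector-initiated (scheduled) backfills fetch only calculated properties,
    which yields partial documents; every other fetch requests all properties.
    """
    if is_connector_initiated:
        return [p["name"] for p in property_defs if p.get("calculated", False)]
    return [p["name"] for p in property_defs]


defs = [
    {"name": "email", "calculated": False},
    {"name": "hs_email_domain", "calculated": True},
    {"name": "hs_is_contact", "calculated": True},
]
print(properties_to_request(defs, is_connector_initiated=True))  # ['hs_email_domain', 'hs_is_contact']
```

Requesting fewer properties during refreshes keeps the scheduled backfills lighter, while incremental fetches still include calculated properties so new and updated records arrive complete.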

Alex-Bair force-pushed the bair/source-hubspot-native-calculated-properties-refresh branch from dbf33db to a463bfa November 25, 2025 18:07
Alex-Bair linked an issue Nov 25, 2025 that may be closed by this pull request
Alex-Bair marked this pull request as ready for review November 25, 2025 18:23
nicolaslazo requested a review from Copilot November 25, 2025 19:49
Copilot finished reviewing on behalf of nicolaslazo November 25, 2025 19:52

Copilot AI left a comment

Pull request overview

This PR implements a calculated properties refresh strategy for the HubSpot connector, allowing periodic capture of HubSpot's calculated properties that don't trigger lastmodifieddate updates. The implementation follows a similar pattern to the Salesforce formula field refresh strategy.

Key Changes:

  • Introduces scheduled backfills that capture only calculated properties as partial documents
  • Implements merge reduction strategies to combine partial calculated property documents with complete documents
  • Adds a schedule field to resource configurations for controlling refresh frequency (defaults to "55 23 * * *" for new captures)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `source-hubspot-native/source_hubspot_native/models.py` | Adds HubspotResourceConfigWithSchedule class with schedule field; applies merge reduction strategies to properties and propertiesWithHistory fields to support partial document merging |
| `source-hubspot-native/source_hubspot_native/resources.py` | Updates type references to use HubspotResourceConfigWithSchedule; sets default schedule and merge reduction strategy for CRM object resources |
| `source-hubspot-native/source_hubspot_native/api.py` | Adds is_connector_initiated parameter to fetch_page_with_associations; filters to only calculated properties during connector-initiated backfills; includes all properties (including calculated) in incremental updates via _fetch_batch |
| `source-hubspot-native/source_hubspot_native/__init__.py` | Updates all type parameters throughout the connector to use HubspotResourceConfigWithSchedule instead of ResourceConfig |
| `source-hubspot-native/tests/snapshots/snapshots__spec__stdout.json` | Updates resource config schema to include the new schedule field with cron expression validation pattern |
| `source-hubspot-native/tests/snapshots/snapshots__discover__stdout.json` | Adds default schedules to all CRM object bindings; includes merge reduction strategies in schemas at document root and for properties/propertiesWithHistory fields |
| `source-hubspot-native/tests/snapshots/snapshots__capture__stdout.json` | Reflects captured calculated properties (e.g., days_to_close, hs_email_domain, hs_is_contact, etc.) that were previously filtered out |

Alex-Bair added a commit to estuary/flow that referenced this pull request Nov 25, 2025

nicolaslazo left a comment

Excellent PR Alex, thanks for all the commit messages. ✅

Development

Successfully merging this pull request may close these issues.

source-hubspot-native: capture calculated properties

3 participants