source-hubspot-native: calculated properties refresh #3528
Change collections for CRM objects to use a merge reduction strategy instead of the default LWW strategy. The merge reduction strategy is used at the top level, for the `properties` location, and the `propertiesWithHistory` location. This ensures that property values and history captured in calculated property refreshes are merged into the existing values from previously captured documents. This is part of the "calculated properties refresh" strategy for keeping calculated properties up to date.
The `schedule` in the resource config controls when bindings are automatically backfilled.
Calculated properties do not cause the record's `lastmodifieddate` or `updatedAt` timestamp to change, meaning we can't incrementally replicate changes to calculated properties. Instead, we'll rely on a "calculated properties refresh" that recaptures only calculated properties per the resource's configured `schedule`. Calculated properties in these partial documents will then be merged into the complete documents captured outside of the "calculated properties" refresh, and the entire document with all properties will be materialized into the destination.
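To illustrate why the reduction strategy matters here, the following is a minimal sketch of merging a partial "refresh" document into a previously captured complete document. It is not the platform's actual reduction implementation: `merge_documents` is a hypothetical helper, the property values are made up, and the sketch recurses into every nested object, whereas the change above applies merge only at the top level, `properties`, and `propertiesWithHistory`.

```python
from typing import Any


def merge_documents(existing: dict[str, Any], partial: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge `partial` into `existing`, keeping keys absent from `partial`."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides hold an object: merge key by key instead of replacing.
            merged[key] = merge_documents(merged[key], value)
        else:
            # Scalars, arrays, and new keys: the newer value wins.
            merged[key] = value
    return merged


# A complete document captured earlier (illustrative values only).
complete = {
    "id": "101",
    "properties": {"dealname": "Acme renewal", "amount": "5000", "days_to_close": "12"},
}
# A partial document holding only a refreshed calculated property.
refresh = {"id": "101", "properties": {"days_to_close": "14"}}

lww_result = refresh  # last-write-wins: the partial document replaces the complete one
merge_result = merge_documents(complete, refresh)
```

Under last-write-wins the non-calculated properties (`dealname`, `amount`) would be dropped from the reduced document, while the merged result keeps them and picks up the refreshed `days_to_close`.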
…ector initiated backfills Instead of capturing all properties during scheduled backfills, only capture calculated properties. This is part of the "calculated properties refresh" strategy for capturing updates to calculated properties.
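The property selection described in this commit could be sketched roughly as follows. Only the `is_connector_initiated` name comes from the change itself; the `Property` class, its `calculated` flag, and `select_properties` are hypothetical stand-ins for illustration and do not reflect the connector's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    # Hypothetical, simplified view of a CRM property definition.
    name: str
    calculated: bool


def select_properties(all_properties: list[Property], is_connector_initiated: bool) -> list[str]:
    """Pick which property names to request for a backfill page."""
    if is_connector_initiated:
        # Scheduled (connector-initiated) backfill: calculated properties only,
        # which yields partial documents.
        return [p.name for p in all_properties if p.calculated]
    # Any other backfill: request every property so documents are complete.
    return [p.name for p in all_properties]


props = [
    Property("dealname", calculated=False),
    Property("days_to_close", calculated=True),
]
scheduled = select_properties(props, is_connector_initiated=True)
full = select_properties(props, is_connector_initiated=False)
```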
dbf33db to
a463bfa
Compare
Pull request overview
This PR implements a calculated properties refresh strategy for the HubSpot connector, allowing periodic capture of HubSpot's calculated properties that don't trigger lastmodifieddate updates. The implementation follows a similar pattern to the Salesforce formula field refresh strategy.
Key Changes:
- Introduces scheduled backfills that capture only calculated properties as partial documents
- Implements merge reduction strategies to combine partial calculated property documents with complete documents
- Adds a `schedule` field to resource configurations for controlling refresh frequency (defaults to `55 23 * * *` for new captures)
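For readers unfamiliar with cron syntax, the default `55 23 * * *` means "at 23:55 every day". The sketch below computes the next run time for such a schedule. It is illustrative only: `next_daily_run` is a hypothetical helper that handles only the daily `M H * * *` form, and evaluating the schedule in UTC is an assumption, not something stated in this PR.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(schedule: str, now: datetime) -> datetime:
    """Return the next run time strictly after `now` for a daily 'M H * * *' schedule."""
    minute, hour, day, month, weekday = schedule.split()
    if (day, month, weekday) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily 'M H * * *' schedules")
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


# Assumption: the schedule is evaluated in UTC.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
next_run = next_daily_run("55 23 * * *", now)
```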
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| source-hubspot-native/source_hubspot_native/models.py | Adds HubspotResourceConfigWithSchedule class with schedule field; applies merge reduction strategies to properties and propertiesWithHistory fields to support partial document merging |
| source-hubspot-native/source_hubspot_native/resources.py | Updates type references to use HubspotResourceConfigWithSchedule; sets default schedule and merge reduction strategy for CRM object resources |
| source-hubspot-native/source_hubspot_native/api.py | Adds is_connector_initiated parameter to fetch_page_with_associations; filters to only calculated properties during connector-initiated backfills; includes all properties (including calculated) in incremental updates via _fetch_batch |
| source-hubspot-native/source_hubspot_native/\_\_init\_\_.py | Updates all type parameters throughout the connector to use HubspotResourceConfigWithSchedule instead of ResourceConfig |
| source-hubspot-native/tests/snapshots/snapshots__spec__stdout.json | Updates resource config schema to include the new schedule field with cron expression validation pattern |
| source-hubspot-native/tests/snapshots/snapshots__discover__stdout.json | Adds default schedules to all CRM object bindings; includes merge reduction strategies in schemas at document root and for properties/propertiesWithHistory fields |
| source-hubspot-native/tests/snapshots/snapshots__capture__stdout.json | Reflects captured calculated properties (e.g., days_to_close, hs_email_domain, hs_is_contact, etc.) that were previously filtered out |
Documentation updates for estuary/connectors#3528.
nicolaslazo left a comment
Excellent PR Alex, thanks for all the commit messages. ✅
Description:

Updates to calculated properties in HubSpot do not cause a record's `lastmodifieddate` or `updatedAt` timestamp to change, meaning we cannot reliably incrementally capture updates to these properties. Instead, we can perform a "calculated properties refresh", which means periodically refreshing these calculated properties per some schedule & merging those refreshed properties into previously captured data. This strategy is very similar to the "formula field refresh" used in `source-salesforce-native` described in #2519.

In a calculated properties refresh, the connector will backfill all existing records per a binding's `schedule`. During these scheduled backfills, only calculated properties are captured, causing partial documents to be emitted & sent to the appropriate collection. The collections use `merge` reduction strategies, causing data from the partial documents to be merged into data from previously captured, complete documents. This causes materializations to materialize every property, not just the refreshed calculated ones, into the user's destination when a partial document is captured.

Snapshot changes are expected:
- `merge` reduction strategies and the `schedule`s.
- `schedule` field in resource configs.

Workflow steps:

My release plan for this change is to let auto-discovers roll out the `merge` reduction strategies to the appropriate existing collections. All existing captures will not have any `schedule`s set for any of their bindings automatically. Existing captures will have to opt-in to calculated property refreshes by setting a `schedule` for CRM object bindings. All new captures will automatically have the default schedule `55 23 * * *` set.

Documentation links affected:

The connector's documentation should be updated to reflect:
- `schedule` field for bindings.

Notes for reviewers:

Tested on a local stack. Confirmed:
- `schedule`.
- `merge` reduction strategies cause data from partial documents to be merged into data from previously captured documents when materializing into a destination.

I recommend reviewing commit-by-commit.