Embedding parents into models and snapshots #109
jochemvandooren
left a comment
Amazing work @ross-whatnot, thanks a lot!
I have left a few very small comments; the feature itself is 🥇. Could you add a CHANGELOG entry?
Co-authored-by: Jochem van Dooren <[email protected]>
…dbt-score into embedded-relatives
@jochemvandooren thanks! Think I've addressed everything.
jochemvandooren
left a comment
🙌
sercancicek
left a comment
Hello @ross-whatnot! Thank you for your contribution 🙌 The PR already looks very good. I left one comment/suggestion.
    def _populate_parents(self) -> None:
        """Populate `parents` for all models and snapshots."""
        for node in list(self.models.values()) + list(self.snapshots.values()):
            for parent_id in node.depends_on.get("nodes", []):
                if parent_id in self.models:
                    node.parents.append(self.models[parent_id])
                elif parent_id in self.snapshots:
                    node.parents.append(self.snapshots[parent_id])
                elif parent_id in self.sources:
                    node.parents.append(self.sources[parent_id])
Suggested change:

    def _populate_parents(self) -> None:
        """Populate `parents` for all models and snapshots."""
        all_parents = {**self.models, **self.snapshots, **self.sources}
        for node in list(self.models.values()) + list(self.snapshots.values()):
            for parent_id in node.depends_on.get("nodes", []):
                if parent := all_parents.get(parent_id):
                    node.parents.append(parent)
IMO, this looks easier to read and maintain but just a suggestion :)
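For anyone who wants to try the merged-lookup variant outside the codebase, here is a minimal standalone sketch. `Node` and `Loader` are hypothetical stand-ins, not dbt-score's actual classes, and the example IDs are made up. It also uses an explicit `is not None` check instead of the walrus truthiness test, so a parent object that happened to be falsy would still be attached.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a model, snapshot, or source."""

    unique_id: str
    depends_on: dict = field(default_factory=dict)
    parents: list = field(default_factory=list)


class Loader:
    """Hypothetical container holding nodes keyed by unique ID."""

    def __init__(self, models, snapshots, sources):
        self.models = {n.unique_id: n for n in models}
        self.snapshots = {n.unique_id: n for n in snapshots}
        self.sources = {n.unique_id: n for n in sources}

    def populate_parents(self) -> None:
        """Attach parent objects to every model and snapshot."""
        # One merged lookup replaces the three-way if/elif chain.
        all_parents = {**self.models, **self.snapshots, **self.sources}
        for node in [*self.models.values(), *self.snapshots.values()]:
            for parent_id in node.depends_on.get("nodes", []):
                parent = all_parents.get(parent_id)
                if parent is not None:  # unknown IDs (e.g. seeds) are skipped
                    node.parents.append(parent)


# Made-up example graph: source -> model -> snapshot.
raw = Node("source.shop.raw_orders")
stg = Node("model.shop.stg_orders", {"nodes": ["source.shop.raw_orders"]})
snap = Node(
    "snapshot.shop.orders_snapshot",
    {"nodes": ["model.shop.stg_orders", "seed.shop.not_loaded"]},
)

loader = Loader(models=[stg], snapshots=[snap], sources=[raw])
loader.populate_parents()
```

One thing to keep in mind: if the same ID ever appeared in more than one of the three dictionaries, the merged dict would keep the last one (sources over snapshots over models), whereas the if/elif chain prefers models. This sketch assumes IDs are unique across the three, in which case both versions behave the same.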
Pretty curious about performance here: did you measure the delta?

@matthieucan no significant difference in the base case; for a project with 3000+ models, a simple lint takes basically the same amount of time (pre-parsed manifest). I haven't done anything too complex with the history, but adding a test that compares attributes of a child to attributes of its parents (e.g. below) doesn't change that runtime in any meaningful way (~1.11s)

@ross-whatnot Awesome, thanks for sharing! 🎉

Amazing work @ross-whatnot, thank you 🙌

I will release a new version once we merge #110, so you can benefit from this feature asap!


Following discussion in this issue and related draft PRs, this takes a simpler approach: adding `parents: list[Model | Source | Snapshot]` to the `Model` and `Snapshot` models.

This should allow writing rules that compare a node to its parents ("any model with tag `tier_1` may only have parents that also have tag `tier_1`", "any model that does not have the tag `beta` may not have any parents that do have the tag `beta`"), making it possible to assert expectations about model lineage. One could also traverse the graph in full (upstream) with a recursive rule that walks the graph via `parents`.

Tested locally with a ~3000-model manifest, and things seem to work just fine; will do a bit more poking and prodding, though.
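To make the two ideas above concrete, here is a rough sketch of the logic such rules could contain. It deliberately does not use dbt-score's rule API: `Node`, `untiered_parents`, and `upstream` are hypothetical names for illustration, and the example nodes are made up. The upstream walk is written iteratively with a visited set rather than recursively, which gives the same result while avoiding repeated visits to shared ancestors.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a model, snapshot, or source with `parents`."""

    name: str
    tags: list[str] = field(default_factory=list)
    parents: list["Node"] = field(default_factory=list)


def untiered_parents(node: Node, tag: str = "tier_1") -> list[str]:
    """Names of direct parents missing `tag`, checked only if `node` has it."""
    if tag not in node.tags:
        return []
    return [parent.name for parent in node.parents if tag not in parent.tags]


def upstream(node: Node) -> list[Node]:
    """Every ancestor reachable via `parents`, each returned once."""
    seen: dict[int, Node] = {}
    stack = list(node.parents)
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue  # already visited through another path
        seen[id(current)] = current
        stack.extend(current.parents)
    return list(seen.values())


# Made-up lineage: raw_orders -> stg_orders -> orders_mart.
raw_orders = Node("raw_orders")
stg_orders = Node("stg_orders", tags=["tier_1"], parents=[raw_orders])
orders_mart = Node("orders_mart", tags=["tier_1"], parents=[stg_orders])

violations = untiered_parents(stg_orders)
ancestors = sorted(n.name for n in upstream(orders_mart))
```

In an actual rule, a non-empty result from a check like `untiered_parents` would be turned into whatever violation object the rule framework expects.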