
NR-347971 | OpenLineage event consumer writer #1959



Open: wants to merge 49 commits into main


devendra-nr (Contributor)

Relevant information

Added domain and entities for Data Pipeline Observability product

Checklist

I've read the guidelines and understand the acceptance criteria.
The value of the attribute marked as identifier will be unique and valid.
I've confirmed that my entity type wasn't already defined. If it was, I've provided an explanation above.

NR-347971 | Added conditions
@devendra-nr devendra-nr changed the title NR-347971 olin event consumer writer NR-347971 | OpenLineage event consumer writer Mar 11, 2025
github-actions bot previously approved these changes Mar 11, 2025
@naxhh naxhh (Contributor) left a comment

Mostly concerned about updatedAt being part of the identifier.

I'm also concerned that these entities are declared without a TTL while using synthesis.

```yaml
attributes:
- dataset.namespace
- dataset.name
- updatedAt
```
Contributor


Why updatedAt?

Wouldn't this create a new entity every time the field is updated? Is that desired?
Can we talk about how many entities we are expecting here?

Contributor Author

@devendra-nr devendra-nr Mar 17, 2025


The number of events received with the updatedAt field is quite low: about 10 per month per account.
These events are generated when a dataset's schema changes. Such changes are rare, but they are very important.
We want to track each change as a separate entity.
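To make the trade-off in this exchange concrete, here is a minimal sketch of why adding updatedAt to the identifier attributes yields one entity per schema change rather than one per dataset. It assumes, purely for illustration, that the identifier is built by joining the listed attribute values with a separator; the real derivation in the entity synthesis pipeline is not shown in this PR, and the helper names (composite_identifier, count_entities) and event values are hypothetical.

```python
from typing import Iterable, List, Mapping

# Hypothetical helper: joins the configured attribute values in order.
# The actual identifier derivation used by entity synthesis is NOT shown here.
def composite_identifier(event: Mapping[str, str], attributes: Iterable[str],
                         separator: str = ":") -> str:
    """Build an identifier string from the listed event attributes."""
    return separator.join(event[attr] for attr in attributes)

# Attribute lists: the one proposed in the PR, and the same list without updatedAt.
WITH_UPDATED_AT = ["dataset.namespace", "dataset.name", "updatedAt"]
WITHOUT_UPDATED_AT = ["dataset.namespace", "dataset.name"]

# Three illustrative schema-change events for the same dataset.
events: List[Mapping[str, str]] = [
    {"dataset.namespace": "warehouse", "dataset.name": "orders", "updatedAt": "2025-03-01T10:00:00Z"},
    {"dataset.namespace": "warehouse", "dataset.name": "orders", "updatedAt": "2025-03-09T08:30:00Z"},
    {"dataset.namespace": "warehouse", "dataset.name": "orders", "updatedAt": "2025-03-15T17:45:00Z"},
]

def count_entities(events: Iterable[Mapping[str, str]], attributes: List[str]) -> int:
    """Count distinct identifiers, i.e. how many entities would be synthesized."""
    return len({composite_identifier(e, attributes) for e in events})

if __name__ == "__main__":
    print(count_entities(events, WITH_UPDATED_AT))     # 3: one entity per schema change
    print(count_entities(events, WITHOUT_UPDATED_AT))  # 1: one entity per dataset
```

With updatedAt in the identifier, the entity count grows with the number of schema changes instead of the number of datasets, which is the behavior the author wants here and the growth the reviewer is asking to quantify.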

```yaml
entityTagName: olin.dataset.updatedAt
multiValue: false
configuration:
  entityExpirationTime: MANUAL
```
Contributor


Entities that use synthesis can't use MANUAL expiration; they must have a TTL.

Contributor Author


For our use case, we want to store the history of dataset schema changes for a very long period (12+ months).
The number of entities generated is also quite low: about 10 per month per account.
So we want to handle the deletion logic manually in our codebase.

@naxhh naxhh added the internal review Being reviewed by NR teams / Coordination needed label Mar 17, 2025
@naxhh naxhh (Contributor) commented Mar 17, 2025

Labels: internal review (Being reviewed by NR teams / Coordination needed)
Projects: None yet
3 participants