Implementation[openhouseTableCommitEvents]: Commit job for freshness in TableStatsCollectionSparkApp #398
Add IcebergCommitEventStats model for capturing commit events and sta…
…es with dedicated fields for long, string, and double types.
…t with metadata for commit events
…troduce CommitMetadata class for enhanced commit tracking
…r improved clarity; update documentation to reflect naming and relationship with commit events
… relationships for event tracking
…ats/model/BaseEventModels.java Co-authored-by: Stas Pak <[email protected]>
…ats/model/CommitEvent.java Co-authored-by: Stas Pak <[email protected]>
…ats/model/CommitMetadata.java Co-authored-by: Stas Pak <[email protected]>
…ats/model/CommitEventPartitions.java Co-authored-by: Stas Pak <[email protected]>
…ats/model/CommitEventPartitions.java Co-authored-by: Stas Pak <[email protected]>
…ats/model/PartitionStats.java Co-authored-by: Stas Pak <[email protected]>
…e inheritance from BaseDataset
…introduce interface for column statistics with specific implementations
- baseCommitEvent has-a CommitMetadata.
…ats/model/BaseEventModels.java Co-authored-by: Sumedh Sakdeo <[email protected]>
…ats/model/CommitEvent.java Co-authored-by: Sumedh Sakdeo <[email protected]>
…ats/model/CommitEvent.java Co-authored-by: Sumedh Sakdeo <[email protected]>
…ats/model/CommitEventTablePartitions.java Co-authored-by: Sumedh Sakdeo <[email protected]>
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/TableStatsCollectionSparkApp.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/util/TableStatsCollectorUtil.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/util/TableStatsCollectorUtil.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/Operations.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/util/TableStatsCollector.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/TableStatsCollectionSparkApp.java
…ddingCommitJobForFreshnessinStatsCollector
…ent utility method for database name extraction
…ynchronous execution method with timing and logging
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/TableStatsCollectionSparkApp.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/TableStatsCollectionSparkApp.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/util/TableStatsCollector.java
As this is an OSS codebase, can we remove the internal Google Doc link from the PR?
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/Operations.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/TableStatsCollectionSparkApp.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/TableStatsCollectionSparkApp.java
apps/spark/src/main/java/com/linkedin/openhouse/jobs/spark/Operations.java
… ArrayList on failure
Summary
I extended the existing TableStatsCollectionSparkApp to implement the logic for populating the openhouseTableCommitEvents table.
This new table will serve as the single source of truth for commit-related metadata across all OpenHouse datasets, including:
This enables a unified, consistent, and efficient way to access commit events for all OpenHouse tables.
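For readers unfamiliar with the shape of such a table, here is a minimal sketch of how one commit-event row per commit might be assembled. It is not the code from this PR: the class `CommitEventSketch`, its field names, and the helpers `databaseNameOf` and `toCommitEvents` are all hypothetical. It only mirrors three ideas mentioned in the commit messages: a commit event has-a `CommitMetadata`, the database name is extracted from the fully qualified table name, and an empty `ArrayList` is returned on failure.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitEventSketch {

  /** Hypothetical commit-level metadata; the field names are assumptions. */
  static final class CommitMetadata {
    final long commitId;
    final long commitTimestampMs;
    final String operation;

    CommitMetadata(long commitId, long commitTimestampMs, String operation) {
      this.commitId = commitId;
      this.commitTimestampMs = commitTimestampMs;
      this.operation = operation;
    }
  }

  /** Hypothetical row of the commit events table: table identity plus a CommitMetadata. */
  static final class CommitEvent {
    final String databaseName;
    final String tableName;
    final CommitMetadata commitMetadata; // has-a relationship

    CommitEvent(String databaseName, String tableName, CommitMetadata commitMetadata) {
      this.databaseName = databaseName;
      this.tableName = tableName;
      this.commitMetadata = commitMetadata;
    }
  }

  /** Returns the part before the first dot of a "db.table" name (assumed naming scheme). */
  static String databaseNameOf(String fullyQualifiedTableName) {
    int dot = fullyQualifiedTableName.indexOf('.');
    if (dot <= 0) {
      throw new IllegalArgumentException("Expected db.table, got: " + fullyQualifiedTableName);
    }
    return fullyQualifiedTableName.substring(0, dot);
  }

  /** Builds one event per commit; returns an empty list if anything goes wrong. */
  static List<CommitEvent> toCommitEvents(String fullyQualifiedTableName, List<CommitMetadata> commits) {
    try {
      String db = databaseNameOf(fullyQualifiedTableName);
      String table = fullyQualifiedTableName.substring(db.length() + 1);
      List<CommitEvent> events = new ArrayList<>();
      for (CommitMetadata commit : commits) {
        events.add(new CommitEvent(db, table, commit));
      }
      return events;
    } catch (RuntimeException e) {
      // Failure is swallowed here so that one bad table does not abort the whole job.
      return new ArrayList<>();
    }
  }

  public static void main(String[] args) {
    List<CommitMetadata> commits = new ArrayList<>();
    commits.add(new CommitMetadata(1L, 1700000000000L, "append"));
    List<CommitEvent> events = toCommitEvents("db.tbl", commits);
    System.out.println(events.size() + " event(s) collected");
  }
}
```

Returning an empty list on failure keeps the caller simple, at the cost of hiding the error; a real job would presumably log the exception before returning.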
Output / Result
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.