Split up job data dimension #1050

Merged
merged 8 commits into main May 27, 2025

Conversation

jjnesbitt
Collaborator

This PR accomplishes the following:

  1. Split up JobDataDimension into the following dimensions:

    • SpackJobDataDimension
    • GitlabJobDataDimension
    • JobResultDimension
    • JobRetryDimension

    This data was all bundled into the JobDataDimension, which required an entry in that dimension table for every job. As a result, the JobDataDimension table was as large as the JobFact table, which is undesirable and led to poor join performance. With this dimension split up, each table is much smaller, rows are re-used more often, and joins are faster.

  2. Storing GitLab section timers on the JobFact table. Since this data is numeric and will have aggregations performed on it, the fact table is the correct place to store it.

  3. Setting job_id as the primary key for the JobFact table. This matches how we expect these facts to exist anyway (only one job fact per job). The old primary key allowed several duplicate job entries to accumulate, which this PR cleans up.

  4. Removing the JobDataDimension table, now that all of its constituent data has been split into smaller dimension tables. This also removes the corresponding foreign key from the TimerFact and TimerPhaseFact tables, although they still store the job_id. I plan to evaluate whether and which new relations should be introduced, and make those changes in the future.
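
To make the split concrete, here is a minimal, hypothetical sketch using SQLite. The table and column names below are illustrative only (not the project's actual schema); it shows a fact table keyed on job_id, storing a numeric timer measure inline, joining against small dimension tables whose rows are shared by many jobs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Small, highly re-used dimension tables replace a monolithic,
# job-sized dimension (hypothetical columns).
cur.executescript("""
CREATE TABLE job_result_dimension (
    id INTEGER PRIMARY KEY,
    status TEXT UNIQUE              -- e.g. 'success', 'failed'
);
CREATE TABLE job_retry_dimension (
    id INTEGER PRIMARY KEY,
    is_retry BOOLEAN,
    attempt_number INTEGER,
    UNIQUE (is_retry, attempt_number)
);
-- The fact table keys on job_id directly (one fact per job) and keeps
-- numeric measures, such as section timers, inline for aggregation.
CREATE TABLE job_fact (
    job_id INTEGER PRIMARY KEY,
    result_id INTEGER REFERENCES job_result_dimension(id),
    retry_id INTEGER REFERENCES job_retry_dimension(id),
    section_timer_total_ms INTEGER
);
""")

# Many jobs point at the same dimension rows, so dimensions stay tiny.
cur.execute("INSERT INTO job_result_dimension (status) VALUES ('success'), ('failed')")
cur.execute("INSERT INTO job_retry_dimension (is_retry, attempt_number) VALUES (0, 1)")
for job_id, result_id, timer_ms in [(1, 1, 100), (2, 1, 200), (3, 2, 300)]:
    cur.execute("INSERT INTO job_fact VALUES (?, ?, 1, ?)", (job_id, result_id, timer_ms))

# Joins now touch small dimension tables instead of a job-sized one,
# and the numeric measure aggregates directly off the fact table.
rows = cur.execute("""
    SELECT d.status, COUNT(*), SUM(f.section_timer_total_ms)
    FROM job_fact f JOIN job_result_dimension d ON f.result_id = d.id
    GROUP BY d.status ORDER BY d.status
""").fetchall()
print(rows)  # [('failed', 1, 300), ('success', 2, 300)]
```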

I have run these migrations locally and confirmed that no errors arise. I have also tested some webhook payloads, and the job processor runs without error. The migration downtime for this should be ~6 hours in production.
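
The duplicate cleanup in item 3 can be sketched generically: assuming the fact table has a surrogate id column (an assumption for illustration, not the project's actual migration), keep one row per job_id and delete the rest before adding the new primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Old layout (hypothetical): a surrogate primary key allowed the same
# job_id to appear in multiple fact rows.
cur.execute("CREATE TABLE job_fact (id INTEGER PRIMARY KEY, job_id INTEGER)")
cur.executemany(
    "INSERT INTO job_fact (job_id) VALUES (?)",
    [(1,), (1,), (2,), (3,), (3,)],
)

# Keep one row per job_id (the lowest surrogate id) and delete the rest,
# so a PRIMARY KEY on job_id can be introduced afterward.
cur.execute("""
    DELETE FROM job_fact
    WHERE id NOT IN (SELECT MIN(id) FROM job_fact GROUP BY job_id)
""")
remaining = [row[0] for row in cur.execute("SELECT job_id FROM job_fact ORDER BY job_id")]
print(remaining)  # [1, 2, 3]
```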

@jjnesbitt jjnesbitt force-pushed the split-up-job-data-dimension branch from 7529776 to 6aca90c on February 3, 2025 17:04
@mvandenburgh mvandenburgh requested review from mvandenburgh and removed request for mvandenburgh May 14, 2025 16:12
mvandenburgh
mvandenburgh previously approved these changes May 14, 2025
Member

@mvandenburgh mvandenburgh left a comment

Haven't tested this yet, but it makes sense to me from a design perspective 👍

@mvandenburgh
Member

Also, we'll want to be careful about how we deploy this since it will mess up all our metabase dashboards. I think we can update all the dashboards in metabase locally and apply them after this is deployed.

@mvandenburgh mvandenburgh dismissed their stale review May 14, 2025 18:30

Actually, I'll dismiss this for now so it doesn't get merged accidentally, since you mentioned you still have to review migrations, and we should prep the Metabase dashboards beforehand.

@jjnesbitt jjnesbitt merged commit bcf1f7a into main May 27, 2025
17 checks passed
@jjnesbitt jjnesbitt deleted the split-up-job-data-dimension branch May 27, 2025 18:20