Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up job data dimension #1050

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

jjnesbitt
Copy link
Collaborator

This PR accomplishes the following:

  1. Split up JobDataDimension into the following dimensions:

    • SpackJobDataDimension
    • GitlabJobDataDimension
    • JobResultDimension
    • JobRetryDimension

    This data was all bundled into the JobDataDimension, which necessitated an entry in that dimensional table for every job. This meant that the JobDataDimension table was as large as the JobFact table, which is undesirable, and results in poor join performance. With this dimension split up, each table is much smaller, which results in more re-used rows, as well as faster joins.

  2. The storage of gitlab section timers on the JobFact table. Since this is numeric, and will have aggregations performed on it, it's the correct place to store it.

  3. Setting the job_id as the primary key for the JobFact table. This is really how we expect these facts to exist anyway (only one job fact per job). As a result of the old primary key for this table, there were actually several duplicate job entries, which this PR cleans up.

  4. The removal of the JobDataDimension table. This is due to all of its constituent data being split into smaller dimension tables. However, this also results in the removal of that foreign key from the TimerFact and TimerPhaseFact tables, although they still store the job_id. I plan to evaluate if and what new relations should be introduced, and make those changes in the future.

I have run these migrations locally and can confirm there are no errors that arise. I also have tested some webhook payloads and it seems the job processor runs without error. The migration downtime for this should be ~6 hours in production.

@jjnesbitt jjnesbitt force-pushed the split-up-job-data-dimension branch from 7529776 to 6aca90c Compare February 3, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant