This PR accomplishes the following:

- Split up `JobDataDimension` into the following dimensions:
  - `SpackJobDataDimension`
  - `GitlabJobDataDimension`
  - `JobResultDimension`
  - `JobRetryDimension`

  This data was all bundled into the `JobDataDimension`, which necessitated an entry in that dimension table for every job. This meant that the `JobDataDimension` table was as large as the `JobFact` table, which is undesirable and results in poor join performance. With the dimension split up, each table is much smaller and its rows are re-used across more jobs, which results in faster joins.
- The storage of GitLab section timers on the `JobFact` table. Since this data is numeric and will have aggregations performed on it, the fact table is the correct place to store it.
- Setting the `job_id` as the primary key for the `JobFact` table. This is how we expect these facts to exist anyway (only one job fact per job). Because of the old primary key for this table, there were several duplicate job entries, which this PR cleans up.
- The removal of the `JobDataDimension` table, since all of its constituent data has been split into smaller dimension tables. This also removes that foreign key from the `TimerFact` and `TimerPhaseFact` tables, although they still store the `job_id`. I plan to evaluate whether new relations should be introduced, and which ones, and make those changes in the future.

I have run these migrations locally and can confirm that no errors arise. I have also tested some webhook payloads, and it seems the job processor runs without error. The migration downtime for this should be ~6 hours in production.
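For reviewers who want to see the effect of the dimension split and the `job_id` primary key in isolation, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the code in this PR: the table names `job_fact` and `job_result_dimension`, and the columns `status`, `error_category`, and `duration_seconds`, are hypothetical stand-ins chosen for illustration, and the real schema and database engine may differ.

```python
import sqlite3


def build_schema(conn):
    # Hypothetical, simplified schema: one small dimension table plus a fact
    # table keyed directly by job_id (one fact row per job).
    conn.executescript(
        """
        CREATE TABLE job_result_dimension (
            id INTEGER PRIMARY KEY,
            status TEXT NOT NULL,
            error_category TEXT NOT NULL,
            UNIQUE (status, error_category)
        );
        CREATE TABLE job_fact (
            job_id INTEGER PRIMARY KEY,
            job_result_id INTEGER NOT NULL
                REFERENCES job_result_dimension (id),
            duration_seconds REAL NOT NULL
        );
        """
    )


def get_or_create_result(conn, status, error_category):
    # Re-use an existing dimension row when the same attribute combination
    # has already been seen; otherwise insert a new one.
    conn.execute(
        "INSERT OR IGNORE INTO job_result_dimension (status, error_category) "
        "VALUES (?, ?)",
        (status, error_category),
    )
    row = conn.execute(
        "SELECT id FROM job_result_dimension "
        "WHERE status = ? AND error_category = ?",
        (status, error_category),
    ).fetchone()
    return row[0]


def record_job(conn, job_id, status, error_category, duration_seconds):
    result_id = get_or_create_result(conn, status, error_category)
    conn.execute(
        "INSERT INTO job_fact (job_id, job_result_id, duration_seconds) "
        "VALUES (?, ?, ?)",
        (job_id, result_id, duration_seconds),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    jobs = [
        (1, "success", "none", 120.0),
        (2, "success", "none", 95.5),
        (3, "failed", "timeout", 300.0),
        (4, "success", "none", 110.0),
    ]
    for job in jobs:
        record_job(conn, *job)

    fact_rows = conn.execute("SELECT COUNT(*) FROM job_fact").fetchone()[0]
    dim_rows = conn.execute(
        "SELECT COUNT(*) FROM job_result_dimension"
    ).fetchone()[0]

    # A second fact for the same job violates the job_id primary key.
    try:
        record_job(conn, 1, "success", "none", 120.0)
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    return fact_rows, dim_rows, duplicate_rejected


if __name__ == "__main__":
    print(demo())
```

In this toy data set, four jobs share two dimension rows, so the dimension table stays smaller than the fact table, and a repeated `job_id` is rejected by the primary key rather than silently stored as a duplicate.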