Skip to content

Conversation

@minwoox
Copy link
Contributor

@minwoox minwoox commented Feb 23, 2022

Motivation:
Central Dogma uses jGit to store data. Due to the nature of Git that stores unlimited history,
Central Dogma will eventually get in trouble managing disk usage.
We can handle this by maintaining the primary and secondary Git repositories internally.
This works in this way:

  • Commits are pushed to the primary Git repository.
  • If the number of commits exceeds the threshold (minRetentionCommits), then the secondary Git repository is created.
  • Commits are pushed to both primary and secondary Git repositories.
  • If the secondary Git repository has the number of commits more than the threshold;
    • The secondary Git repository is promoted to the primary Git repository.
    • The primary Git repository is removed completely.
    • Another secondary Git repository is created.
  • Back to the third.

Modifications:

  • Add CreateRollingRepositoryCommand that creates the rolling repository by the CommitRetentionManagementPlugin.
  • Add GitRepositoryV2 that manages the rolling jGit repositories.
    • The name of internal repositories will be: foo_0000000000, foo_0000000001, foo_0000000002 and so on
      • So that we can only store the suffix of the primary repo to maintain the repository. (i.e. We don't have to maintain the directory information of the secondary repo.)
      • RepositoryMetadataDatabase has the suffix in its file database.
    • GitRepository is not removed for the migration test.
  • Add InternalRepository that has jGit repository and CommitIdDatabase.
  • What happens if the revision of an operation(such as diff, watch, history, etc.) is lower than the firstRevision of the current primary repo?
    • If Revision.INIT(1) is used, the firstRevision is used instead.
      • e.g. diff(INIT, new Revision(100) ...) is equals to diff(new Revision(firstRevisionNumber), new Revision(100) ...)
    • If the Revision between Revision.INIT(1) and the firstRevision is used, a RevisionNotFoundException is raised.
      • except watch and findLatestRevision.

Result:

Todo:

  • Provide a way to set minRetentionCommits and minRetentionDay for each repository.
  • Unused internal repositories are removed by purging service.
  • Support mirroring from CD to external Git.

minwoox added 13 commits July 16, 2021 22:37
Motivation:
Central Dogma uses jGit to store data. Due to the nature of Git that stores unlimited history,
Central Dogma will eventually get in trouble managing disk usage.
We can handle this by maintaing the primary and secondary Git repository internally.
This works in this way:
1 Commits are pushed to the primary Git repository.
2 If the number of commits exceed the threshold (`minRetentionCommits`), then the secondary Git repository is created.
3 Commits are pushed to the both primary and secondary Git repositories.
4 If the secondary Git repository has the number of commits more than the threshold;
  - The secondary Git repository is promoted to the primary Git repository.
  - The primary Git repository is removed completely.
  - Another secondary Git repository is created.
5 Back to 3.

Modifications:
- TBD

Result:
- Close 575
- TBD

Todo:
- Provide a way to set `minRetentionCommits` and `minRetentionDay` for each repository.
- Support mirroring from CD to external Git.
@codecov
Copy link

codecov bot commented Feb 23, 2022

Codecov Report

Attention: Patch coverage is 70.19763% with 377 lines in your changes missing coverage. Please review.

Project coverage is 63.53%. Comparing base (079daeb) to head (84cb6cd).
Report is 211 commits behind head on main.

Files with missing lines Patch % Lines
...nal/storage/repository/git/InternalRepository.java 75.08% 99 Missing and 43 partials ⚠️
...ternal/storage/repository/git/GitRepositoryV2.java 80.58% 50 Missing and 29 partials ⚠️
...epository/git/CommitRetentionManagementPlugin.java 4.10% 69 Missing and 1 partial ⚠️
...server/command/CreateRollingRepositoryCommand.java 0.00% 16 Missing ⚠️
...age/repository/git/RepositoryMetadataDatabase.java 77.46% 10 Missing and 6 partials ⚠️
...orp/centraldogma/server/CommitRetentionConfig.java 0.00% 15 Missing ⚠️
...ernal/storage/repository/git/CommitIdDatabase.java 83.33% 8 Missing and 3 partials ⚠️
.../linecorp/centraldogma/server/command/Command.java 0.00% 6 Missing ⚠️
...ogma/server/command/StandaloneCommandExecutor.java 0.00% 4 Missing and 1 partial ⚠️
...al/storage/repository/git/LegacyGitRepository.java 28.57% 5 Missing ⚠️
... and 4 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #681      +/-   ##
============================================
- Coverage     65.69%   63.53%   -2.17%     
- Complexity     3351     3473     +122     
============================================
  Files           358      366       +8     
  Lines         13893    15093    +1200     
  Branches       1496     1687     +191     
============================================
+ Hits           9127     9589     +462     
- Misses         3916     4591     +675     
- Partials        850      913      +63     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@trustin
Copy link
Member

trustin commented Mar 8, 2023

If Revision.INIT(1) is used, the firstRevision is used instead.

This somewhat sounds like a surprising behavior to me. This means a diff command that once worked as expected starts to return completely different result once rolling occurs.

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

@aidanbae
Copy link

This issue seems to be a necessary feature to keep the cluster running reliably.

@minwoox
Copy link
Contributor Author

minwoox commented Aug 24, 2023

@aidanbae Sorry for making you wait for this. I'm working on it now. Please stay tuned.

@minwoox
Copy link
Contributor Author

minwoox commented Aug 25, 2023

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

That makes sense. Let me apply that approach.

minwoox added a commit to minwoox/centraldogma that referenced this pull request Aug 28, 2023
Motivation:
Before we apply line#681 that removes old commits,
it would be nice if we can find any usages that access data with the INIT Revision or
revisions that are more than 5000 (default minimum number of commit retentions) commits ahead from the head.

Modifications:
- Add a counter that is incresed when INIT Revision or revisions that are more than 5000 ahead from the head are used.

Result:
- Track the number of usages.
- This commit will be reverted after we find the usage and fix it.
@minwoox
Copy link
Contributor Author

minwoox commented Aug 29, 2023

Can we instead treat any revision between INIT (inclusive) and firstRevision (exclusive) as an 'empty' state where the repository contains no files at all?

Had a chat with the team, and we decided to raise an exception instead of treating it as an empty state.

This PR is ready, so PTAL. 😉

minwoox added a commit that referenced this pull request Sep 4, 2023
Motivation:
Before we apply #681 that removes old commits, it would be nice if we could find any usages that access data with the INIT Revision or revisions that are more than 5000 (default minimum number of commit retentions) commits ahead from the head.

Modifications:
- Add a counter that is increased when INIT Revision or revisions that are more than 5000 ahead from the head are used.

Result:
- Track the number of usages.
- This commit will be reverted after we find the usage and fix it.
Copy link
Contributor

@jrhee17 jrhee17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall! Left some minor questions 🙇

requireNonNull(initialRevision, "initialRevision");
checkArgument(minRetentionCommits > 0, "minRetentionCommits: %s (expected: > 0)",
minRetentionCommits);
checkArgument(minRetentionDays > 0, "minRetentionDays: %s (expected: > 0)", minRetentionDays);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) isn't it possible for minRetentionDays to be 0?

Suggested change
checkArgument(minRetentionDays > 0, "minRetentionDays: %s (expected: > 0)", minRetentionDays);
checkArgument(minRetentionDays >= 0, "minRetentionDays: %s (expected: >= 0)", minRetentionDays);

this.parent = requireNonNull(parent, "parent");
originalRepoName = requireNonNull(repositoryDir, "repositoryDir").getName();
this.repositoryWorker = requireNonNull(repositoryWorker, "repositoryWorker");
requireNonNull(author, "author");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we can remove the assignment at L171

Suggested change
requireNonNull(author, "author");
this.author = requireNonNull(author, "author");

}
}

private int applyChanges(@Nullable Revision baseRevision, @Nullable ObjectId baseTreeId,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would honestly be very helpful if these kind of methods were deduped instead of copy-pasted so that we don't have to review them again 😅

Anyways, let me just assume that these methods were copy-pasted since I don't think it's realistic to invest more time into looking in detail through this file

Comment on lines +253 to +255
if (!primaryRepoDir.mkdirs()) {
throw new StorageException("failed to create " + primaryRepoDir + " while migrating to V2.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be safer to do this before L249 so that we run less of a risk. (If this step fails, then repoDir doesn't exist and only tmpRepoDir exists at this point)

continue;
}

final boolean fetchContent = FindOption.FETCH_CONTENT.get(options);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can compute this once at L441

this.headRevision = res.revision;
final InternalRepository secondaryRepo = this.secondaryRepo;
if (secondaryRepo != null) {
assert headRevision.equals(secondaryRepo.headRevision());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) Can this be guaranteed?

For instance, if a CentralDogma server is shut down forcefully due to a jvm crash before the secondaryRepo.commit is invoked, will the server be recoverable? or do we need to manually delete the secondaryRepo folder?

This scenario could also be possible if secondaryRepo == null while a commit is occurring. (due to secondary repo rolling)

Rather, would a mechanism which catches up to the primary repo be a better idea?

assert headRevision.equals(secondaryRepo.headRevision());

// Push the same commit to the secondary repo.
secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a StorageException is thrown for whatever reason (e.g. insufficient disk), should the overall call to the commit also fail?

Suggested change
secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
try {
secondaryRepo.commit(headRevision, headRevision.forward(1), commitTimeMillis,
} (catch ...) {}

Comment on lines +158 to +159
primarySuffix = newPrimarySuffix;
writeSuffix(newPrimarySuffix);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should update the suffix in case writeSuffix fails (primaryRepoDir will return the wrong value)

Suggested change
primarySuffix = newPrimarySuffix;
writeSuffix(newPrimarySuffix);
writeSuffix(newPrimarySuffix);
primarySuffix = newPrimarySuffix;

}

@Override
public void createRollingRepository(Revision rollingRepositoryInitialRevision,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question) I'm assuming that we don't have a way to synchronize two CreateRollingRepositoryCommand being issued on a single repository simultaneously. (and judging from the current implementation I don't think it should be allowed)

Do you think this scenario is impossible? or do we introduce a separate lock for this command?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide a way to remove the old commits

5 participants