Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for shallow cloning repos #363

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

MeNsaaH
Copy link
Collaborator

@MeNsaaH MeNsaaH commented Feb 7, 2025

This adds support for shallow cloning repos.

With every hook/request, kubechecks clone the full repo. This can become very expensive for large repositories.

Shallow cloning allows us to limit the cloning to just the commits we need.

@MeNsaaH MeNsaaH requested review from djeebus and Greyeye February 7, 2025 01:27
@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from ab10be3 to 0837db6 Compare February 7, 2025 01:32
Copy link

github-actions bot commented Feb 7, 2025

Temporary image available at ghcr.io/zapier/kubechecks:0.0.0-pr363.

Signed-off-by: Mmadu Manasseh <[email protected]>
Signed-off-by: Mmadu Manasseh <[email protected]>
VcsUploadUrl string `mapstructure:"vcs-upload-url"` // github enterprise upload URL
VcsToken string `mapstructure:"vcs-token"`
VcsType string `mapstructure:"vcs-type"`
EnableShallowClone bool `mapstructure:"enable-shallow-clone"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should prefix the key with vcs- to make it clear what this configures.

Copy link
Collaborator Author

@MeNsaaH MeNsaaH Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djeebus I've used the RepoShallowClone for this instead, what do you think?

I feel Vcs prefixed variables are for vcs configs and this is not "completely" one of them.

@Greyeye
Copy link
Collaborator

Greyeye commented Feb 7, 2025

@MeNsaaH we got caught by dependencies missing several times (e.g. had to implement a kustomize walker)
I am not sure shallow clone is a good idea.

@MeNsaaH
Copy link
Collaborator Author

MeNsaaH commented Feb 7, 2025

@MeNsaaH we got caught by dependencies missing several times (e.g. had to implement a kustomize walker)
I am not sure shallow clone is a good idea.

@Greyeye can you please elaborate how shallow cloning will cause missing dependencies

@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from a52748f to 2c06ea0 Compare February 7, 2025 22:14
@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from 2c06ea0 to acd97a9 Compare February 7, 2025 22:41
@MeNsaaH MeNsaaH requested a review from djeebus February 10, 2025 11:41
pkg/git/repo.go Outdated Show resolved Hide resolved
Signed-off-by: Mmadu Manasseh <[email protected]>
@MeNsaaH MeNsaaH force-pushed the enable-shallow-cloning branch from a80d3ef to 012a083 Compare February 10, 2025 23:46
@zapier-sre-bot
Copy link
Collaborator

Mergecat's Review

Click to read mergecats review!

😼 Mergecat review of pkg/utils.go

@@ -16,6 +16,7 @@ func Pointer[T interface{}](item T) *T {
 }
 
 func WipeDir(dir string) {
+	log.Debug().Str("path", dir).Msg("wiping path")
 	if err := os.RemoveAll(dir); err != nil {
 		log.Error().
 			Err(err).

Feedback & Suggestions:

  1. Log Level Appropriateness: The addition of a debug log statement before attempting to wipe the directory is a good practice for tracing execution. However, ensure that the debug level is appropriate for your production environment, as excessive logging can impact performance and clutter log files. Consider using a conditional check to enable this logging only in development or debug modes. 🐢

  2. Security Consideration: Logging the directory path can be useful for debugging, but be cautious if the path contains sensitive information. Ensure that no sensitive data is exposed in logs, especially in production environments. Consider sanitizing or obfuscating sensitive parts of the path if necessary. 🔒


😼 Mergecat review of pkg/git/manager.go

@@ -22,8 +22,11 @@ func NewRepoManager(cfg config.ServerConfig) *RepoManager {
 	return &RepoManager{cfg: cfg}
 }
 
-func (rm *RepoManager) Clone(ctx context.Context, cloneUrl, branchName string) (*Repo, error) {
+func (rm *RepoManager) Clone(ctx context.Context, cloneUrl, branchName string, shallow bool) (*Repo, error) {
 	repo := New(rm.cfg, cloneUrl, branchName)
+	if shallow {
+		repo.Shallow = true
+	}
 
 	if err := repo.Clone(ctx); err != nil {
 		return nil, errors.Wrap(err, "failed to clone repository")

Feedback & Suggestions:

  1. Parameter Documentation: 📄 Consider updating the function documentation to include the new shallow parameter. This will help other developers understand the purpose and effect of this parameter.

  2. Default Behavior: 🤔 If shallow is a new feature, ensure that the default behavior (when shallow is false) remains consistent with the previous implementation. It might be helpful to add a comment explaining the implications of a shallow clone.

  3. Testing: 🧪 Ensure that there are tests covering both shallow and non-shallow clone scenarios. This will help catch any potential issues introduced by this change.

  4. Error Handling: 🔍 Consider adding error handling or logging around the setting of repo.Shallow to catch any unexpected behavior or misuse of the shallow parameter.


😼 Mergecat review of cmd/root.go

@@ -119,6 +119,9 @@ func init() {
 		newStringOpts().
 			withDefault("kubechecks again"))
 	stringSliceFlag(flags, "additional-apps-namespaces", "Additional namespaces other than the ArgoCDNamespace to monitor for applications.")
+	boolFlag(flags, "repo-shallow-clone", "Enable shallow cloning for all git repos.",
+		newBoolOpts().
+			withDefault(false))
 
 	panicIfError(viper.BindPFlags(flags))
 	setupLogOutput()

Feedback & Suggestions:

  1. Security Consideration: Ensure that enabling shallow cloning does not inadvertently expose sensitive data or reduce the security of the cloning process. Shallow clones can sometimes omit important history that might be relevant for security audits.

  2. Documentation: Update the documentation to clearly explain the implications of using shallow cloning, especially in terms of performance and potential limitations in accessing full commit history.

  3. Default Value: The default value for repo-shallow-clone is set to false, which is a safe choice. However, consider if there are scenarios where enabling it by default could be beneficial for performance, and document those scenarios.

  4. Testing: Ensure that there are tests in place to verify the behavior of the application when repo-shallow-clone is enabled and disabled. This will help catch any unexpected issues that might arise from this new feature.

  5. User Feedback: Consider logging a message when shallow cloning is enabled, so users are aware of the mode in which the application is operating. This can be helpful for debugging and understanding application behavior.


😼 Mergecat review of pkg/events/check.go

@@ -55,7 +55,7 @@ type CheckEvent struct {
 }
 
 type repoManager interface {
-	Clone(ctx context.Context, cloneURL, branchName string) (*git.Repo, error)
+	Clone(ctx context.Context, cloneURL, branchName string, shallow bool) (*git.Repo, error)
 }
 
 func generateMatcher(ce *CheckEvent, repo *git.Repo) error {
@@ -192,7 +192,7 @@ func (ce *CheckEvent) getRepo(ctx context.Context, cloneURL, branchName string)
 		return repo, nil
 	}
 
-	repo, err = ce.repoManager.Clone(ctx, cloneURL, branchName)
+	repo, err = ce.repoManager.Clone(ctx, cloneURL, branchName, ce.ctr.Config.RepoShallowClone)
 	if err != nil {
 		return nil, errors.Wrap(err, "failed to clone repo")
 	}

Feedback & Suggestions:

  1. Shallow Clone Parameter: The addition of the shallow parameter to the Clone method is a good enhancement for performance, as it allows for shallow cloning of repositories. However, ensure that the repoManager implementation correctly handles this parameter to avoid any unexpected behavior. 🏗️

  2. Backward Compatibility: Since the Clone method signature has changed, ensure that all implementations of the repoManager interface are updated to accommodate the new parameter. This change might break existing code if not handled properly. Consider providing a default value or a wrapper function if backward compatibility is a concern. 🔄

  3. Documentation: Update any relevant documentation or comments to reflect the change in the Clone method signature. This will help maintain clarity for future developers working with this code. 📚

  4. Testing: Ensure that there are adequate tests covering both scenarios of shallow and non-shallow cloning. This will help verify that the new functionality works as expected and does not introduce any regressions. 🧪


😼 Mergecat review of pkg/config/config.go

@@ -79,6 +79,7 @@ type ServerConfig struct {
 	MonitorAllApplications   bool          `mapstructure:"monitor-all-applications"`
 	OpenAIAPIToken           string        `mapstructure:"openai-api-token"`
 	RepoRefreshInterval      time.Duration `mapstructure:"repo-refresh-interval"`
+	RepoShallowClone         bool          `mapstructure:"repo-shallow-clone"`
 	SchemasLocations         []string      `mapstructure:"schemas-location"`
 	ShowDebugInfo            bool          `mapstructure:"show-debug-info"`
 	TidyOutdatedCommentsMode string        `mapstructure:"tidy-outdated-comments-mode"`

Feedback & Suggestions:

  1. Security Consideration: Ensure that the RepoShallowClone option is validated and used securely. Shallow cloning can be beneficial for performance, but it might miss some history-related checks or operations. Make sure that this does not introduce any security vulnerabilities or logic errors in your application.

  2. Documentation: Consider adding comments or documentation to explain the purpose and impact of the RepoShallowClone field. This will help future developers understand why and when to use this option.

  3. Testing: Ensure that there are adequate tests to cover scenarios where RepoShallowClone is both true and false. This will help in verifying that the feature works as expected and does not introduce any regressions.

  4. Default Value: Consider whether a default value should be set for RepoShallowClone in the configuration, and document this default behavior. This can help avoid unexpected behavior if the field is not explicitly set.


😼 Mergecat review of localdev/kubechecks/values.yaml

@@ -22,16 +22,17 @@ configMap:
     #
     # KUBECHECKS_LABEL_FILTER: "test" # On your PR/MR, prefix this with "kubechecks:"
     # KUBECHECKS_SCHEMAS_LOCATION: https://github.com/zapier/kubecheck-schemas.git
+    KUBECHECKS_REPO_REFRESH_INTERVAL: 30s
     KUBECHECKS_TIDY_OUTDATED_COMMENTS_MODE: "delete"
     KUBECHECKS_ENABLE_CONFTEST: "false"
-
+    KUBECHECKS_REPO_SHALLOW_CLONE: "true"
 
 deployment:
   annotations:
     reloader.stakater.com/auto: "true" 
   
   image:
-    pullPolicy: Never
+    pullPolicy: IfNotPresent
     name: "kubechecks"
     tag: ""
 

Feedback & Suggestions:

  1. Security Concern: The KUBECHECKS_REPO_REFRESH_INTERVAL is set to 30s. Ensure that this interval is appropriate for your use case and does not lead to excessive load or potential security vulnerabilities due to frequent refreshes. Consider documenting the rationale for this interval.

  2. Image Pull Policy: Changing the pullPolicy from Never to IfNotPresent is generally a good practice for environments where the image might not always be available locally. However, ensure that this change aligns with your deployment strategy and that it won't inadvertently pull an outdated image from a remote registry. 🐳

  3. Configuration Consistency: The addition of KUBECHECKS_REPO_SHALLOW_CLONE: "true" is a good optimization for reducing clone times and bandwidth usage. Ensure that this setting is compatible with your repository's requirements and that it doesn't omit necessary data. 📦

  4. Documentation: Consider adding comments or documentation to explain the purpose and impact of the new configuration options (KUBECHECKS_REPO_REFRESH_INTERVAL and KUBECHECKS_REPO_SHALLOW_CLONE) to aid future developers in understanding these settings. 📚


😼 Mergecat review of docs/usage.md

@@ -70,6 +70,7 @@ The full list of supported environment variables is described below:
 |`KUBECHECKS_POLICIES_LOCATION`|Sets rego policy locations to be used for every check request. Can be common path inside the repos being checked or git urls in either git or http(s) format.|`[./policies]`|
 |`KUBECHECKS_REPLAN_COMMENT_MSG`|comment message which re-triggers kubechecks on PR.|`kubechecks again`|
 |`KUBECHECKS_REPO_REFRESH_INTERVAL`|Interval between static repo refreshes (for schemas and policies).|`5m`|
+|`KUBECHECKS_REPO_SHALLOW_CLONE`|Enable shallow cloning for all git repos.|`false`|
 |`KUBECHECKS_SCHEMAS_LOCATION`|Sets schema locations to be used for every check request. Can be a common path on the host or git urls in either git or http(s) format.|`[]`|
 |`KUBECHECKS_SHOW_DEBUG_INFO`|Set to true to print debug info to the footer of MR comments.|`false`|
 |`KUBECHECKS_TIDY_OUTDATED_COMMENTS_MODE`|Sets the mode to use when tidying outdated comments. One of hide, delete.|`hide`|

Feedback & Suggestions:

  1. Clarification on Shallow Clone: The description for KUBECHECKS_REPO_SHALLOW_CLONE could be more informative. Consider specifying what "shallow cloning" entails and its potential benefits or drawbacks. For example, "Enable shallow cloning for all git repos to reduce clone time and disk usage by fetching only the latest commit." This helps users understand the impact of enabling this option. 📝

  2. Default Value Explanation: It might be helpful to explain why the default value is set to false. This can guide users in making an informed decision about whether to change it. 🤔


😼 Mergecat review of cmd/locations_test.go

@@ -19,7 +19,7 @@ type fakeCloner struct {
 	err                  error
 }
 
-func (f *fakeCloner) Clone(_ context.Context, cloneUrl, branchName string) (*git.Repo, error) {
+func (f *fakeCloner) Clone(_ context.Context, cloneUrl, branchName string, shallow bool) (*git.Repo, error) {
 	f.cloneUrl = cloneUrl
 	f.branchName = branchName
 	return f.result, f.err
@@ -43,7 +43,7 @@ func TestMaybeCloneGitUrl_NonGitUrl(t *testing.T) {
 		tc := tc
 		t.Run(tc.name, func(t *testing.T) {
 			fc := &fakeCloner{result: nil, err: nil}
-			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.NoError(t, err)
 			assert.Equal(t, "", fc.branchName)
 			assert.Equal(t, "", fc.cloneUrl)
@@ -137,7 +137,7 @@ func TestMaybeCloneGitUrl_HappyPath(t *testing.T) {
 		tc := tc
 		t.Run(tc.name, func(t *testing.T) {
 			fc := &fakeCloner{result: &git.Repo{Directory: testRoot}, err: nil}
-			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			actual, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.NoError(t, err)
 			assert.Equal(t, tc.expected.branch, fc.branchName)
 			assert.Equal(t, tc.expected.cloneUrl, fc.cloneUrl)
@@ -165,7 +165,7 @@ func TestMaybeCloneGitUrl_URLError(t *testing.T) {
 		tc := tc
 		t.Run(tc.name, func(t *testing.T) {
 			fc := &fakeCloner{result: &git.Repo{Directory: testRoot}, err: nil}
-			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.ErrorContains(t, err, tc.expected)
 			require.Equal(t, "", result)
 		})
@@ -193,7 +193,7 @@ func TestMaybeCloneGitUrl_CloneError(t *testing.T) {
 			defer cancel()
 
 			fc := &fakeCloner{result: &git.Repo{Directory: testRoot}, err: tc.cloneError}
-			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername)
+			result, err := maybeCloneGitUrl(ctx, fc, time.Duration(0), tc.input, testUsername, false)
 			require.ErrorContains(t, err, tc.expected)
 			require.Equal(t, "", result)
 		})

Feedback & Suggestions:

  1. Parameter Addition: The addition of the shallow parameter to the Clone method and the maybeCloneGitUrl function calls is a good enhancement for flexibility. However, ensure that the shallow parameter is used appropriately within the Clone method to perform shallow cloning if required. 🏗️

  2. Testing for Shallow Clone: Since you've introduced a new parameter, consider adding test cases that specifically test the behavior when shallow is set to true. This will ensure that the functionality is correctly implemented and covered by tests. 🧪

  3. Backward Compatibility: Ensure that the changes maintain backward compatibility, especially if the Clone method is part of a public API. If this is a concern, consider providing a default value or an overloaded method/function. 🔄

  4. Documentation: Update any relevant documentation or comments to reflect the addition of the shallow parameter, explaining its purpose and usage. 📚


😼 Mergecat review of cmd/locations.go

@@ -18,7 +18,7 @@ import (
 
 func processLocations(ctx context.Context, ctr container.Container, locations []string) error {
 	for index, location := range locations {
-		if newLocation, err := maybeCloneGitUrl(ctx, ctr.RepoManager, ctr.Config.RepoRefreshInterval, location, ctr.VcsClient.Username()); err != nil {
+		if newLocation, err := maybeCloneGitUrl(ctx, ctr.RepoManager, ctr.Config.RepoRefreshInterval, location, ctr.VcsClient.Username(), ctr.Config.RepoShallowClone); err != nil {
 			return errors.Wrapf(err, "failed to clone %q", location)
 		} else if newLocation != "" {
 			locations[index] = newLocation
@@ -31,12 +31,12 @@ func processLocations(ctx context.Context, ctr container.Container, locations []
 }
 
 type cloner interface {
-	Clone(ctx context.Context, cloneUrl, branchName string) (*git.Repo, error)
+	Clone(ctx context.Context, cloneUrl, branchName string, shallow bool) (*git.Repo, error)
 }
 
 var ErrCannotUseQueryWithFilePath = errors.New("relative and absolute file paths cannot have query parameters")
 
-func maybeCloneGitUrl(ctx context.Context, repoManager cloner, repoRefreshDuration time.Duration, location, vcsUsername string) (string, error) {
+func maybeCloneGitUrl(ctx context.Context, repoManager cloner, repoRefreshDuration time.Duration, location, vcsUsername string, shallow bool) (string, error) {
 	result := strings.SplitN(location, "?", 2)
 	if !isGitURL(result[0]) {
 		if len(result) > 1 {
@@ -51,7 +51,7 @@ func maybeCloneGitUrl(ctx context.Context, repoManager cloner, repoRefreshDurati
 	}
 	cloneUrl := repoUrl.CloneURL(vcsUsername)
 
-	repo, err := repoManager.Clone(ctx, cloneUrl, query.Get("branch"))
+	repo, err := repoManager.Clone(ctx, cloneUrl, query.Get("branch"), shallow)
 	if err != nil {
 		return "", errors.Wrap(err, "failed to clone")
 	}

Feedback & Suggestions:

  1. Parameter Addition: The addition of the shallow parameter to the Clone method and maybeCloneGitUrl function is a good enhancement for performance, as shallow clones can be faster and use less bandwidth. However, ensure that the shallow parameter is properly documented in the function comments to clarify its purpose and usage. 📄

  2. Backward Compatibility: Ensure that all calls to maybeCloneGitUrl and Clone throughout the codebase are updated to include the new shallow parameter. This will prevent any potential runtime errors due to missing arguments. 🔄

  3. Testing: Consider adding or updating unit tests to cover scenarios with both shallow and non-shallow clones. This will help ensure that the new functionality works as expected and does not introduce any regressions. 🧪

  4. Error Handling: When introducing new parameters that affect behavior, it's crucial to handle any potential errors that might arise specifically due to the new parameter. Ensure that any issues related to shallow cloning are logged or handled appropriately. ⚠️


😼 Mergecat review of pkg/git/repo.go

@@ -28,6 +28,7 @@ type Repo struct {
 	BranchName string
 	Config     config.ServerConfig
 	CloneURL   string
+	Shallow    bool
 
 	// exposed state
 	Directory string
@@ -46,11 +47,17 @@ func New(cfg config.ServerConfig, cloneUrl, branchName string) *Repo {
 }
 
 func (r *Repo) Clone(ctx context.Context) error {
+	if r.Shallow {
+		return r.shallowClone(ctx)
+	}
+
 	var err error
 
-	r.Directory, err = os.MkdirTemp("/tmp", "kubechecks-repo-")
-	if err != nil {
-		return errors.Wrap(err, "failed to make temp dir")
+	if r.Directory == "" {
+		r.Directory, err = os.MkdirTemp("/tmp", "kubechecks-repo-")
+		if err != nil {
+			return errors.Wrap(err, "failed to make temp dir")
+		}
 	}
 
 	log.Info().
@@ -85,6 +92,63 @@ func (r *Repo) Clone(ctx context.Context) error {
 	return nil
 }
 
+func (r *Repo) shallowClone(ctx context.Context) error {
+	var err error
+
+	if r.Directory == "" {
+		r.Directory, err = os.MkdirTemp("/tmp", "kubechecks-repo-")
+		if err != nil {
+			return errors.Wrap(err, "failed to make temp dir")
+		}
+	}
+
+	log.Info().
+		Str("temp-dir", r.Directory).
+		Str("clone-url", r.CloneURL).
+		Str("branch", r.BranchName).
+		Msg("cloning git repo")
+
+	//  Attempt to locally clone the repo based on the provided information stored within
+	_, span := tracer.Start(ctx, "ShallowCloneRepo")
+	defer span.End()
+
+	args := []string{"clone", r.CloneURL, r.Directory, "--depth", "1"}
+	cmd := r.execGitCommand(args...)
+	out, err := cmd.CombinedOutput()
+	if err != nil {
+		log.Error().Err(err).Msgf("unable to clone repository, %s", out)
+		return err
+	}
+
+	if r.BranchName != "HEAD" {
+		// Fetch SHA
+		args = []string{"fetch", "origin", r.BranchName, "--depth", "1"}
+		cmd = r.execGitCommand(args...)
+		out, err = cmd.CombinedOutput()
+		if err != nil {
+			log.Error().Err(err).Msgf("unable to fetch %s repository, %s", r.BranchName, out)
+			return err
+		}
+		// Checkout SHA
+		args = []string{"checkout", r.BranchName}
+		cmd = r.execGitCommand(args...)
+		out, err = cmd.CombinedOutput()
+		if err != nil {
+			log.Error().Err(err).Msgf("unable to checkout branch %s repository, %s", r.BranchName, out)
+			return err
+		}
+	}
+
+	if log.Trace().Enabled() {
+		if err = filepath.WalkDir(r.Directory, printFile); err != nil {
+			log.Warn().Err(err).Msg("failed to walk directory")
+		}
+	}
+
+	log.Info().Msg("repo has been cloned")
+	return nil
+}
+
 func printFile(s string, d fs.DirEntry, err error) error {
 	if err != nil {
 		return err
@@ -118,8 +182,24 @@ func (r *Repo) MergeIntoTarget(ctx context.Context, ref string) error {
 			attribute.String("sha", ref),
 		))
 	defer span.End()
+	merge_command := []string{"merge", ref}
+	// For shallow clones, we need to pull the ref into the repo
+	if r.Shallow {
+		ref = strings.TrimPrefix(ref, "origin/")
+		cmd := r.execGitCommand("fetch", "origin", fmt.Sprintf("%s:%s", ref, ref), "--depth", "1")
+		out, err := cmd.CombinedOutput()
+		if err != nil {
+			telemetry.SetError(span, err, "fetch origin ref")
+			log.Error().Err(err).Msgf("unable to fetch ref %s, %s", ref, out)
+			return err
+		}
+		// When merging shallow clones, we need to allow unrelated histories
+		// and use the "theirs" strategy to avoid conflicts
+		// cons of this is that it may not be entirely accurate and may overwrite changes in the target branch
+		merge_command = []string{"merge", ref, "--allow-unrelated-histories", "-X", "theirs"}
+	}
 
-	cmd := r.execGitCommand("merge", ref)
+	cmd := r.execGitCommand(merge_command...)
 	out, err := cmd.CombinedOutput()
 	if err != nil {
 		telemetry.SetError(span, err, "merge commit into branch")
@@ -131,6 +211,15 @@ func (r *Repo) MergeIntoTarget(ctx context.Context, ref string) error {
 }
 
 func (r *Repo) Update(ctx context.Context) error {
+	// Since we're shallow cloning, to update we need to wipe the directory and re-clone
+	if r.Shallow {
+		r.Wipe()
+		err := os.Mkdir(r.Directory, 0700)
+		if err != nil {
+			return errors.Wrap(err, "failed to create repo directory")
+		}
+		return r.Clone(ctx)
+	}
 	cmd := r.execGitCommand("pull")
 	cmd.Stdout = os.Stdout
 	cmd.Stderr = os.Stdout

Feedback & Suggestions:

  1. Security Consideration: The Shallow flag introduces a new code path for cloning repositories. Ensure that the CloneURL is properly validated and sanitized to prevent potential command injection vulnerabilities. Consider using a library to parse and validate URLs.

  2. Error Handling: In the shallowClone method, errors are logged but not wrapped with additional context. Consider using errors.Wrap to provide more context for debugging.

  3. Code Duplication: The logic for creating a temporary directory is duplicated in both Clone and shallowClone. Consider refactoring this into a separate helper function to reduce code duplication and potential errors.

  4. Merge Strategy: The use of --allow-unrelated-histories and -X theirs in the merge command for shallow clones can lead to data loss if not used carefully. Ensure that this behavior is well-documented and that users are aware of the potential consequences.

  5. Performance: The Update method for shallow clones wipes and re-clones the repository, which can be inefficient. Consider implementing a more efficient update mechanism, such as fetching and resetting the branch.

  6. Concurrency: The MergeIntoTarget method fetches and merges branches. Ensure that concurrent operations on the same repository are handled correctly to avoid race conditions.

  7. Logging: Ensure that sensitive information, such as repository URLs with embedded credentials, is not logged. Review the logging statements to ensure compliance with security best practices.

  8. Testing: Ensure that the new shallow clone functionality is thoroughly tested, including edge cases and error scenarios, to prevent regressions and ensure reliability.



Dependency Review

Click to read mergecats review!

No suggestions found

@MeNsaaH MeNsaaH requested a review from djeebus February 11, 2025 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants