
Automate the sync of in-house forks with upstream repositories #1240

@kriswest

Is your feature request related to a problem? Please describe.
We find ourselves creating internal forks/replicas of upstream repositories that are used for internal collaboration prior to contribution or to support consumption of a repository or testing of code under review. These repositories have to be synchronised (changes pulled and merged) with their upstream counterparts - often the main or default branch, but sometimes other branches as well. That synchronisation can be performed manually or scripted, but it would be more convenient for our users if the repositories were automatically synced (on a schedule or on activity upstream).
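
As a rough illustration of the "on activity upstream" trigger, the sketch below compares branch heads with `git ls-remote`, which avoids cloning either repository. The function names are hypothetical and not part of Git Proxy. Differing heads only suggest that a sync may be needed, since the internal fork may also hold commits that upstream lacks.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of Git Proxy.

# Print the commit hash that a branch currently points to on a remote.
remote_branch_head() {
    git ls-remote "$1" "refs/heads/$2" | cut -f1
}

# Succeed (exit 0) when the upstream branch exists and its head differs
# from the head of the target branch.
# Arguments: <target_repo_url> <target_branch> <upstream_repo_url> <upstream_branch>
upstream_has_activity() {
    local target_head upstream_head
    target_head=$(remote_branch_head "$1" "$2")
    upstream_head=$(remote_branch_head "$3" "$4")
    [ -n "$upstream_head" ] && [ "$target_head" != "$upstream_head" ]
}
```

A polling job could call `upstream_has_activity` with the same four arguments as the sync script and only start a full sync when it succeeds.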

Using Git Proxy to automate the synchronisation would also allow us to apply a plugin chain to change sets as they ingress, generating notifications or blocking the sync if issues are detected (such as unexpected obfuscated code).

Describe the solution you'd like
Git Proxy could support an admin feature to synchronise nominated branches from an upstream repository with another git repository. This would require:

  • An administrative interface to configure syncing
    • Update the repository page to configure syncing for a git proxy project
      • Allow config of multiple sync jobs per project
      • Allow selection of a branch to sync and entry of target repository URL, username and access token
        • Access tokens should be encrypted at rest
        • We may also need to support the config of credentials for the upstream repo to support private projects
      • Allow configuration of a schedule for each sync job
    • View all sync jobs and status
      • Link to the relevant repo page for editing scheduled jobs
  • A scheduler to execute the sync tasks according to their schedules.
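
Outside Git Proxy, the "multiple sync jobs" and scheduler items could be approximated as below. This is a minimal sketch, assuming a hypothetical plain-text jobs file with one job per line (target URL, target branch, upstream URL, upstream branch) and a sync command that takes those four arguments. The schedule itself would be left to an external trigger such as cron, and credentials are deliberately kept out of the file.

```shell
#!/bin/bash
# Hypothetical job runner for illustration; not part of Git Proxy.
# Usage: run_sync_jobs <jobs_file> <sync_command>
# Each non-comment line of <jobs_file> holds four whitespace-separated fields:
#   <target_repo_url> <target_branch> <upstream_repo_url> <upstream_branch>
run_sync_jobs() {
    local jobs_file="$1" sync_command="$2"
    local target_url target_branch upstream_url upstream_branch
    local failures=0

    # Read the jobs on file descriptor 3 so that the sync command cannot
    # consume the remaining jobs through its standard input.
    while read -r target_url target_branch upstream_url upstream_branch <&3 ||
          [ -n "$target_url" ]; do
        # Skip blank lines and comments
        case "$target_url" in
            '' | '#'*) continue ;;
        esac
        # One failing job is reported but does not stop the remaining jobs
        if ! "$sync_command" "$target_url" "$target_branch" \
                             "$upstream_url" "$upstream_branch"; then
            echo "Sync failed for $target_url ($target_branch)" >&2
            failures=$((failures + 1))
        fi
    done 3< "$jobs_file"

    # Succeed only if every job succeeded
    [ "$failures" -eq 0 ]
}
```

The admin interface described above would replace the jobs file, and an in-process scheduler would replace the external trigger.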

Beyond enabling that synchronisation, it would be highly desirable to apply a plugin chain to the changes being synchronised, then block the sync and/or send a notification if issues are detected.
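
To make the blocking idea concrete, the sketch below shows one possible gate that could run before a merge. It is not a Git Proxy plugin: the function name is hypothetical, and the heuristic (flag any added line containing a long unbroken run of base64-like characters) and its default threshold of 200 characters are arbitrary assumptions standing in for a real plugin chain.

```shell
#!/bin/bash
# Hypothetical pre-merge check for illustration; not a Git Proxy plugin.
# Reads a unified diff on stdin and fails (exit 1) if any added line contains
# an unbroken run of base64-like characters at least <threshold> long.
# Usage: scan_incoming_changes [threshold]
scan_incoming_changes() {
    local threshold="${1:-200}"   # assumed default, tune as needed
    awk -v limit="$threshold" '
        /^\+\+\+ / { next }       # skip "+++ b/file" headers
        /^\+/ {
            # Split the added line on anything outside the base64 alphabet
            count = split(substr($0, 2), tokens, "[^A-Za-z0-9+/=]+")
            for (i = 1; i <= count; i++) {
                if (length(tokens[i]) >= limit + 0) { flagged = 1 }
            }
        }
        END { exit (flagged ? 1 : 0) }
    '
}
```

In a script-based sync, the incoming changes could be piped through the check before merging, for example `git diff "HEAD...upstream/$UPSTREAM_BRANCH" | scan_incoming_changes || exit 1`.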

Describe alternatives you've considered
A manually maintained set of sync jobs implemented and managed via shell scripts, which must be run somewhere.

Additional context

A sync job could be implemented in bash (for illustrative purposes) as below. The proposal is to have Git Proxy handle this instead, with web-based admin and the isomorphic-git client:

#!/bin/bash

# === Usage ===
# ./sync-upstream.sh <target_repo_url> <target_branch> <upstream_repo_url> <upstream_branch>

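# === Arguments ===
# Require all four arguments before doing any work
if [ "$#" -ne 4 ]; then
    echo "Usage: $0 <target_repo_url> <target_branch> <upstream_repo_url> <upstream_branch>" >&2
    exit 1
fi

TARGET_REPO_URL="$1"
TARGET_REPO_BRANCH="$2"
UPSTREAM_REPO_URL="$3"
UPSTREAM_BRANCH="$4"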

# === Derive repo name from target URL ===
TARGET_REPO_NAME=$(basename "$TARGET_REPO_URL" .git)

# === Derived Variables ===
TEMP_DIR="/tmp/git-sync-$TARGET_REPO_NAME"

# === Prepare working directory ===
mkdir -p "$TEMP_DIR"
cd "$TEMP_DIR" || exit 1

# === Clone target repo ===
git clone "$TARGET_REPO_URL"
cd "$TARGET_REPO_NAME" || exit 1

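# === Switch to target branch ===
# Clean up and stop if the branch does not exist, rather than syncing the default branch
git switch "$TARGET_REPO_BRANCH" || { cd ..; rm -rf "$TARGET_REPO_NAME"; exit 1; }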

# === Add upstream remote and fetch ===
git remote add upstream "$UPSTREAM_REPO_URL"
git fetch upstream

# === Check for upstream changes not in target ===
CHANGES=$(git rev-list --count "HEAD..upstream/$UPSTREAM_BRANCH")

if [ "$CHANGES" -eq 0 ]; then
    echo "No changes to sync from upstream/$UPSTREAM_BRANCH. Exiting."
    cd ..
    rm -rf "$TARGET_REPO_NAME"
    exit 0
else
    echo "Found $CHANGES new commits in upstream/$UPSTREAM_BRANCH. Proceeding with sync..."
fi

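# === Merge and push ===
# Stop before pushing if the merge fails (for example, on conflicts)
if ! git merge --no-edit "upstream/$UPSTREAM_BRANCH"; then
    echo "Merge of upstream/$UPSTREAM_BRANCH failed. Aborting sync." >&2
    cd ..
    rm -rf "$TARGET_REPO_NAME"
    exit 1
fi
git push origin "$TARGET_REPO_BRANCH"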

# === Cleanup ===
cd ..
rm -rf "$TARGET_REPO_NAME"

Example usage:

./sync-upstream.sh \
  https://scmprovider.com/your-org/git-proxy.git \
  main \
  https://github.com/finos/git-proxy.git \
  main

./sync-upstream.sh \
  https://scmprovider.com/your-org/fdc3.git \
  main \
  https://github.com/finos/fdc3.git \
  main

The feature could optionally keep the checked-out directory cached between runs rather than removing it and cloning again. However, there may be security issues to consider.
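
One way a cached checkout might be handled is sketched below. The function name and the cache location are hypothetical. The cache is reused only if its `origin` still matches the expected URL, and it is reset and cleaned on every run so that leftovers from a previous job are discarded. It does not address other concerns, such as credentials stored in the cached clone's configuration or a cache directory writable by other users.

```shell
#!/bin/bash
# Hypothetical helper for illustration; not part of Git Proxy.
# Usage: prepare_checkout <repo_url> <branch> <cache_dir>
prepare_checkout() {
    local repo_url="$1" branch="$2" cache_dir="$3"
    local cached_url=""

    if [ -d "$cache_dir/.git" ]; then
        cached_url=$(git -C "$cache_dir" remote get-url origin 2>/dev/null || true)
    fi

    if [ -n "$cached_url" ] && [ "$cached_url" = "$repo_url" ]; then
        # Reuse the cache, then discard anything a previous run left behind
        git -C "$cache_dir" fetch --quiet origin || return 1
        git -C "$cache_dir" checkout --quiet "$branch" || return 1
        git -C "$cache_dir" reset --quiet --hard "origin/$branch" || return 1
        git -C "$cache_dir" clean -fdx --quiet
    else
        # No usable cache (missing, or pointing at a different remote): clone afresh
        rm -rf "${cache_dir:?}"
        git clone --quiet --branch "$branch" "$repo_url" "$cache_dir"
    fi
}
```

Resetting to `origin/<branch>` trades a little speed for predictability: every job starts from exactly what the remote holds, regardless of what the cache contained.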
