fetch followed by pull is inefficient #28

Open
@vaab

I noticed while tinkering with gitaggregate 1.6.0 (git version 2.7.4) that it issues a fetch followed by a pull. As we all know, pull is shorthand for a fetch followed by a merge. One might think that the second fetch is harmless since it won't download anything, but that is wrong:

  • the fetch performed by the pull is not guaranteed to give the same result as the first fetch (the target might have changed in between), and this might have consequences in the code.
  • even with --depth 1 and when the code didn't change, some computation and network exchanges still take place that could be avoided.
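
To make the point concrete, here is a minimal sketch of one way the duplicate fetch could be avoided: fetch once, then merge FETCH_HEAD locally instead of calling pull. This is not gitaggregate's actual code and not necessarily what the upcoming PR does. The helper names (fetch_command, merge_fetched_command, aggregate_commands) are hypothetical, and the functions only build the argument lists rather than run git.

```python
from typing import List, Optional


def fetch_command(remote: str, ref: str, depth: Optional[int] = None) -> List[str]:
    """Build the `git fetch` call for one remote ref (hypothetical helper)."""
    cmd = ["git", "fetch"]
    if depth is not None:
        cmd += ["--depth", str(depth)]
    return cmd + [remote, ref]


def merge_fetched_command() -> List[str]:
    """Build a merge of whatever the previous fetch recorded in FETCH_HEAD.

    Unlike `git pull`, this step does not contact the remote again.
    """
    return ["git", "merge", "--no-edit", "FETCH_HEAD"]


def aggregate_commands(remote: str, ref: str, depth: Optional[int] = None) -> List[List[str]]:
    """Return the commands for one merge entry: a single fetch, then a local merge."""
    return [fetch_command(remote, ref, depth), merge_fetched_command()]


if __name__ == "__main__":
    # Mirrors the `r1 t1` merge entry with `depth: 1` from the example below.
    for cmd in aggregate_commands("r1", "t1", depth=1):
        print(" ".join(cmd))
```

Since each fetch rewrites FETCH_HEAD, the merge in this sketch has to run right after its own fetch, before the next remote is fetched.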

Full example:

mkdir /tmp/gita -p &&
cd /tmp/gita &&
cat <<EOF > repos.yaml
./foo:
    defaults:
        depth: 1
    remotes:
        r1: file:///tmp/gita/r1
    target: r1 agg
    merges:
    - r1 t1
EOF

## making remote git repository 'r1'
mkdir -p r1 && cd r1
git init . &&
touch a &&
git add a &&
git commit -am "first commit" &&
git tag t1 &&
cd ..

gitaggregate -c repos.yaml --log-level DEBUG

would output

(D) [17:18:29] git_aggregator.main  foo  main.aggregate_repo():198 <git_aggregator.repo.Repo object at 0x7f2e0fe29d68>
(I) [17:18:29] git_aggregator.repo  foo  repo.aggregate():169 Start aggregation of /tmp/gita/foo
(I) [17:18:29] git_aggregator.repo  foo  repo.init_repository():192 Init empty git repository in /tmp/gita/foo
(D) [17:18:29] git_aggregator.repo  foo  repo.log_call():158 /tmp/gita/foo> call ['git', 'init', '/tmp/gita/foo']
Initialized empty Git repository in /tmp/gita/foo/.git/
(I) [17:18:29] git_aggregator.repo  foo  repo._switch_to_branch():247 Switch to branch agg
(D) [17:18:29] git_aggregator.repo  foo  repo.log_call():158 /tmp/gita/foo> call ['git', 'checkout', '-B', 'agg']
Switched to a new branch 'agg'
(D) [17:18:29] git_aggregator.repo  foo  repo.log_call():158 /tmp/gita/foo> call ['git', 'remote', '-v']
(I) [17:18:29] git_aggregator.repo  foo  repo._set_remote():298 Adding remote r1 <file:///tmp/gita/r1>
(D) [17:18:29] git_aggregator.repo  foo  repo.log_call():158 /tmp/gita/foo> call ['git', 'remote', 'add', 'r1', 'file:///tmp/gita/r1']
(I) [17:18:29] git_aggregator.repo  foo  repo.fetch():197 Fetching required remotes
(D) [17:18:29] git_aggregator.repo  foo  repo.log_call():158 /tmp/gita/foo> call ('git', 'fetch', '--depth', '1', 'r1', 't1')
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From file:///tmp/gita/r1
 * tag               t1         -> FETCH_HEAD
(I) [17:18:29] git_aggregator.repo  foo  repo._merge():256 Pull r1, t1
(D) [17:18:29] git_aggregator.repo  foo  repo.log_call():158 /tmp/gita/foo> call ('git', 'pull', '--no-edit', '--depth', '1', 'r1', 't1')
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From file:///tmp/gita/r1
 * tag               t1         -> FETCH_HEAD
(I) [17:18:29] git_aggregator.repo  foo  repo._execute_shell_command_after():251 Execute shell after commands
(I) [17:18:29] git_aggregator.repo  foo  repo.aggregate():189 End aggregation of /tmp/gita/foo

Notice the two full fetch outputs: the same computation is obviously done twice.
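
The duplication can also be checked mechanically. The following is a rough sketch that scans a DEBUG log like the one above and lists every git call that contacts the remote (fetch or pull). The function name network_calls and the regular expression are my own assumptions, based only on the "call [...]" and "call (...)" line format visible in this log.

```python
import ast
import re
from typing import Iterable, List

# Matches the trailing "> call [...]" or "> call (...)" part of a debug line
# (format assumed from the log shown in this issue).
CALL_RE = re.compile(r"> call (\[.*\]|\(.*\))\s*$")


def network_calls(log_lines: Iterable[str]) -> List[List[str]]:
    """Return the argv of every logged `git fetch` or `git pull` call."""
    calls = []
    for line in log_lines:
        match = CALL_RE.search(line)
        if not match:
            continue
        # The logged value is a Python list or tuple literal of strings.
        argv = list(ast.literal_eval(match.group(1)))
        if len(argv) >= 2 and argv[0] == "git" and argv[1] in {"fetch", "pull"}:
            calls.append(argv)
    return calls
```

Run against the log above, this should report two calls for the single `r1 t1` merge entry: one fetch and one pull.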

I noted that the second output seems smaller (and quicker?) when depth: 1 is in defaults and fetch_all is not activated.

I have a PR solving this issue coming soon.
