Any failure in command deletes everything from both commit and disk #631

Closed
PoutineSyropErable opened this issue Jan 21, 2025 · 14 comments

@PoutineSyropErable

This must be fixed. I've lost my progress 3 times now due to using it. I usually keep copies of the entire project before using it...
But since it's sometimes needed multiple times...

If you git filter repo --path --invert-paths
and it's given a file that doesn't exist, it will delete every file in the project.

At worst, it should delete every file from commit history. But it should not touch the actual file in the directory.

git filter-repo --invert path (miss a space).
Same thing, it wipes everything.

File does not exist check is basic. I do not know why it is not present.

If so, please educate me. If there are safeguard flags, what are they?

@newren
Owner

newren commented Feb 20, 2025

If you git filter repo --path --invert-paths
and it's given a file that doesn't exist, it will delete every file in the project.

Um, git filter repo --path --invert-paths is an invalid command; at a minimum it's missing the - between filter and repo. It's also missing a pathname between --path and --invert-paths and I can't tell whether you intended to add one there or attempt to add one at the end of the line, which would be an error. You didn't provide a single example of the paths you've been providing, making it hard to determine if you're just specifying paths wrong (as per the FAQ; see https://github.com/newren/git-filter-repo/blob/main/Documentation/FAQ.md#How-should-paths-be-specified)

But it should not touch the actual file in the directory.

Very much disagree. The directory is the latest version and if history rewrites modify the latest version, the working directory should be made to match.

git filter-repo --invert path (miss a space).
Same thing, it wipes everything.

No, it throws an error:

$ git filter-repo --invert path
git-filter-repo: error: unrecognized arguments: path

Hard to determine what's going wrong when each of the incomplete example commands you provide has one or more errors in it. I suspect you're making similar usage errors in the parts you've elided but I can't correct them unless I know what the commands you are using are. Could you provide some?

File does not exist check is basic. I do not know why it is not present.

You've lost me now; do you mean File-exists check? I'm not sure if you're just continuing to make errors or if you've completely jumped in your train of thought without explaining the connection.

If so, please educate me. If there are safeguard flags, what are they?

I mean, there's a --dry-run flag. There are also the instructions, repeated in multiple places in the documentation, that say to make a fresh clone before running this, so that if anything goes wrong you can just blow away that clone and make a new one. Are those what you're referring to, or something else?

@PoutineSyropErable
Author

PoutineSyropErable commented Feb 20, 2025

Edit: Skim or skip this. An accurate description of the issue is below.

Yeah sorry, I didn't give exact command. I just wrote it like I say it.

But what I mean is that
git filter-repo --path --invert-path
(the proper command to do it)

sometimes deletes from the current working directory and sometimes doesn't (it just deletes from history, as I wanted). I don't know if it's due to git rm or git rm --cached needing to be done before or after, or whether the file is gitignored.

And when you make a typo inside the command, like the one time I gave it a path to a nonexistent file, it deleted every file in the project repo that wasn't gitignored. (I was in src/ and gave a file thinking I was in the project root, so I gave ./src/filename.)

That's just a bug that should never happen.

Another time, I forgot the spaces (before or after, don't remember) "--invert-path" and it did the same. Deleted nearly everything. Maybe not git ignored files and files that start with .

so
git filter-repo--path /path/to/file--invert-paths: nukes
git filter-repo --path/path/to/file --invert-paths: nukes
git filter-repo --invert-paths --path/path/to/non/existant_file : nukes

Of course, a path to a nonexistent file and the wrong path to a file are the same thing.
One or both of the first two are problematic.

I don't remember the specifics of what didn't get deleted, I just dealt with it before. It deleted pretty much every important thing that I wanted to git track.

At the end of the day, I use this when a file that's too big gets accidentally added 10 or so commits back and I can't push to GitHub because of it, or when a semi-private file gets pushed. If it nukes, I lose all progress since the last push. I tend to make a local backup by copying the directory when I use it, but when it goes well 10 times in a row, you stop being cautious, and then some weird behavior destroys everything. At least if the files aren't touched, then even if history is fucked, you can just clone again and then merge the file diffs.

Maybe a --keep-in-dir option or something would be nice, because actually deleting the file and not having it in the history are two different things imo, and rm exists for that already. No need to change the default behavior, to keep backward compatibility. (Except the two bugs.)

And if it already exists, please tell me.

@PoutineSyropErable
Author

PoutineSyropErable commented Feb 20, 2025

Figured it out.
--invert-path is needed to delete, otherwise it keeps.

And after going back through history logs from a month ago, the --invert-path was missing in the first example, which is my bad.

But the other time, for the spacing case (nonexistent files), that's on y'all.

--invert-paths made it so it only kept a nonexistent file.
This could also be solved by a --safe flag which stops if any of the files has a filter-repo flag in its name, but that's not the main issue:

git filter-repo --path <path/to/file/not/exist> nukes the project,
and it should have a check to see if each of the files in the command (see below) exists. And if any doesn't, stop.

git filter-repo --path file1 --path file2 ...

@PoutineSyropErable
Author

PoutineSyropErable commented Feb 20, 2025

I'm on arch linux. Here's the versions and commits.

❯ git version
git version 2.48.1
❯ git filter-repo --version
a40bce548d2c

Minimal example: Go into a non git dir to be safe.

mkdir test
cd test
echo "123" > lol
git init
git add .
git commit -m "added a file"
ls
git filter-repo --path fileNotExist --force
ls

Outputs from zsh inside tmux.

❯ mkdir test
❯ cd test
❯ echo "123" > lol
❯ git init
Initialized empty Git repository in /home/francois/filter-repo-test/test/.git/
❯ git add .
❯ git commit -m "added a file"
[master (root-commit) 11cc7a8] added a file
 1 file changed, 1 insertion(+)
 create mode 100644 lol
❯ ls
 lol
❯ git filter-repo --path fileNotExist --force
Parsed 1 commits
New history written in 0.00 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Completely finished after 0.02 seconds.
❯ ls

And yeah, I need to use --force because it's not a fresh clone. And anyway, since 99% of the time I use it because I can't push (because I accidentally didn't ignore some large file), that kinda doesn't apply. Maybe a force-safe flag should be used.

Or again, just throw an error if a file doesn't exist, and if someone wants to use filter-repo to do a total nuke, then they should use rm -rf.

@newren
Owner

newren commented Feb 20, 2025

Yeah sorry, I didn't give exact command. I just wrote it like I say it.

But what I mean is that git filter-repo --invert-paths (the proper command to do it)

Only git filter-repo --invert-paths? That won't delete anything. You'd need to also specify some path arguments for anything to be deleted.

sometimes deletes from the current working directory and sometimes doesn't (it just deletes from history, as I wanted). I don't know if it's due to git rm or git rm --cached needing to be done before or after, or whether the file is gitignored.

gitignore is irrelevant, unless you mean the file wasn't tracked at all in git. You shouldn't need to do git rm or git rm --cached before or after. Can you provide an example of the repo you had, the command you ran, what you saw, and what you expected?

And when you make a typo inside the command, like the one time I gave it a path to a nonexistent file, it deleted every file in the project repo that wasn't gitignored. (I was in src/ and gave a file thinking I was in the project root, so I gave ./src/filename.)

I doubt you did; when I try it:

$ git filter-repo --path ./src/filename
Error: Invalid path component '.' found in './src/filename'

it provides a nice error. git-filter-repo is aware of that common mistake that users make and checks for it. (...unless you're using a really old version of git-filter-repo?)

If you ran something like git filter-repo --path ${NON_EXISTENT_PATH} then yes, it'd only try to keep a file in git history that matches ${NON_EXISTENT_PATH}, and since nothing matches that, every path and commit would be removed from your history, and then your working tree would be updated to match.

That's just a bug that should never happen.

The fact that the command does as it is told (e.g. "Do not keep anything other than this (non-existent) path"), is hardly a bug; the program is doing what it is told. The problem is the user told it something they didn't actually want.
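To make the keep-versus-remove rule concrete, here is a toy shell model of the selection logic described above. It is only a sketch: `surviving_paths` is a hypothetical helper, not part of git-filter-repo, and it assumes exact whole-path matching only (the real `--path` option also matches directories).

```shell
# Toy model (NOT git-filter-repo itself): read tracked paths on stdin, one per
# line, and print the ones that would survive the filter.
#   $1 = the value given to --path
#   $2 = "keep" (plain --path) or "invert" (--path combined with --invert-paths)
surviving_paths() {
    awk -v p="$1" -v mode="${2:-keep}" '
        { match_p = ($0 == p) }
        (mode == "keep" && match_p) || (mode == "invert" && !match_p) { print }'
}

# A repository that tracks a single file, "lol":
printf 'lol\n' | surviving_paths fileNotExist keep     # prints nothing: every path is dropped
printf 'lol\n' | surviving_paths fileNotExist invert   # prints: lol (nothing matched, so nothing is dropped)
printf 'lol\n' | surviving_paths lol keep              # prints: lol
```

In this model a mistyped path is harmless when inverted and removes everything when it is not, which is the asymmetry at the centre of this thread.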

Another time, I forgot the spaces (before or after, don't remember) "--invert-path" and it did the same. Deleted nearly everything. Maybe not git ignored files and files that start with .

Let's check this out...

git filter-repo--invert-paths /path/to/file: nukes

No it doesn't:

$ git filter-repo--invert-paths /path/to/file
git: 'filter-repo--invert-paths' is not a git command. See 'git --help'.

git filter-repo --invert-paths/path/to/file: nukes

No it doesn't:

$ git filter-repo --invert-paths /path/to/file
git-filter-repo: error: unrecognized arguments: /path/to/file

git filter-repo --invert-paths /path/to/non/existant_file : nukes

No it doesn't:

$ git filter-repo --invert-paths /path/to/non/existant_file
git-filter-repo: error: unrecognized arguments: /path/to/non/existant_file

However, every one of your examples uses a path that starts with /, which will never match anything. Even if you fixed your command lines to be otherwise valid while leaving the path alone, you'd still not see anything get deleted, and you'd get an error:

$ git filter-repo --invert-paths --path /path/to/file
Error: Pathnames cannot begin with a '/'
$ git filter-repo --invert-paths --path /path/to/non/existant_file
Error: Pathnames cannot begin with a '/'

at least on any sane OS. I have learned that, sadly, some inferior OSes have something that automatically translates a path like /path/to/file to C:\Program Files\User/path/to/file or something like that, and passes THAT argument to git-filter-repo. That means git-filter-repo has no way of knowing what path you actually typed, so it can't tell that you typed /path/to/file and warn you that you started with a /. But in that case the user screwed up the path, not following any of the examples from the manual or the FAQ entry about paths, AND some other program defeated git-filter-repo's ability to check what the user typed by transforming the user's command before passing it to git-filter-repo.

Of course, a path to a nonexistent file and the wrong path to a file are the same thing. One or both of the first two are problematic.

Yes, the commands are very problematic for the user to specify, as shown above. But git-filter-repo handled them all admirably with nice errors.

I don't remember the specifics of what didn't get deleted, I just dealt with it before. It deleted pretty much every important thing that I wanted to git track.

At the end of the day, I use this when a file that's too big gets accidentally added 10 or so commits back and I can't push to GitHub because of it, or when a semi-private file gets pushed. If it nukes, I lose all progress since the last push. I tend to make a local backup by copying the directory when I use it, but when it goes well 10 times in a row, you stop being cautious, and then some weird behavior destroys everything. At least if the files aren't touched, then even if history is fucked, you can just clone again and then merge the file diffs.

Maybe a --keep-in-dir option or something would be nice, because actually deleting the file and not having it in the history are two different things imo, and rm exists for that already. No need to change the default behavior, to keep backward compatibility. (Except the two bugs.)

@newren
Owner

newren commented Feb 20, 2025

Figured it out. --invert-path is needed to delete, otherwise it keeps.

Yes, as documented.

And after going back through history logs from a month ago, the --invert-path was missing in the first example, which is my bad.

But the other time, for the spacing case (nonexistent files), that's on y'all.

I'm not following, perhaps because I haven't seen a good example command line from you yet demonstrating the bug. Or maybe I missed it. Can you provide one?

--invert-paths made it so it only kept a nonexistent file. This could also be solved by a --safe flag which stops if any of the files has a filter-repo flag in its name, but that's not the main issue:

git filter-repo --path <path/to/file/not/exist> nukes the project, and it should have a check to see if each of the files in the command (see below) exists. And if any doesn't, stop.

Nope, walking all of history to check whether each and every path listed has ever existed in any commit is a very expensive check in repositories of any meaningful size. There's a good reason the documentation:

  • repeatedly warns that the command is destructive
  • forces you to pass --force if you haven't made a fresh clone to operate on (so that you can throw it away and reclone in case anything goes awry)
  • suggests that you run --analyze first in order to get the list of files that exist so you can get the paths right

@PoutineSyropErable
Author

Yeah, I adjusted the comments as I wrote them. I kinda always edit until I have a final version. I didn't think you had responded earlier. Anyway, the last case is perfect.

Yes, I didn't test a path that starts with /; I just wrote it that way as an abstract path. I guess I should have put </path/to/non_existant_file>.

Anyway, my last comment shows an example.

@PoutineSyropErable
Author

So yes, use --analyze and --dry-run?

@newren
Owner

newren commented Feb 20, 2025

And yeah, I need to use --force because it's not a fresh clone. And anyway, since 99% of the time I use it because I can't push (because I accidentally didn't ignore some large file), that kinda doesn't apply.

Oh, it still totally applies:

$ cd ..
$ git clone --no-local my_repo_where_i_could_not_push copy_of_my_repo
$ cd copy_of_my_repo
$ git filter-repo ...

And note that you no longer have to pass --force to git filter-repo because you're working on a fresh clone now. The --no-local clone is suggested in the documentation for cases like this, so you can avoid such problems.

@newren
Owner

newren commented Feb 20, 2025

Yeah, I adjusted the comments as I wrote them. I kinda always edit until I have a final version. I didn't think you had responded earlier. Anyway, the last case is perfect.

Yes, I didn't test a path that starts with /; I just wrote it that way as an abstract path. I guess I should have put </path/to/non_existant_file>.

Still has the leading /, though...

Anyway, my last comment shows an example.

You mean git filter-repo --path fileNotExist --force? That doesn't show an example matching this:

But the other time, for the spacing case

which is the one I was asking about.

@PoutineSyropErable
Author

PoutineSyropErable commented Feb 20, 2025

Well, in this example

mkdir test
cd test
echo "123" > lol
git init
git add .
git commit -m "added a file"
ls
git filter-repo --path fileNotExist --force
ls
# lol got deleted, aka project got nuked

I'll modify into this, so force isn't needed. Though, it was never about force.

mkdir -p filter-repo_tests
cd filter-repo_tests

# Create a new Git repository
mkdir test
cd test
echo "123" > lol
git init
git add .
git commit -m "added a file"

ls  # Should show "lol"


cd ..
git clone --no-local test test_copy
cd test_copy

git filter-repo --path fileNotExist
ls 
# lol got deleted, aka project got nuked

My problem is that it shows nothing (aka the project got nuked). You said that verifying whether the file called
fileNotExist actually exists is an expensive operation, because it would check whether the path exists in any commit(?), so it's left out for performance reasons.

Does that mean that filter-repo can delete all occurrences of a file at a specific path, in every commit, even if it's currently deleted? And if you move the file, it will remove the occurrences at the paths you give, but won't affect the path it moved to?

So, because of that, you can't just check whether it exists in the current stage/commit/directory, because it needs to be able to delete moved, renamed and rm'ed files.
Then can we have a flag for getting an error if the file doesn't exist inside the current directory? Or at this point, use some alias/shell rc function to be safe and just ls $1 && git filter-repo...

@newren
Owner

newren commented Feb 20, 2025

Well, in this example

mkdir test
cd test
echo "123" > lol
git init
git add .
git commit -m "added a file"
ls
git filter-repo --path fileNotExist --force
ls
# lol got deleted, aka project got nuked

Yes, exactly as you asked.

I'll modify into this, so force isn't needed. Though, it was never about force.

mkdir -p filter-repo_tests
cd filter-repo_tests

# Create a new Git repository
mkdir test
cd test
echo "123" > lol
git init
git add .
git commit -m "added a file"

ls  # Should show "lol"


cd ..
git clone --no-local test test_copy
cd test_copy

git filter-repo --path fileNotExist
ls 
# lol got deleted, aka project got nuked

Yes, which again is exactly what you told the command to do.

But, in this latter case, you can just delete the test_copy directory once you realize the result is not what you wanted, and then re-run the git clone --no-local test test_copy command to get another copy to retry.

My problem is that it shows nothing (aka the project got nuked)

Right, it deleted all the files, just as you asked. (More specifically, you told it you only wanted to keep fileNotExist and get rid of everything else in all commits of all branches/tags/refs of history.)

You said that verifying whether the file called fileNotExist actually exists is an expensive operation, because it would check whether the path exists in any commit(?), so it's left out for performance reasons.

Does that mean that filter-repo can delete all occurrences of a file at a specific path, in every commit, even if it's currently deleted? And if you move the file, it will remove the occurrences at the paths you give, but won't affect the path it moved to?

It doesn't check for renames, no. But the path absolutely could exist in older versions and not in current versions, and since filter-repo is about rewriting history, making assumptions based on recent versions only makes little sense. What you're asking for basically amounts to two passes over all of history: the first pass walks all of history to find which files exist and errors out if some of the paths specified by the user don't exist, and the second pass does the actual rewrite.

If you wanted to write something like that, you could have some script call git filter-repo --analyze to get all the paths, compare them to some other paths specified by the user, and then call git filter-repo with any --path and --invert-paths arguments you want.
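A minimal sketch of such a pre-check follows. It is an assumption-laden illustration rather than a recommended implementation: instead of parsing the `--analyze` report, it swaps in plain `git log` to list every path that any commit on any ref has touched, and `path_in_history` is a hypothetical helper name. It also assumes ordinary filenames (git quotes unusual characters in this output) and, as noted above, it walks all of history, so it will be slow on large repositories.

```shell
# Hypothetical helper: succeed only if the given path was ever a tracked file,
# or a directory containing tracked files, in any commit reachable from any ref.
# Run it inside the repository you are about to rewrite.
path_in_history() {
    wanted=${1%/}   # tolerate a trailing slash on directory names
    git log --all --no-renames --format= --name-only |
        awk -v p="$wanted" '
            $0 == p || index($0, p "/") == 1 { found = 1 }
            END { exit !found }'
}
```

A caller could then guard the rewrite with something like `path_in_history some/path && git filter-repo --invert-paths --path some/path`, where `some/path` is a placeholder for the real path.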

So, because of that, you can't just check whether it exists in the current stage/commit/directory, because it needs to be able to delete moved, renamed and rm'ed files.

Then can we have a flag for getting an error if the file doesn't exist inside the current directory?

No; filter-repo is about rewriting all of history of the whole repo; this is too niche IMO; especially since...

Or at this point, use some alias/shell rc function to be safe and just ls $1 && git filter-repo...

...this seems like a perfectly reasonable way to handle the special case where you only care if it currently exists in the working directory.
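One way that shell rc function might look is sketched below. The name `filter_repo_remove` is hypothetical, and the sketch assumes it is run from the top level of the repository, because the existence test uses the current directory while `--path` values are interpreted relative to the repository root.

```shell
# Hypothetical wrapper: remove one path from history, but refuse to run at all
# if that path does not currently exist in the working tree.
# Any extra arguments are passed through to git filter-repo unchanged.
filter_repo_remove() {
    target=$1
    shift
    if [ ! -e "$target" ]; then
        printf 'refusing to run: %s does not exist in %s\n' "$target" "$PWD" >&2
        return 1
    fi
    git filter-repo --invert-paths --path "$target" "$@"
}
```

Because the wrapper always passes `--invert-paths`, a forgotten flag can no longer turn "remove this file" into "keep only this file". It does not replace working on a fresh clone, which remains the way to recover if the result is not what was wanted.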

@PoutineSyropErable
Author

Thanks, really helped me learn.

@newren
Owner

newren commented Feb 20, 2025

Thanks, really helped me learn.

Cool! Glad it was helpful.

@newren newren closed this as completed Feb 20, 2025