-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always copy PHP and legacy CLI files if they have changed #94
base: main
Are you sure you want to change the base?
Conversation
Saves static hashes for each file and checks their size / hash on each run
fbe6b77
to
042e5b7
Compare
042e5b7
to
f19ce9d
Compare
@akalipetis this will conflict with the vendorization work - but I was playing with it anyway, to find an efficient way to check the files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issues in here are:
- There's no need to include a new library (afero) for abstracting the filesystem, we can use
fs.FS
instead - The hash should be computed from the actual file, not from a file containing the hash value - this beats the purpose of checking hashes in the first place
.PHONY: archive-hashes | ||
archive-hashes: | ||
cd internal/legacy/archives | ||
for f in $(ls -1 | grep -v sha256); do shasum -a 256 "$f" | awk '{print $1}' > "$f".sha256; done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should prefer MD5 here, the hash is used for integrity check and not for tamper protection and the faster MD5 algorithm should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was aiming for hashes not to need computing at runtime at all (#94 (comment)), the performance difference is negligible, and some people like to run automated scanners looking for usage of bad hash algorithms, but I'm fine with MD5 if you prefer.
return fh == hash, nil | ||
} | ||
|
||
func SaveHash(filename, hash string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to maintain the hash, as the file containing the hash can be tampered with as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see other comments #94 (comment)
internal/file/file.go
Outdated
if fh, err := afero.ReadFile(testableFS, filename+hashExt); err == nil { | ||
return string(fh) == hash, nil | ||
} | ||
fh, err := sha256Sum(testableFS, filename) | ||
if err != nil { | ||
if os.IsNotExist(err) { | ||
return false, nil | ||
} | ||
return true, err | ||
} | ||
return fh == hash, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash should be computed from the file itself, otherwise the hash can be tampered in the same way that the file was tampered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used for tamper protection as you say - the pregenerated hashes could almost just be random numbers - I went for real hashes just so there's the possibility of checking against accidental corruption of the copied files themselves and the fallback of being able to recompute when the hash file wasn't copied. But broadly I was going for not having to compute hashes at runtime at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recomputing the hash from the file on every run would theoretically use time and memory that we don't want or need to use - the purpose of this PR is to make sure we have updated files when we change them, the user is free to break their own installation (and it can be fixed again by deleting the main file or hash).
1209f44
to
f0a1218
Compare
Saves static hashes for each file and checks their size / hash on each run
For Windows it's a little different because the files are extracted
Suggestion for #92