Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always copy PHP and legacy CLI files if they have changed #94

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

pjcdawkins
Copy link
Contributor

Saves static hashes for each file and checks their size / hash on each run

For Windows it's a little different because the files are extracted

Suggestion for #92

Saves static hashes for each file and checks their size / hash on each run
@pjcdawkins
Copy link
Contributor Author

@akalipetis this will conflict with the vendorization work - but I was playing with it anyway, to find an efficient way to check the files

@akalipetis akalipetis self-requested a review August 7, 2023 13:39
Copy link
Member

@akalipetis akalipetis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issues in here are:

  1. There's no need to include a new library (afero) for abstracting the filesystem, we can use fs.FS instead
  2. The hash should be computed from the actual file, not from a file containing the hash value - this beats the purpose of checking hashes in the first place

.PHONY: archive-hashes
archive-hashes:
cd internal/legacy/archives
for f in $(ls -1 | grep -v sha256); do shasum -a 256 "$f" | awk '{print $1}' > "$f".sha256; done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should prefer MD5 here, the hash is used for integrity check and not for tamper protection and the faster MD5 algorithm should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was aiming for hashes not to need computing at runtime at all (#94 (comment)), the performance difference is negligible, and some people like to run automated scanners looking for usage of bad hash algorithms, but I'm fine with MD5 if you prefer.

internal/file/file.go Outdated Show resolved Hide resolved
return fh == hash, nil
}

func SaveHash(filename, hash string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to maintain the hash, as the file containing the hash can be tampered with as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see other comments #94 (comment)

Comment on lines 54 to 65
if fh, err := afero.ReadFile(testableFS, filename+hashExt); err == nil {
return string(fh) == hash, nil
}
fh, err := sha256Sum(testableFS, filename)
if err != nil {
if os.IsNotExist(err) {
return false, nil
}
return true, err
}
return fh == hash, nil
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash should be computed from the file itself, otherwise the hash can be tampered in the same way that the file was tampered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not used for tamper protection as you say - the pregenerated hashes could almost just be random numbers - I went for real hashes just so there's the possibility of checking against accidental corruption of the copied files themselves and the fallback of being able to recompute when the hash file wasn't copied. But broadly I was going for not having to compute hashes at runtime at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recomputing the hash from the file on every run would theoretically use time and memory that we don't want or need to use - the purpose of this PR is to make sure we have updated files when we change them, the user is free to break their own installation (and it can be fixed again by deleting the main file or hash).

internal/file/file.go Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants