Make checksum comparisons more robust to insignificant file changes #16

Open
6 tasks
ezwelty opened this issue Sep 25, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

ezwelty commented Sep 25, 2024

The archiving process compares file checksums and stores a new file only if its checksum does not match that of a file already in the archive. This prevents unnecessary duplication, but some source files embed volatile information, such as timestamps, so every download appears new. Perhaps this could be addressed by applying small tweaks to the downloaded file (stripping or fixing such values) before calculating the checksum.
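
As a rough illustration of the idea, the checksum could be computed on a normalized copy of the file rather than on the raw bytes. This is a minimal sketch, not the project's actual code: the function names `content_checksum` and `strip_trailing_whitespace` are hypothetical, SHA-256 is an assumed hash algorithm, and the whitespace normalizer is only a toy stand-in for a format-specific one.

```python
import hashlib
from typing import Callable, Optional


def content_checksum(
    data: bytes, normalize: Optional[Callable[[bytes], bytes]] = None
) -> str:
    """Return the SHA-256 hex digest of `data`, after optional normalization.

    Hypothetical helper: the hash algorithm and signature are assumptions.
    """
    if normalize is not None:
        data = normalize(data)
    return hashlib.sha256(data).hexdigest()


def strip_trailing_whitespace(data: bytes) -> bytes:
    """Toy normalizer: drop trailing whitespace on each line and at the end."""
    return b"\n".join(line.rstrip() for line in data.splitlines())
```

With this shape, the archive could still store the file exactly as downloaded and use the normalized checksum only to decide whether the file is new.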

Files

  • ArcGIS API: Strip timestamps from response (GML, JSON?). Download as CSV if supported?
  • WFS: Strip timestamps from response (GML, JSON). Download as CSV if supported?
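
For the API responses above, stripping timestamps might look like the following sketch. The key and attribute names (`timeStamp`, `timestamp`) are assumptions and would need to be checked against real ArcGIS and WFS responses. The function names are hypothetical, and the GML handling is a simple regular expression rather than a full XML parse.

```python
import json
import re

# Assumed names of volatile fields; inspect real responses before relying on these.
VOLATILE_KEYS = frozenset({"timeStamp", "timestamp"})

# Assumed form of the volatile attribute on the GML root element.
GML_TIMESTAMP = re.compile(rb'\s+timeStamp="[^"]*"')


def strip_json_keys(value, keys=VOLATILE_KEYS):
    """Recursively drop dictionary entries whose key is in `keys`."""
    if isinstance(value, dict):
        return {k: strip_json_keys(v, keys) for k, v in value.items() if k not in keys}
    if isinstance(value, list):
        return [strip_json_keys(v, keys) for v in value]
    return value


def normalize_json(data: bytes) -> bytes:
    """Remove volatile keys and re-serialize with sorted keys and fixed separators."""
    cleaned = strip_json_keys(json.loads(data))
    return json.dumps(cleaned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def normalize_gml(data: bytes) -> bytes:
    """Remove every timeStamp="..." attribute from the raw GML bytes."""
    return GML_TIMESTAMP.sub(b"", data)
```

Re-serializing the JSON also hides differences in key order and whitespace, which may or may not be desirable depending on how strictly "unchanged" should be interpreted.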

Web page

  • MHTML: Strip timestamp and de-randomize boundary hash
  • HTML: Testing needed
  • PDF: Testing needed
  • PNG: Testing needed
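
The MHTML item could be sketched as follows. This assumes the file has a top-level header block containing a `Date:` line and a quoted `boundary="..."` parameter, with CRLF line endings, as in typical browser-saved MHTML; real files should be checked first. The function name and the replacement boundary string are hypothetical.

```python
import re

DATE_HEADER = re.compile(rb"^Date:[^\r\n]*\r?\n", re.MULTILINE)
BOUNDARY_PARAM = re.compile(rb'boundary="([^"]+)"')
# Arbitrary fixed value used in place of the randomized boundary (an assumption).
FIXED_BOUNDARY = b"----MultipartBoundary--normalized----"


def normalize_mhtml(data: bytes) -> bytes:
    """Drop the top-level Date header and replace the boundary with a fixed string."""
    # The header block ends at the first blank line (assumes CRLF line endings).
    head_end = data.find(b"\r\n\r\n")
    # Keep the CRLF that terminates the last header line inside `head`.
    head_end = len(data) if head_end == -1 else head_end + 2
    head, rest = data[:head_end], data[head_end:]

    match = BOUNDARY_PARAM.search(head)
    head = DATE_HEADER.sub(b"", head)
    out = head + rest
    if match:
        # The boundary appears in the header and in every part delimiter.
        out = out.replace(match.group(1), FIXED_BOUNDARY)
    return out
```

Only the header block is searched for the `Date:` line, so a line beginning with `Date:` inside the saved page content is left untouched.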
@ezwelty ezwelty added the enhancement New feature or request label Sep 25, 2024