Archive of the TG23 site using the WARC standard. See the ./browsertrix-crawler folder for files and details.
Generated with the go-archive repo, which uses browsertrix-crawler to generate an interactive (and time-travelable) archive using the WARC (Web ARChive) standard.
To capture a new snapshot of gathering.org, run the crawler command in the
go-archive repo with TG23's crawl configuration file. Then update this repo
with the newly generated archive files.
PS. As we start using WARC as our new archive standard, we expect to transition to a semi-automatic archive setup that generates snapshots of the site at a set interval. This will likely not be relevant for this TG23 repo.
The recommended setup for browsing the archive is to run the go-archive
repo/service, since that is the known working setup used for the Gathering.org archive.
To run it manually, install pywb and use the
wayback command with a local collection configuration file (see the pywb docs or
the examples in go-archive).
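
As a rough illustration of the manual route, the sketch below assembles the pywb commands for a local replay. It is an assumption-laden example, not the setup used by go-archive: it builds a collection with pywb's `wb-manager` tool instead of a hand-written collection configuration file, and the collection name, WARC path, and helper names (`build_commands`, `run_all`) are hypothetical.

```python
import subprocess


def build_commands(collection, warc_files, port=8080):
    """Return the pywb command lines for a local replay, in run order.

    Assumes pywb is installed (pip install pywb), which provides the
    wb-manager and wayback executables. The collection name and WARC
    paths are supplied by the caller and are not taken from this repo.
    """
    # Create an empty collection under ./collections/<collection>.
    commands = [["wb-manager", "init", collection]]
    # Copy each WARC file into the collection and index it.
    for warc in warc_files:
        commands.append(["wb-manager", "add", collection, str(warc)])
    # Start the replay server; it keeps running until interrupted.
    commands.append(["wayback", "--port", str(port)])
    return commands


def run_all(commands):
    """Run each command in turn and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(build_commands("tg23", ["path/to/snapshot.warc.gz"]))` from an empty working directory should leave the collection served at http://localhost:8080/tg23/. Note that `wb-manager init` fails if the collection already exists, so the sketch is meant for a fresh directory.
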
PS! We have added a custom ACL rule to block loading of client-side JS. This is because the site is effectively archived as "static" pages, and leaving client-side JS enabled triggers fetches to non-archived API routes. The unmodified file is still part of the archive.
Due to the size of the repo/files, we use Git LFS to store the WARC files.
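
One practical consequence of Git LFS is that a clone made without LFS support contains small text pointer files in place of the real WARC data, and replay tools cannot read those. The sketch below is a minimal check for that situation, assuming the WARC files match the glob `*.warc*`; the helper names are hypothetical and not part of this repo.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(data: bytes) -> bool:
    """Return True if the given leading bytes belong to an LFS pointer."""
    return data.startswith(LFS_POINTER_PREFIX)


def find_unfetched_warcs(root: str) -> list[Path]:
    """List WARC files under root that are still LFS pointer stubs."""
    unfetched = []
    for path in sorted(Path(root).rglob("*.warc*")):
        if not path.is_file():
            continue
        # Only the first few bytes are needed to recognise a pointer.
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if looks_like_lfs_pointer(head):
            unfetched.append(path)
    return unfetched
```

If `find_unfetched_warcs(".")` returns any paths, running `git lfs install` followed by `git lfs pull` in the clone should replace the pointers with the actual archive files.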