@ikreymer has proposed a web archive architecture in which replay is handled purely client-side by a static instance of wabac.js, WARC files are served by a simple static file server (nginx, S3), and OutbackCDX is the only dynamic server-side component. While this is already technically doable, it means making the full raw WARC files available for download, which is likely unacceptable for many institutions that are required to implement some level of restrictions or access controls.
Ilya suggested that one solution to this problem would be for the index server to generate signed URLs, each including a signature (or some other form of access token) that provides temporary access to specific records.
nginx
There are a lot of different nginx modules that can handle URLs with some kind of signature, HMAC or auth token. The stock secure link module would technically work but is probably best avoided as it uses MD5.
A simple example using https://github.com/nginx-modules/ngx_http_hmac_secure_link_module might be:
location /warcs {
    secure_link_hmac $arg_token,$arg_timestamp,$arg_expiry;
    secure_link_hmac_secret my_secret_key;
    secure_link_hmac_message $uri|$arg_timestamp|$arg_expiry|$http_range;
    secure_link_hmac_algorithm sha256;
    if ($secure_link_hmac != "1") { return 404; }
}
With a URL that looks like:
https://warcstore/warcs/something.warc.gz?timestamp=2020-03-09T09:55:46Z&expiry=900&token=98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
Note how the HMAC message is configured to include $http_range, which ensures the signed URL is only valid for a single specific byte range.
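On the index server side, generating such a URL could look roughly like the sketch below. It builds the same `$uri|$arg_timestamp|$arg_expiry|$http_range` message as the nginx config above and signs it with HMAC-SHA256. The function name `sign_warc_url` and its parameters are hypothetical. The hex token encoding is an assumption chosen to match the example URL; the encoding the module actually expects (hex or base64url) should be checked against its documentation.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlencode


def sign_warc_url(base_url, uri, secret, timestamp, expiry, byte_range,
                  encoding="hex"):
    """Build a signed URL for one byte range of a WARC file (hypothetical helper).

    The signed message mirrors the nginx config:
    $uri|$arg_timestamp|$arg_expiry|$http_range
    """
    message = f"{uri}|{timestamp}|{expiry}|{byte_range}".encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    if encoding == "hex":
        # Assumption: hex token, as in the example URL above.
        token = digest.hex()
    else:
        # Alternative: unpadded base64url.
        token = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    # nginx $arg_* variables are not URL-decoded, so keep ":" unescaped
    # to make the query value match the signed timestamp byte for byte.
    query = urlencode(
        {"timestamp": timestamp, "expiry": expiry, "token": token},
        safe=":",
    )
    return f"{base_url}{uri}?{query}"


url = sign_warc_url(
    "https://warcstore",
    "/warcs/something.warc.gz",
    b"my_secret_key",
    "2020-03-09T09:55:46Z",
    900,
    "bytes=1024-2047",
)
print(url)
```

The client would then have to send a `Range: bytes=1024-2047` header exactly as signed. Any other range, or no range at all, produces a different message on the nginx side and the check fails.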
S3
S3 has presigned URLs, which work in a similar way:
https://my-warc-store.s3-eu-west-1.amazonaws.com/something.warc.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20200409/eu-west-1/s3/aws4_request
&X-Amz-Date=20200409T095546Z
&X-Amz-Expires=900
&X-Amz-Signature=13550350a8681c84c861aac2e5b440161c2b33a3e4f302ac680ca5b686de48de
&X-Amz-SignedHeaders=host;range
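To illustrate how the `range` header ends up covered by the signature, here is a minimal sketch of Signature Version 4 query presigning using only the Python standard library. The function name `presign_s3_range_get` and its parameters are hypothetical, and the credentials are the placeholder keys from the AWS documentation. In practice an AWS SDK would normally do this signing; the sketch is only meant to show where the range enters the signed data.

```python
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_range_get(host, key, byte_range, access_key, secret_key,
                         region, amz_date, expires=900):
    """Return (url, headers) for a GET restricted to one byte range."""
    date_stamp = amz_date[:8]
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    signed_headers = "host;range"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": signed_headers,
    }
    # Canonical query string: URI-encoded names and values, sorted by name.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_uri = quote("/" + key, safe="/-_.~")
    # The Range value is part of the canonical headers, so it is signed.
    canonical_headers = f"host:{host}\nrange:{byte_range}\n"
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        canonical_headers, signed_headers, "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    url = (f"https://{host}{canonical_uri}?{canonical_query}"
           f"&X-Amz-Signature={signature}")
    # The client must send exactly this Range header with the request.
    return url, {"Range": byte_range}


url, headers = presign_s3_range_get(
    "my-warc-store.s3.eu-west-1.amazonaws.com",
    "something.warc.gz",
    "bytes=1024-2047",
    "AKIAIOSFODNN7EXAMPLE",                      # placeholder access key
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # placeholder secret key
    "eu-west-1",
    "20200409T095546Z",
)
print(url)
print(headers)
```

As with the nginx variant, a request that sends a different `Range` header, or omits it, no longer matches the signature and should be rejected.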