Signed WARC URL generation #79

@ato

@ikreymer has proposed a web archive architecture in which replay happens purely client-side, served by a static instance of wabac.js, with WARC files served by a simple static file server (nginx, S3) and OutbackCDX as the only dynamic server-side component. While this is already technically doable, it means making the full raw WARC files available for download, which is likely unacceptable for the many institutions that are required to implement some level of restriction or access control.

Ilya suggested that one solution to this problem would be for the index server to generate signed URLs that include a signature (or some other form of access token) granting temporary access to specific records.
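To make "specific records" concrete: a record in a .warc.gz file is addressed by the compressed offset and length stored in the index, so a signed URL only needs to cover one byte range. The following is a minimal client-side sketch in Python, assuming the index server has already returned a signed URL along with the record's offset and length. The function names `range_header` and `build_record_request` are hypothetical and not part of OutbackCDX or wabac.js.

```python
from urllib.request import Request


def range_header(offset: int, length: int) -> str:
    """Turn an index entry's (offset, length) into an HTTP Range header value."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def build_record_request(signed_url: str, offset: int, length: int) -> Request:
    """Build (but do not send) a request for a single WARC record.

    The Range value must match the one the index server signed exactly,
    otherwise the file server would reject the request.
    """
    return Request(signed_url, headers={"Range": range_header(offset, length)})
```

The request can then be sent with `urllib.request.urlopen`. Any other HTTP client works equally well, as long as the Range header is sent unchanged.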

nginx

There are a lot of different nginx modules that can handle URLs with some kind of signature, HMAC or auth token. The stock secure link module would technically work but is probably best avoided as it uses MD5.

A simple example using https://github.com/nginx-modules/ngx_http_hmac_secure_link_module might be:

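location /warcs {
    # Token, signing timestamp and validity period (seconds) from the query string
    secure_link_hmac  $arg_token,$arg_timestamp,$arg_expiry;
    secure_link_hmac_secret my_secret_key;
    # The signed message also covers the Range request header
    secure_link_hmac_message $uri|$arg_timestamp|$arg_expiry|$http_range;
    secure_link_hmac_algorithm sha256;
    # Do not reveal why validation failed: bad and expired tokens both get a 404
    if ($secure_link_hmac != "1") { return 404; }
}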

With a URL that looks like:

https://warcstore/warcs/something.warc.gz?timestamp=2020-03-09T09:55:46Z&expiry=900&token=98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4

Note that the HMAC message includes $http_range, which ensures the token is only valid for a single specific byte range.
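On the index server side, generating such a URL could look roughly like the Python sketch below. It builds the same `$uri|$arg_timestamp|$arg_expiry|$http_range` message as the nginx config above and signs it with HMAC-SHA256. Several things here are assumptions rather than confirmed behaviour: the function name `sign_warc_url` is hypothetical, and the token is hex-encoded to match the example URL, whereas the module's own examples may expect a base64url token, so the encoding should be checked against the module's README before relying on this.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import urlencode


def sign_warc_url(origin, uri, byte_range, secret, now=None, expiry=900):
    """Return (url, headers) granting access to one byte range of one WARC file.

    origin     -- scheme and host, e.g. "https://warcstore"
    uri        -- request path as nginx sees it in $uri
    byte_range -- exact Range header value the client will send
    secret     -- shared secret (bytes), same as secure_link_hmac_secret
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    timestamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")

    # Same field order and "|" separator as secure_link_hmac_message.
    message = f"{uri}|{timestamp}|{expiry}|{byte_range}"

    # Assumption: hex digest, as in the example URL above.
    token = hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()

    # Keep ":" unescaped: nginx's $arg_* variables hold the raw query value,
    # so an escaped "%3A" would no longer match the signed message.
    query = urlencode(
        {"timestamp": timestamp, "expiry": expiry, "token": token}, safe=":"
    )
    return f"{origin}{uri}?{query}", {"Range": byte_range}
```

A call such as `sign_warc_url("https://warcstore", "/warcs/something.warc.gz", "bytes=1024-1535", b"my_secret_key")` returns the URL together with the Range header the client has to send. Changing the header in any way changes `$http_range` and therefore invalidates the token.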

S3

S3 has signed URLs, which work rather similarly:

https://my-warc-store.s3-eu-west-1.amazonaws.com/something.warc.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256
&X-Amz-Credential=AKIAIOSFODNN7EXAMPLE/20200409/eu-west-1/s3/aws4_request
&X-Amz-Date=20200409T095546Z
&X-Amz-Expires=900
&X-Amz-Signature=13550350a8681c84c861aac2e5b440161c2b33a3e4f302ac680ca5b686de48de
&X-Amz-SignedHeaders=host;range
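The key detail is `X-Amz-SignedHeaders=host;range`: because the Range header is part of the signature, the URL is only usable for that one byte range. In practice an AWS SDK would normally produce these URLs. The sketch below instead spells out the Signature Version 4 query-string signing steps using only the Python standard library, to show where the range header enters the signature. The function name `presign_s3_get` and its parameters are hypothetical, and the sketch ignores session tokens and other request parameters.

```python
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_s3_get(host, key, byte_range, access_key, secret_key, region,
                   now, expires=900):
    """Return (url, headers) for a GET of one byte range, signed with SigV4.

    `now` is a timezone-aware UTC datetime; `host` is the bucket endpoint.
    """
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    scope = f"{amz_date[:8]}/{region}/s3/aws4_request"

    # Headers covered by the signature; the client must send both unchanged.
    headers = {"host": host, "range": byte_range}
    signed_headers = ";".join(sorted(headers))

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": signed_headers,
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    canonical_uri = quote("/" + key.lstrip("/"), safe="/")
    canonical_headers = "".join(
        f"{name}:{headers[name].strip()}\n" for name in sorted(headers)
    )
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        canonical_headers, signed_headers, "UNSIGNED-PAYLOAD",
    ])

    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (amz_date[:8], region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = _hmac(signing_key, string_to_sign).hex()

    url = f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
    return url, {"Range": byte_range}
```

As with the nginx variant, the client has to send exactly the Range header that was signed. A request for a different range, or for the whole file, would not match the signature.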
