340 WARC files of the news crawl data set, covering 2020-09-12 until 2020-10-04, were captured using HTTP/2. A Java security upgrade added ALPN support and therefore enabled HTTP/2; the crawler started to use HTTP/2 after an automatic restart.
The affected WARC files may cause WARC readers (e.g. jwarc) to fail while parsing the HTTP headers:
request
GET /2020/09/12/business/brexit-no-deal-uk-economy/index.html HTTP/2
...
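
For readers who want to check their own records, here is a minimal Python sketch that extracts the HTTP version token from the first line of a captured HTTP header block. It assumes the failure comes from the `HTTP/2` token, which does not match the `HTTP/<digit>.<digit>` form of the HTTP/1.x grammar; that cause is an assumption, not something confirmed here. The helper names `http_version` and `is_http2_line` are hypothetical and not part of jwarc or any other WARC library.

```python
import re
from typing import Optional

# Request line, e.g. "GET /path HTTP/2": the version is the last token.
_REQUEST_LINE = re.compile(r"^[A-Z]+ \S+ (HTTP/\d+(?:\.\d+)?)$")
# Status line, e.g. "HTTP/2 200" or "HTTP/1.1 200 OK": the version comes first.
_STATUS_LINE = re.compile(r"^(HTTP/\d+(?:\.\d+)?) \d{3}(?: .*)?$")


def http_version(first_line: str) -> Optional[str]:
    """Return the version token of a request or status line, or None."""
    line = first_line.rstrip("\r\n")
    for pattern in (_REQUEST_LINE, _STATUS_LINE):
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


def is_http2_line(first_line: str) -> bool:
    """True if the line carries the bare "HTTP/2" version token."""
    return http_version(first_line) == "HTTP/2"
```

Applying `is_http2_line` to the first line of each record's HTTP header block is one way to flag records that a strict HTTP/1.x parser might reject.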
To address the issue:
Affected files:
More than 80% of the records are captured using HTTP/2.