warc format  #11

@Natkeeran

I downloaded a website from the Internet Archive using wayback-machine-downloader, then created a WARC with warcit using the following command:

warcit --fixed-dt 20100212221453 http://domainname.com /dirpath

It did create a WARC file. I would like to index it into Solr using webarchive-discovery, but when I try, I get the following warnings:

2018-08-16 18:22:08 WARN  WARCIndexer:414 - Invalid status line: null@28005
2018-08-16 18:22:08 WARN  WARCIndexer:414 - Invalid status line: null@40193
2018-08-16 18:22:08 WARN  WARCIndexer:414 - Invalid status line: null@79054

I could not load it into AUT either.
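One way to narrow this down is to list the record types in the WARC and check whether each response record starts with an HTTP status line. The following is a minimal sketch using only the Python standard library. It assumes well-formed WARC headers, and it assumes that the "Invalid status line" warnings relate to records without an HTTP status line, which I have not confirmed. The function names (iter_warc_records, summarize, summarize_path) are made up for illustration and are not part of warcit or webarchive-discovery.

```python
import gzip
import io
from collections import Counter


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line[:40]!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of the WARC header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The record block is exactly Content-Length bytes long.
        block = stream.read(int(headers.get("content-length", 0)))
        yield headers, block


def summarize(stream):
    """Count record types and collect response records lacking an HTTP status line."""
    types = Counter()
    missing_status = []
    for headers, block in iter_warc_records(stream):
        rec_type = headers.get("warc-type", "unknown")
        types[rec_type] += 1
        if rec_type == "response" and not block.startswith(b"HTTP/"):
            missing_status.append(headers.get("warc-target-uri"))
    return types, missing_status


def summarize_path(path):
    """Hypothetical helper: summarize a gzip-compressed WARC file on disk."""
    with gzip.open(path, "rb") as raw:
        return summarize(io.BufferedReader(raw))
```

Calling summarize_path("esports.com.warc.gz") on the attached file would return the per-type record counts and the target URIs of any response records whose block does not begin with "HTTP/". If the file turns out to contain mostly non-response record types, that may be worth mentioning to the indexer maintainers.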

An example WARC is attached. Can warcit be used to convert snapshots downloaded from the Internet Archive into WARC format? (Unfortunately, the Internet Archive does not provide a way to download WARCs.)

esports.com.warc.gz
