Timings #12

@thewillyhuman

Extracted from https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits/.

Loading data from a completely fresh TTL dump into a blank query service is currently not a quick task. In production (wikidata.org) it takes roughly a week, and I had a similar experience while trying to streamline the process as best I could on Google Compute Engine (GCE).

For a dump taken at the end of 2018 the timings for each stage were as follows:

Data dump download: 2 hours
Data Munge: 20 hours
Data load: 4.5 days
Total time: ~5.5 days

Various parts of the process lead me to believe that this could be done faster, as CPU usage was pretty low throughout and not all memory was utilized. Loading the data into Blazegraph was by far the slowest step, but digging into this would require someone who is more familiar with the Blazegraph internals.
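As a quick sanity check on the stage timings listed above, the durations can be summed and each stage's share of the total computed. This is only a minimal sketch: the durations are the ones reported in this issue, and the variable names (`stages`, `total_days`) are illustrative rather than part of any tooling.

```python
from datetime import timedelta

# Stage durations as reported above for the end-of-2018 dump.
stages = {
    "Data dump download": timedelta(hours=2),
    "Data munge": timedelta(hours=20),
    "Data load": timedelta(days=4.5),
}

# Sum the durations; the second argument is the start value for sum().
total = sum(stages.values(), timedelta())
total_days = total / timedelta(days=1)

# Report each stage in hours and as a fraction of the whole run.
for name, duration in stages.items():
    hours = duration / timedelta(hours=1)
    share = duration / total
    print(f"{name}: {hours:.0f} h ({share:.0%} of total)")

print(f"Total: {total_days:.2f} days")
```

The stages add up to 130 hours, or about 5.4 days, which matches the rounded ~5.5 days figure. The load step alone accounts for roughly 83% of that, so it is the obvious place to look for speedups.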
