Would it be possible to add the tool used to recreate the dataset from the Wikipedia dump? Alternatively, could the `wikipedia_article_snippets.json` file be added to the repository? Thanks.