From e559eb4c4da1de7751b4022e3b413403ca456479 Mon Sep 17 00:00:00 2001
From: Adrien Barbaresi
Date: Wed, 5 Aug 2020 14:19:17 +0200
Subject: [PATCH] added a python tool

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 1495696..214a9ee 100644
--- a/README.md
+++ b/README.md
@@ -43,6 +43,7 @@ A collection of awesome web crawler,spider and resources in different languages.
 * [spidy](https://github.com/rivermont/spidy) - The simple, easy to use command line web crawler.
 * [newspaper](https://github.com/codelucas/newspaper) - News, full-text, and article metadata extraction in Python 3
 * [aspider](https://github.com/howie6879/aspider) - An async web scraping micro-framework based on asyncio.
+* [trafilatura](https://github.com/adbar/trafilatura) - Library and command-line tool to extract metadata, main text, and comments.
 
 ## Java
 * [ACHE Crawler](https://github.com/ViDA-NYU/ache) - An easy to use web crawler for domain-specific search.