MarginaliaSearch/PublicData

Latest commit

eaa2326 · Mar 27, 2024

Data Sets

These are data sets used by Marginalia Search. If you think something is missing from a list, or is on a list where it doesn't belong, feel free to open a pull request.

Contributions are welcome.

  • blogs.txt is a list of websites that are blogs (or close enough). Websites on this list receive slightly preferential treatment in how they are processed, and they are processed with the assumption that they are blogs with all that entails. blogs.txt is also the list of domains that show up in the new 'Blogosphere' filter.

  • docs.txt is not yet in use, but the idea is to gather as many good documentation sites as possible and build a filter for them.

  • random-domains.txt is the list of domains that appear in the random exploration mode.
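
For anyone who wants to consume these lists in their own tooling, here is a minimal sketch of loading one and checking a domain against it. It assumes each file holds one domain per line and that `#` starts a comment; neither is stated in this README, so check the actual files first. The function names `parse_domain_list`, `load_domain_list`, and `is_listed` are hypothetical helpers, not part of Marginalia Search.

```python
from pathlib import Path


def parse_domain_list(text: str) -> set[str]:
    """Parse a domain list from text.

    Assumed format (not confirmed by the README): one domain per line,
    blank lines ignored, and '#' starting a comment.
    """
    domains = set()
    for line in text.splitlines():
        # Drop anything after '#', then trim whitespace and normalise case.
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry)
    return domains


def load_domain_list(path: str) -> set[str]:
    """Read a list file such as blogs.txt from a local checkout."""
    return parse_domain_list(Path(path).read_text(encoding="utf-8"))


def is_listed(domain: str, domains: set[str]) -> bool:
    """Case-insensitive exact match of a domain against a loaded list."""
    return domain.strip().lower() in domains
```

With a local clone of the repository, `is_listed("example.com", load_domain_list("blogs.txt"))` would report whether that domain is on the blog list. The match is exact, so subdomains are only found if they are listed themselves.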

The Marginalia Search project also shares data sets and dumps from the search engine that are much larger than anything you can upload to GitHub. These are available at https://downloads.marginalia.nu/exports.