AI search for all the best resources in AI – powered by Ben's Bites 💯

transitive-bullshit/bens-bites-ai-search

Ben's Bites

Ben's Bites Link Search

Search across all of the AI-related links in the Ben's Bites newsletter – using AI-powered semantic search.

Intro

The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.

All search results are extracted from the Ben's Bites AI Newsletter, which serves as the sole data source.

How it works

A cron job runs every 24 hours to update the database.

Each run performs the following steps:

  1. Crawling the source Beehiiv newsletter
  2. Converting each post to markdown
  3. Extracting and resolving unique links
  4. Fetching opengraph metadata for each link
  5. Fetching provider-specific metadata for some links (e.g. tweet text)
  6. Generating vector embeddings for each link using OpenAI
  7. Upserting all links into a Pinecone vector database
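
As an illustration of step 3, here is a minimal sketch of pulling unique links out of a post's markdown. It is not the project's actual code: the regex, the `normalize_url` rules, and the function names are assumptions made for this example, and it only normalizes URLs rather than following redirects.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches inline markdown links of the form [text](url). This is a
# simplification: reference-style links and bare URLs are ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")


def normalize_url(url: str) -> str:
    """Lowercase the scheme and host, drop the fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def extract_unique_links(markdown: str) -> list:
    """Return normalized links in first-seen order, without duplicates."""
    seen = {}
    for match in LINK_RE.finditer(markdown):
        seen.setdefault(normalize_url(match.group(1)), None)
    return list(seen)


post = (
    "See [a](https://Example.com/a/), [b](https://example.com/a#top) "
    "and [c](https://example.com/b?x=1)."
)
print(extract_unique_links(post))
```

The first two links collapse into one entry because they differ only in host casing, trailing slash, and fragment.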

We're using IFramely to extract opengraph metadata for each link, and we also special-case tweet links to extract the tweet text.
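
The special-casing of tweet links might look something like the sketch below, which only decides which metadata path a URL should take. The host list, the path pattern, and the `metadata_provider` helper are hypothetical. The calls to IFramely and to any tweet API are deliberately left out.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed set of hosts that serve tweets; adjust as needed.
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com", "x.com"}

# Tweet permalinks look like /<user>/status/<numeric id>.
TWEET_PATH_RE = re.compile(r"^/[^/]+/status/(\d+)")


def tweet_id(url: str) -> Optional[str]:
    """Return the tweet's numeric ID if the URL is a tweet permalink."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in TWITTER_HOSTS:
        return None
    match = TWEET_PATH_RE.match(parts.path)
    return match.group(1) if match else None


def metadata_provider(url: str) -> str:
    """Pick the (hypothetical) metadata path for a link."""
    return "tweet" if tweet_id(url) is not None else "opengraph"


print(metadata_provider("https://twitter.com/someuser/status/1234567890?s=20"))
print(metadata_provider("https://example.com/blog/post"))
```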

Once we have all of the links locally, we upsert them into a Pinecone vector database for semantic search.
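
Upserting means "insert, or overwrite if the ID already exists", which keeps the daily job idempotent. The sketch below models that behaviour with a plain dictionary standing in for the vector database. The `link_id` scheme (a truncated SHA-256 of the URL) and the record layout are assumptions for illustration, not the project's actual schema or the Pinecone client API.

```python
import hashlib


def link_id(url: str) -> str:
    """Derive a stable ID from the URL so repeated runs hit the same record."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def upsert(store: dict, url: str, vector: list, metadata: dict) -> None:
    """Insert the record, or overwrite the existing one with the same ID."""
    store[link_id(url)] = {
        "values": vector,
        "metadata": {"url": url, **metadata},
    }


store = {}
upsert(store, "https://example.com/a", [0.1, 0.2], {"title": "Old title"})
upsert(store, "https://example.com/a", [0.1, 0.2], {"title": "New title"})
print(len(store))  # one record, carrying the newer title
```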

Semantic Search

Semantic search is powered by OpenAI's `text-embedding-ada-002` embedding model and Pinecone's hosted vector database.
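
Conceptually, a query is embedded with the same model and the database returns the links whose vectors are closest to it. The sketch below shows that ranking step with cosine similarity over tiny hand-written vectors. Real `text-embedding-ada-002` vectors have 1536 dimensions, and the similarity metric this project's index uses is not stated here, so cosine is an assumption. The URLs and vectors are made up.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list, index: dict, k: int = 2) -> list:
    """Return the k URLs whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), url) for url, vec in index.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored[:k]]


# Toy 3-dimensional "embeddings", purely illustrative.
index = {
    "https://example.com/diffusion": [0.9, 0.1, 0.0],
    "https://example.com/llm-agents": [0.1, 0.9, 0.2],
    "https://example.com/robotics": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]
print(top_k(query, index))
```

A hosted vector database performs the same kind of nearest-neighbour lookup, but with approximate indexing so it scales beyond a linear scan.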

TODO

  • better search UX so back button works
  • show the number of posts / links on the home page so it's clear when it was last updated
  • actually sort by recency instead of faking it
  • set up cron to update the DB daily
  • test on safari/firefox
  • display which newsletter the post first appeared in
  • explore hybrid search
  • infinite scroll so you can keep scrolling results

License

MIT © Travis Fischer

All link data is extracted from Ben's Bites AI Newsletter and is licensed under CC BY-NC-ND 4.0.

If you found this project interesting, please consider sponsoring me or following me on Twitter.
