transitive-bullshit committed Aug 17, 2023
1 parent e4a40d4 commit 5c4071b
Showing 2 changed files with 2 additions and 27 deletions.
15 changes: 1 addition & 14 deletions readme.md
@@ -17,7 +17,6 @@
 - [Intro](#intro)
 - [How it works](#how-it-works)
 - [Semantic Search](#semantic-search)
-- [Keyword Search](#keyword-search)
 - [TODO](#todo)
 - [License](#license)

@@ -40,27 +39,15 @@ The steps involved include:
 5. Fetching provider-specific metadata for some links (e.g. tweet text)
 6. Generating vector embeddings for each link using OpenAI
 7. Upserting all links into a Pinecone vector database
-8. Upserting all links into a Meilisearch database

 We're using [IFramely](https://iframely.com/) to extract opengraph metadata for each link, and we also special-case tweet links to extract the tweet text.

-Once we have all of the links locally, we upsert them into two databases:
-
-- A [Pinecone](https://www.pinecone.io/) vector database for semantic search
-- A [Meilisearch](https://www.meilisearch.com/) database for traditional keyword search
-
-Supporting both of these search indices isn't necessary, but I wanted to have a live comparison of the two approaches in action.
-
-In general, I've found that semantic search is more accurate than keyword search, but keyword search is much faster and can be more intuitive for users.
+Once we have all of the links locally, we upsert them into a [Pinecone](https://www.pinecone.io/) vector database for semantic search.

 ### Semantic Search

 Semantic search is powered by [OpenAI's \`text-embedding-ada-002\` embedding model](https://platform.openai.com/docs/guides/embeddings/) and [Pinecone's hosted vector database](https://www.pinecone.io/).

-### Keyword Search
-
-Traditional keyword-based search is powered by [Meilisearch](https://www.meilisearch.com/).
-
 ## TODO

 - better search UX so back button works
14 changes: 1 addition & 13 deletions src/pages/about/index.tsx
@@ -28,27 +28,15 @@ The steps involved include:
 5. Fetching provider-specific metadata for some links (e.g. tweet text)
 6. Generating vector embeddings for each link using OpenAI
 7. Upserting all links into a Pinecone vector database
-8. Upserting all links into a Meilisearch database
 We're using [IFramely](https://iframely.com/) to extract opengraph metadata for each link, and we also special-case tweet links to extract the tweet text.
-Once we have all of the links locally, we upsert them into two databases:
-- A [Pinecone](https://www.pinecone.io/) vector database for semantic search
-- A [Meilisearch](https://www.meilisearch.com/) database for traditional keyword search
-Supporting both of these search indices isn't necessary, but I wanted to have a live comparison of the two approaches in action.
-In general, I've found that semantic search is more accurate than keyword search, but keyword search is much faster and can be more intuitive for users.
+Once we have all of the links locally, we upsert them into a [Pinecone](https://www.pinecone.io/) vector database for semantic search.
 ### Semantic Search
 Semantic search is powered by [OpenAI's \`text-embedding-ada-002\` embedding model](https://platform.openai.com/docs/guides/embeddings/) and [Pinecone's hosted vector database](https://www.pinecone.io/).
-### Keyword Search
-Traditional keyword-based search is powered by [Meilisearch](https://www.meilisearch.com/).
 ## License
 This webapp is [open source](${config.githubRepoUrl}). MIT © [${config.author}](${config.twitterUrl})
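
After this commit, the documented pipeline ends with steps 6 and 7: embed each link, then upsert it into a vector index that is queried by similarity. The sketch below illustrates that flow only and is not the repository's code. It assumes an in-memory index in place of Pinecone and a toy hashed bag-of-words function in place of OpenAI's embedding model, and every name in it (`toyEmbed`, `InMemoryVectorIndex`, the `link-*` ids) is hypothetical.

```typescript
// Hypothetical sketch of the embed -> upsert -> query flow.
// Nothing here calls OpenAI or Pinecone; all names are illustrative.

type Vector = number[]

interface LinkRecord {
  id: string
  text: string
  vector: Vector
}

// Toy embedding: hashed bag-of-words counts. A real pipeline would request
// an embedding from a hosted model instead of computing it locally.
function toyEmbed(text: string, dims = 64): Vector {
  const vector = new Array<number>(dims).fill(0)
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    let hash = 0
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) % dims
    }
    vector[hash] += 1
  }
  return vector
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

class InMemoryVectorIndex {
  private records = new Map<string, LinkRecord>()

  // Upsert: insert a new record, or overwrite the record with the same id.
  upsert(id: string, text: string): void {
    this.records.set(id, { id, text, vector: toyEmbed(text) })
  }

  // Return up to topK records ranked by cosine similarity to the query text.
  query(queryText: string, topK = 3): Array<{ id: string; score: number }> {
    const queryVector = toyEmbed(queryText)
    return Array.from(this.records.values())
      .map((r) => ({ id: r.id, score: cosineSimilarity(queryVector, r.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK)
  }

  get size(): number {
    return this.records.size
  }
}

// Usage: index two links, then search by meaning-bearing words.
const demoIndex = new InMemoryVectorIndex()
demoIndex.upsert('link-1', 'vector database for semantic search')
demoIndex.upsert('link-2', 'cooking pasta recipes')
console.log(demoIndex.query('semantic search', 1))
```

Because the toy embedding only counts shared words, it behaves more like keyword matching than true semantic search; the real embedding model is what lets differently worded queries land near related links.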