database.build documentation, e.g. for embeddings? #163

Open
sfkeller opened this issue Feb 17, 2025 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@sfkeller

sfkeller commented Feb 17, 2025

Improve documentation

Link

I'm referring to this absolutely amazing experiment with database.build (formerly postgres.new).

Describe the problem

Question 1: Is there documentation for database.build?

Question 2: How do I work with embeddings?
I'm using the same dataset of athletes from the 2024 Paris Olympics as in the blog post and intro video, and I'd like to ask:
Get the names of the athletes whose nick_names are not similar to their names.
How can I do this?

I'm aware that database.build uses transformers.js and PGlite, which supports pgvector.

Describe the improvement

This could be the start of the documentation:

Using embeddings: database.build uses pgvector to store embeddings and transformers.js to create them inside the browser. Because embeddings can be large, database.build does not store them "next to" the data the user provides. Instead it writes them to a separate table, meta.embeddings, and works with IDs that reference rows in that table. When the LLM sees a reference to meta.embeddings, it knows it can "fetch" that data later when it is needed (for RAG etc.).
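To illustrate the reference pattern described above, here is a minimal sketch in Python. It is not database.build's actual code: it uses the standard-library `sqlite3` module as a stand-in for the in-browser Postgres, and the table name `meta_embeddings`, the column `nick_name_embedding_id`, the helper functions, and the toy data are all assumptions made up for illustration.

```python
import json
import sqlite3

# Stand-in for the in-browser Postgres. SQLite has no schemas or vector type,
# so "meta_embeddings" and the JSON-encoded vector are assumptions of this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_embeddings ("
    "id INTEGER PRIMARY KEY, content TEXT, embedding TEXT)"
)
conn.execute(
    "CREATE TABLE athletes ("
    "name TEXT, nick_name TEXT, "
    "nick_name_embedding_id INTEGER REFERENCES meta_embeddings(id))"
)


def store_embedding(content, vector):
    """Store a vector once and return only its id (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO meta_embeddings (content, embedding) VALUES (?, ?)",
        (content, json.dumps(vector)),
    )
    return cur.lastrowid


def fetch_embedding(embedding_id):
    """Resolve an id back to the stored vector when it is actually needed."""
    row = conn.execute(
        "SELECT embedding FROM meta_embeddings WHERE id = ?", (embedding_id,)
    ).fetchone()
    return json.loads(row[0])


# Made-up toy row: the user table holds only the small id, not the vector.
emb_id = store_embedding("Speedy", [0.1, 0.9, 0.0])
conn.execute("INSERT INTO athletes VALUES (?, ?, ?)", ("Jane Doe", "Speedy", emb_id))
```

The point of the design is that the user's table carries only a small integer, while the large vector lives in one place and is looked up on demand.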

Example: using the dataset of athletes from the 2024 Paris Olympics, as in the blog post and intro video, ask: "Get the names of the athletes whose nick_names are not similar to their names." (tbc. ...)

Additional context

A quick tip on how to run a similarity search would be enough for me to start with.
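For orientation, the "not similar" question above comes down to comparing two embedding vectors per athlete and keeping the rows whose distance exceeds some cutoff. The sketch below shows that idea in plain Python using cosine distance, which is the quantity pgvector's `<=>` operator computes. The function names, the `threshold` value, and the two-dimensional toy vectors are assumptions for illustration, not anything database.build prescribes.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 for same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dissimilar_names(rows, threshold=0.5):
    """Return names whose name and nickname embeddings are far apart.

    rows: iterable of (name, name_vector, nick_name_vector).
    threshold: hypothetical cutoff; a sensible value depends on the model.
    """
    return [
        name
        for name, name_vec, nick_vec in rows
        if cosine_distance(name_vec, nick_vec) > threshold
    ]


# Made-up toy vectors; real embeddings have hundreds of dimensions.
rows = [
    ("Athlete A", [1.0, 0.0], [0.9, 0.1]),  # nickname close to name
    ("Athlete B", [1.0, 0.0], [0.0, 1.0]),  # nickname unrelated to name
]
print(dissimilar_names(rows))
```

In SQL with pgvector, the same comparison would be a `WHERE` clause on the `<=>` distance between the two referenced embeddings, assuming both the name and the nickname have been embedded.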

@sfkeller sfkeller added the documentation Improvements or additions to documentation label Feb 17, 2025
@gregnr
Collaborator

gregnr commented Feb 18, 2025

Hey @sfkeller, great suggestion 😄 Based on your proposed docs, it sounds like you were able to solve this one yourself? If not, let me know.

Would you be open to creating a PR with the above improvement? We could add a new "Usage" section to the main README.

@sfkeller
Author

Yes, I can do that PR, but I need your help first. I did get some results, after initially being confused by the id references where I had expected to see the embeddings themselves. Now I'm stuck with this:

Why aren't there at least as many rows in meta.embeddings (currently 6?) as there are rows with unique nick_names in athletes (2,623)?

What is the number of unique nick_names?
Executed SQL

The number of unique nicknames in the athletes table is 2,623. If you need more information or further analysis, feel free to ask!

Show statistics about the meta table.
Executed SQL

Here are the statistics for the meta.embeddings table:

| Statistic | Value |
| --- | --- |
| Total Rows | 6 |
| Unique IDs | 6 |
| Unique Contents | 6 |

P.S. I was just about to send you a PM about this: I think database.build is so cool that I would like to use it in the exercises for my lecture (possibly with our own LLM). Is database.build really just an experiment, or will it still be around next year?


2 participants