EmbedJs is a NodeJS framework that simplifies RAG application development by efficiently processing unstructured data. It segments data, creates relevant embeddings, and stores them in a vector database for quick retrieval.
Here's an example of how easy it is to get started -
const ragApplication = await new RAGApplicationBuilder()
.addLoader(new SitemapLoader({ url: 'https://tesla-info.com/sitemap.xml' }))
.addLoader(new WebLoader({ url: 'https://en.wikipedia.org/wiki/Tesla,_Inc.' }))
.setVectorDb(new LanceDb({ path: '.db' }))
.build();
That's it. Now you can ask questions -
console.log(await ragApplication.query('Tell me about the history of Tesla?'));
- Supports several large language models
- Supports many vector databases, including self-hosted and cloud variants.
- Load different kinds of unstructured data. Comes built in with several loaders that make this easy.
- Supports several cache options that can greatly improve the performance of your RAG applications in production.
- EmbedJs's simple and powerful API allows both quick launch and deep customizability.
The author(s) are looking to add core maintainers for this open source project. Reach out on LinkedIn if you are interested. If you want to contribute in general - create issues on GitHub or send in PRs.
- Getting started
- Loaders
- LLMs
- Embedding Models
- Vector databases supported
- Caches
- Sample projects
- Contributors
You can install the library via NPM or Yarn
npm i @llm-tools/embedjs
Note: The library uses the newer ES6 modules and import syntax.
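For example, with ESM you can import the builder directly from the package root (this assumes "type": "module" is set in your package.json):
import { RAGApplicationBuilder } from '@llm-tools/embedjs';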
To configure a new EmbedJs application, you need to complete three steps -
1. Pick an LLM
The library supports several LLMs. Activate one by following the instructions in the LLMs section.
const ragApplication = await new RAGApplicationBuilder()
.setModel(new HuggingFace({ modelName: 'mistralai/Mixtral-8x7B-v0.1' }))
...
2. Pick a Vector database
The library supports several vector databases. Enable one by following the instructions in the Vector Databases section.
.setVectorDb(new PineconeDb({ projectName: 'test', namespace: 'dev' }))
3. Load some data
The library supports several kinds of loaders. You can use zero, one or many kinds of loaders together to import custom knowledge. Read the loaders section to learn more about the different supported loaders.
.addLoader(new YoutubeSearchLoader({ searchString: 'Tesla cars' }))
.addLoader(new SitemapLoader({ url: 'https://tesla-info.com/sitemap.xml' }))
.build();
That's it! Now that you have your instance of RAGApplication, you can use it to query against the loaded data sets, like so -
await ragApplication.query('What is Tesla?');
The temperature is a number between 0 and 1. It governs the randomness and creativity of the LLM responses. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. You can alter it by -
await new RAGApplicationBuilder()
.setTemperature(0.1)
NOTE: The default value is 0.1, which makes the GPT responses very precise.
This is the number of documents to aim for when retrieving results from the vector database. A high number of results might mean there is more non-relevant data in the context. A low number might mean none of the relevant documents are retrieved. You need to set the number that works best for you. The parameter can be altered by -
await new RAGApplicationBuilder()
.setSearchResultCount(10)
NOTE: The default value is 7.
It is important to note that the library does not simply dump all contextual document chunks into the prompt. It sends them to the model marking them as context documents. The number of documents still counts toward the token limit.
When the number of documents fetched leads to a request above the token limit, the library uses the following strategy -
It runs a preprocessing step to select relevant sections from each document until the total number of tokens is less than the maximum number of tokens allowed by the model. It then uses the transformed documents as context to answer the question.
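The exact preprocessing is internal to the library, but the idea is roughly the sketch below: keep adding chunks to the context until an estimated token budget is exhausted. The character-based token estimate here is a placeholder heuristic, not the library's tokenizer.
// Illustrative only - not the library's actual implementation
function selectWithinTokenBudget(chunks: string[], maxTokens: number): string[] {
    const selected: string[] = [];
    let usedTokens = 0;
    for (const chunk of chunks) {
        // Rough estimate: ~4 characters per token (placeholder heuristic)
        const estimate = Math.ceil(chunk.length / 4);
        if (usedTokens + estimate > maxTokens) break;
        selected.push(chunk);
        usedTokens += estimate;
    }
    return selected;
}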
LLMs need some care. The models are notorious for inventing responses when they don't know the answer. Keeping this in mind, the library automatically adds a wrapper to all user queries. The default prompt is -
Use all the provided context to answer the query at the end. Answer in full. If you don't know the answer, just say that you don't know, don't try to make up an answer. Query: {0}
The placeholder {0} is replaced with the input query. In some cases, you may want to customize this prompt. This can be done with ease by -
await new RAGApplicationBuilder()
.QueryTemplate('My own query template')
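For example, a stricter template might look like the one below. If you want the user query substituted into your custom template, include the {0} placeholder (an assumption based on how the default prompt above works); the wording here is only an illustration.
await new RAGApplicationBuilder()
    .QueryTemplate('Answer strictly from the provided context. If the context is insufficient, say you do not know. Query: {0}')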
During development, you may want to test the performance and quality of the Loaders you have enabled without making any LLM calls. You can do this by using the getContext method -
await ragApplication.getContext('What is Steve Jobs?')
You can fetch the count of embeddings stored in your vector database at any time by calling the getEmbeddingsCount method -
await ragApplication.getEmbeddingsCount()
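For example, during development you might log both to sanity-check your loaders before wiring up an LLM -
// Inspect what the vector database would hand to the LLM for a query
const context = await ragApplication.getContext('What is Steve Jobs?');
console.log(context);

// Confirm how many embeddings your loaders actually stored
console.log(await ragApplication.getEmbeddingsCount());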
Loaders take input in a specific format, process it and create chunks of the data. Currently, the library supports the following formats -
To add a PDF file, use PdfLoader. You can add a local file -
.addLoader(new PdfLoader({ filePath: path.resolve('paxos-simple.pdf') }))
Or, you can add a remote file -
.addLoader(new PdfLoader({ url: 'https://lamport.azurewebsites.net/pubs/paxos-simple.pdf' }))
By default, the PdfLoader uses the hash of the filePath or the URL as the loader cache key. In some cases, like when using dynamic files with the same name, you can pass in your own custom uniqueId like so -
.addLoader(new PdfLoader({ url: '<URL>', uniqueId: 'MY_UNIQUE_ID' }))
Note: Currently there is no support for PDF forms and password-protected documents.
To add a web page, use WebLoader.
.addLoader(new WebLoader({ url: 'https://en.wikipedia.org/wiki/Formula_One' }))
To add an XML sitemap, use SitemapLoader.
.addLoader(new SitemapLoader({ url: '...' }))
This will load all URLs in a sitemap via the WebLoader.
To supply your own text, use TextLoader.
.addLoader(new TextLoader({ text: 'The best company name for a company making colorful socks is MrSocks' }))
Note: Feel free to add your custom text without worrying about duplication. The library will chunk, cache and update the vector database.
To add any YouTube video to your app, use YoutubeLoader.
Note: You will need to import it dynamically and have the required packages installed.
.addLoader(new YoutubeLoader({ videoIdOrUrl: 'w2KbwC-s7pY' }))
To add all videos in a YouTube channel, use YoutubeChannelLoader.
Note: You will need to import it dynamically and have the required packages installed.
.addLoader(new YoutubeChannelLoader({ channelId: '...' }))
To do a general YouTube search and add the popular search results, use YoutubeSearchLoader.
Note: You will need to import it dynamically and have the required packages installed.
.addLoader(new YoutubeSearchLoader({ searchString: '...' }))
To add a Confluence space, use ConfluenceLoader.
Note: You will need to import it dynamically and have the required packages installed.
.addLoader(new ConfluenceLoader({ spaceNames: ['...'] }))
You also need to set the following environment variables -
CONFLUENCE_BASE_URL=<your space base url>
CONFLUENCE_USER_NAME=<your email id or username>
CONFLUENCE_API_TOKEN=<your personal or bot access token>
Note: The Confluence space name is the value you see in the URL on the space overview page /wiki/spaces/{{ space name }}/overview.
You can pass along a custom loader to the addLoader method by extending and implementing the abstract class BaseLoader. Here's how that would look -
class CustomLoader extends BaseLoader<{ customChunkMetadata: string }> {
constructor() {
super('uniqueId');
}
async *getChunks() {
throw new Error('Method not implemented.');
}
}
We really encourage you to send in a PR to this library if you are implementing a common loader pattern, so the community can benefit from it.
If you want to add any other format, please create an issue and we will add it to the list of supported formats. All PRs are welcome.
It's relatively easy to switch between different LLMs using the library. We support the following LLMs today -
To use the OpenAI LLM models, you need an API key from OpenAI. You can alternatively use Azure OpenAI to run these models. Read the Azure OpenAI section below to learn more about this. In this section, we will cover how to use OpenAI provided LLMs.
The first step is to obtain an API Key from OpenAI. You can do this by visiting their API Portal. Once you obtain a key, set it as an environment variable, like so -
OPENAI_API_KEY="<Your key>"
Once this is done, it is relatively easy to run OpenAI LLMs. All you need is to indicate the model type you want to run.
- For GPT 3.5 Turbo
const ragApplication = await new RAGApplicationBuilder()
.setModel(SIMPLE_MODELS.OPENAI_GPT3_TURBO)
- For GPT 4
const ragApplication = await new RAGApplicationBuilder()
.setModel(SIMPLE_MODELS.OPENAI_GPT4)
- To use a custom model name
const ragApplication = await new RAGApplicationBuilder()
.setModel(new OpenAi({ modelName: 'gpt-4' }))
Note: GPT 3.5 Turbo is used as the default model if you do not specify one.
To use Mistral's models, you will need to get an API Key from Mistral. You can do this from their console. Once you have obtained a key, set Mistral as your LLM of choice -
const ragApplication = await new RAGApplicationBuilder()
.setModel(new Mistral({ accessToken: "<YOUR_MISTRAL_TOKEN_HERE>" }))
By default, the mistral-medium model from Mistral is used. If you want to use a different Mistral model, you can specify it via the optional parameter to the Mistral constructor, like so -
const ragApplication = await new RAGApplicationBuilder()
.setModel(new Mistral({ accessToken: "<YOUR_MISTRAL_TOKEN_HERE>", modelName: "..." }))
Note: If you want to run Mistral open source for free, you can do so using the HuggingFace platform (read below). Just make sure to set the modelName to mistralai/Mistral-7B-v0.1 or the version you want to run.
Hugging Face needs no introduction. They host a variety of open source LLMs, and you can use most of them for free. To run Hugging Face inference-based LLMs with this library, you will need a free Hugging Face token.
You can get an API token by signing up for Hugging Face and generating a token from this page. Once you have the token, set it as an environment variable like so -
HUGGINGFACEHUB_API_KEY="<Your hf key>"
That's all. Now you can use any Hugging Face model. To do this, set HuggingFace as your model processor of choice -
const ragApplication = await new RAGApplicationBuilder()
.setModel(new HuggingFace({ modelName: "..." }))
Note: Not all Hugging Face models are fully free to consume via their API. Since running these models takes a lot of resources, Hugging Face charges a fee for a few of the larger ones. This is the case with Meta's meta-llama/Llama-2-7b-hf, for example.
To use these 'not-free' models via Hugging Face, you need to subscribe to their Pro plan or create a custom inference endpoint. It is possible to self-host these models for free and run them locally via Ollama - support for which is coming soon.
To use an OpenAI model on Azure, it first needs to be deployed. Please refer to the Azure OpenAI documentation on how to deploy a model on Azure. To run this library, you will need to deploy two models -
- text-embedding-ada
- GPT-3.5-turbo (or the 4 series)
Once these models are deployed, using Azure OpenAI instead of the regular OpenAI is easy to do. Just follow these steps -
- Remove the OPENAI_API_KEY environment variable if you have set it already.
- Set the following environment variables -
# Set this to `azure`
OPENAI_API_TYPE=azure
# The API version you want to use
AZURE_OPENAI_API_VERSION=2023-05-15
# The base URL for your Azure OpenAI resource. You can find this in the Azure portal under your Azure OpenAI resource.
AZURE_OPENAI_BASE_PATH=https://your-resource-name.openai.azure.com/openai/deployments
# The API key1 or key2 for your Azure OpenAI resource
AZURE_OPENAI_API_KEY=<Your Azure OpenAI API key>
# The deployment name you used for your embedding model
AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME=text-embedding-ada-002
# The deployment name you used for your llm
AZURE_OPENAI_API_DEPLOYMENT_NAME=gpt-35-turbo
You can now run the Azure OpenAI LLMs using the OpenAi model detailed above.
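With these environment variables in place, the builder code itself does not change. A minimal sketch, assuming the env vars above point at your Azure deployments -
const ragApplication = await new RAGApplicationBuilder()
    .setModel(SIMPLE_MODELS.OPENAI_GPT3_TURBO) // resolved against the Azure deployment configured above
    .setVectorDb(new LanceDb({ path: '.db' }))
    .build();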
You can use a custom LLM model by extending the abstract class BaseModel. Here's how that would look -
class MyOwnLLMImplementation extends BaseModel {
    override async init(): Promise<void> {} //optional to override

    protected override async runQuery(
        system: string, //the system prompt
        userQuery: string, //the current user query
        supportingContext: Chunk[], //all supporting documents
        pastConversations: ConversationHistory[], //the chat history so far
    ): Promise<string> {
        throw new Error("Method not implemented.");
    }
}
Once done, you can pass this class to the setModel method as shown in the examples above. That said, we really encourage you to send in a PR to this library if you are implementing a popular or common LLM, so the community can benefit from it.
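For example, using the sketch above -
const ragApplication = await new RAGApplicationBuilder()
    .setModel(new MyOwnLLMImplementation())
    ...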
If you want us to add support for a specific LLM, please create an issue and we will prioritize it. All PRs are welcome.
Next, we plan to add support for Ollama.
Embedding models are LLMs that convert a string into a vector representation better suited for processing. In most cases, the default text-embedding-ada-002 model from OpenAI is going to be good enough. If you want to use this model, you do not have to do anything.
However, in some advanced cases, you may want to change this; after all, different embedding models perform differently under different circumstances. The library allows you to do this using the setEmbeddingModel method while building the RAGApplication.
The library supports the following embedding models -
text-embedding-3-small is a new standard embedding model released by OpenAI in January 2024. It is the default used by the library. This model is cheaper and better than their older Ada model. This model returns vectors with dimension 1536.
You do not have to do anything to enable it.
text-embedding-3-large is also a new standard embedding model released by OpenAI in January 2024. This model is the best embedding model provided by OpenAI as of now but is also the most expensive. This model returns vectors with dimension 3072.
To set it as your model of choice -
- Set OpenAi3LargeEmbeddings as your embedding model on RAGApplicationBuilder
await new RAGApplicationBuilder()
.setEmbeddingModel(new OpenAi3LargeEmbeddings())
text-embedding-ada-002 is a well known model from OpenAI. You can read more about it here. This model returns vectors with dimension 1536.
To set it as your model of choice -
- Set AdaEmbeddings as your embedding model on RAGApplicationBuilder
await new RAGApplicationBuilder()
.setEmbeddingModel(new AdaEmbeddings())
The library supports usage of the Cohere-AI embed-english-v2.0 embedding model out of the box. This model returns vectors with dimension 4096.
Here's what you have to do to enable it -
- Sign up for an account with Cohere-AI if you have not done so already. Once done, go to the API Keys section and copy an API_KEY.
- Load the key you just obtained in the environment variable COHERE_API_KEY
COHERE_API_KEY="<YOUR_KEY>"
- Set CohereEmbeddings as your embedding model on RAGApplicationBuilder
await new RAGApplicationBuilder()
.setEmbeddingModel(new CohereEmbeddings())
You can use your own custom embedding model by implementing the BaseEmbeddings interface. Here's how that would look -
class MyOwnEmbeddingImplementation implements BaseEmbeddings {
embedDocuments(texts: string[]): Promise<number[][]> {
throw new Error("Method not implemented.");
}
embedQuery(text: string): Promise<number[]> {
throw new Error("Method not implemented.");
}
getDimensions(): number {
throw new Error("Method not implemented.");
}
}
Once done, you can pass this class to the setEmbeddingModel method as shown in the Cohere example above. That said, we really encourage you to send in a PR to this library if you are implementing a popular or common embedding provider, so the community can benefit from it.
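For example, using the sketch above -
await new RAGApplicationBuilder()
    .setEmbeddingModel(new MyOwnEmbeddingImplementation())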
If you want us to add support for a specific embedding model, please create an issue and we will prioritize it. All PRs are welcome.
The library allows you to save your processed and unique embeddings with the vector databases of your choice. Here are the supported databases right now -
You can enable Pinecone storage by following these steps -
- Create an account with Pinecone if you don't have one already. There is a good free tier.
- Install pinecone package in your project
npm install @pinecone-database/pinecone
- Set the Pinecone environment variable PINECONE_API_KEY. This can be obtained from the API Keys section on the Pinecone dashboard.
PINECONE_API_KEY=<your api key>
- Set the Pinecone database as your choice of vectorDb
.setVectorDb(new PineconeDb({
projectName: 'test',
namespace: 'dev',
indexSpec: {
pod: {
podType: 'p1.x1',
environment: 'us-east1-gcp',
},
},
}))
Note: Pinecone supports serverless and pod-based index deployments. You can control how you want your index created using the indexSpec attribute. This attribute is mandatory but comes with full type definitions. Read more about configuring this here.
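For a serverless index, the indexSpec would look roughly like the sketch below; the cloud and region values are only examples, so check the Pinecone documentation for the options available to your account -
.setVectorDb(new PineconeDb({
    projectName: 'test',
    namespace: 'dev',
    indexSpec: {
        serverless: {
            cloud: 'aws',
            region: 'us-east-1',
        },
    },
}))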
MongoDB Atlas is a fully managed cloud database service built by the same team behind MongoDB. It offers high availability, scalability, and security for your database needs. Follow these steps to use MongoDB Atlas as your database for vector storage:
MongoDB Atlas provides a scalable and fully managed database solution for storing vectors and performing vector-based operations. With MongoDB Atlas, you can easily create, manage, and scale your database cluster to meet the demands of your application. Leveraging MongoDB's flexible document model, you can store vectors alongside other data types, making it a versatile solution for various use cases.
To use MongoDB Atlas as your vector database, you'll need:
- A MongoDB Atlas account. You can sign up here.
- A MongoDB Atlas cluster, preferably an M10 or higher deployment, to ensure optimal performance and features.
- Set up MongoDB Atlas Cluster:
- Log in to your MongoDB Atlas account.
- Create a new project or select an existing one.
- Navigate to the Clusters section and click on "Build a New Cluster".
- Configure your cluster settings, including cloud provider, region, cluster tier (M10 or higher is recommended), and additional settings as needed.
- Once your cluster is created, wait for it to be deployed and become available.
- Connect to your Cluster:
- After your cluster is deployed, create a new database and collection.
- Next click on "Connect" to get your connection string.
- Whitelist your IP address if necessary to allow connections to your cluster.
- Create a username and password to connect to your cluster
- Install MongoDB Node.js Driver:
- In your Node.js project, install the MongoDB Node.js driver:
npm install mongodb
- Configure MongoDB Atlas as Vector Database:
- Add the following to your .env file
MONGODB_URI=mongodb+srv://<username>:<password>@<cluster-url>/<database_name>
DB_NAME=my_database
COLLECTION_NAME=my_collection
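Optionally, you can sanity-check the connection string with the driver installed in step 3 before wiring it into the library. This assumes your .env is loaded (e.g. via dotenv or node --env-file) -
import { MongoClient } from 'mongodb';

// Verify the MONGODB_URI from your .env actually connects
const client = new MongoClient(process.env.MONGODB_URI!);
await client.connect();
console.log(await client.db(process.env.DB_NAME).command({ ping: 1 }));
await client.close();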
Chroma is an open source vector database. It is designed to be Python-first; to connect to it from NodeJS, you will need to run Chroma in a container by following the steps listed here. Once done, follow these steps to use Chroma as your vector database -
- Install Chroma package in your project
npm install chromadb
- Set Chroma database as your choice of vectorDb
.setVectorDb(new ChromaDb({ url: 'http://localhost:8000' }))
Note: This is the default URL and port if you run the Chroma docker container using the command docker-compose up -d --build.
A warning - in our testing, Chroma performed the poorest among the supported vector databases at retrieving relevant documents.
HNSWLib is an in-memory vector store. It is great for beginners to get started with since you do not need access to the file system or a cloud service. Follow these steps to use HNSWLib as your vector database -
- Install HNSWLib package in your project
npm install hnswlib-node
- Set HNSWLib database as your choice of vectorDb
.setVectorDb(new HNSWDb())
Note: This is a purely in-memory vector store. All values are lost when the application is restarted.
Weaviate is an open source vector store. You can deploy it locally via Docker or use their managed cloud offering. Follow these steps to use Weaviate as your vector database -
- Install Weaviate package in your project
npm install weaviate-ts-client
- Set Weaviate database as your choice of vectorDb
.setVectorDb(new WeaviateDb({ host: '...', apiKey: '...', className: '...' }))
You can pass along your own vector database to the setVectorDb method by implementing the interface BaseDb. Here's how that would look -
class MyOwnDb implements BaseDb {
async init({ dimensions }: { dimensions: number }): Promise<void> {
throw new Error('Method not implemented.');
}
async insertChunks(chunks: EmbeddedChunk[]): Promise<number> {
throw new Error('Method not implemented.');
}
async similaritySearch(query: number[], k: number): Promise<Chunk[]> {
throw new Error('Method not implemented.');
}
async getVectorCount(): Promise<number> {
throw new Error('Method not implemented.');
}
async deleteKeys(keys: string[]): Promise<void> {
throw new Error('Method not implemented.');
}
async reset(): Promise<void> {
throw new Error('Method not implemented.');
}
}
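Once implemented, plug it in like any of the built-in databases (using the MyOwnDb sketch above) -
.setVectorDb(new MyOwnDb())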
We really encourage you to send in a PR to this library if you are implementing a popular or common database, so the community can benefit from it.
If you want to add support for any other vector database, please create an issue and we will add it to the list of supported databases. All PRs are welcome.
Caches reduce the re-processing of embeddings, loaders and queries. There is no need to load, chunk and store a large PDF file or web page on every run. Smart caching is built in and can be enabled out of the box simply by setting a cache processor using the setCache method while building the RAGApplication.
The library supports the following caches -
You can use LMDB to cache values locally on disk.
- Install LMDB package in your project
npm install lmdb
- Set LmdbCache as your cache provider on RAGApplicationBuilder
await new RAGApplicationBuilder()
.setCache(new LmdbCache({ path: path.resolve('./cache') }))
Note: The path property will be used by the LMDB driver to create a folder housing the LMDB database files.
You can utilize MongoDB as a cache provider to store values in a MongoDB database.
- Install MongoDB package in your project
npm install mongodb
- Set MongoCache as your cache provider on RAGApplicationBuilder
await new RAGApplicationBuilder()
    .setCache(new MongoCache({
        uri: MONGODB_URI,
        dbName: DB_NAME,
        collectionName: CACHE_COLLECTION_NAME
    }))
Note: You can use the same Mongo URI and Database as for your vectors, just with a different collection name.
You can use a simple in-memory cache to store values during testing.
- Set MemoryCache as your cache provider on RAGApplicationBuilder
await new RAGApplicationBuilder()
.setCache(new MemoryCache())
Note: Although this cache can remove duplicate loaders and chunks, its store does not persist between process restarts. You should only be using it for testing.
You can use Redis as a cache to store values.
- Set RedisCache as your cache provider on RAGApplicationBuilder
await new RAGApplicationBuilder()
.setCache(new RedisCache({ ... }))
Note: The library internally uses IORedis to work with Redis. The RedisCache constructor supports all IORedis constructor parameters. Check the IORedis documentation for more details.
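For example, pointing at a local Redis instance (host and port are standard IORedis options) -
await new RAGApplicationBuilder()
    .setCache(new RedisCache({ host: 'localhost', port: 6379 }))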
You can pass along your own cache provider to the setCache method by implementing the interface BaseCache. Here's how that would look -
class MyOwnCache implements BaseCache {
async init(): Promise<void> {
throw new Error("Method not implemented.");
}
async addLoader(loaderId: string, chunkCount: number ): Promise<void> {
throw new Error("Method not implemented.");
}
async getLoader(loaderId: string): Promise<{ chunkCount: number }> {
throw new Error("Method not implemented.");
}
async hasLoader(loaderId: string): Promise<boolean> {
throw new Error("Method not implemented.");
}
}
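Once implemented, set it like any of the built-in caches (using the MyOwnCache sketch above) -
await new RAGApplicationBuilder()
    .setCache(new MyOwnCache())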
We really encourage you to send in a PR to this library if you are implementing a popular or common cache provider, so the community can benefit from it.
If you want to add support for any other cache providers, please create an issue and we will add it to the list of supported caches. All PRs are welcome.
Langsmith allows you to keep track of how you use LLM and embedding models. It logs histories, token usage and other metadata. Follow these three simple steps to enable it -
- Sign up for an account with Langsmith
- Generate an API Key from your admin page
- Set the following environment keys in your project
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_PROJECT="<project name>"
export LANGCHAIN_API_KEY="<api key>"
Here's a list of projects / examples built with EmbedJs
| Project | Description |
| --- | --- |
| nestjs-embedjs-template | A NestJS server side template |
| nextjs-chatbot-template | A NextJS chat template - including a chat UI |
| slack-bot | A Slack bot that can answer questions based on learnt Confluence spaces |
Want to contribute? That is as easy as forking and sending in a PR!