Add RAG capabilities (#58)

* install qdrant in backend

* Add qdrant service in docker-compose

* create upload table and uploads routes

* Add uploads service in frontend

* Fix missing uploads relationship in Member model

* Add uploads page and its components. Update sidebar and navbar to show uploads tab

* Update docker compose and env example to require qdrant api key

* Install pymupdf4llm in backend

* Show loading button if mutation is running

* Add QdrantStore class to create, delete and search qdrant vector store

* Add upload routes

* Add UploadsService in frontend

* Add required rule to mandatory form fields using controllers

* Add chunk size and chunk overlap form inputs to AddUpload and EditUpload component.

* Change chunk size and chunk overlap to 500 and 50 respectively

* Added functionality to update and retrieve a member's accessible uploads in the API. Updated the `MemberUpdate` and `MemberOut` models to include the `uploads` attribute.

* Introduced a `retriever` method in `QdrantStore` to create a `VectorStoreRetriever` instance

* Update graph logic to utilise uploads

* Add description col to uploads model and add migration script. Update create_upload and update_upload route to require description in request body

* Delete path attribute from UploadOut

* Fix create_upload by committing upload to db first to get upload id. Simplify error handling for create_upload and update_upload routes.

* Update uploads and members clients

* Add description field in AddUpload and EditUpload component

* Add uploads field in EditMember component

* Remove debug log and rename Uploads label to 'Knowledge Base'

* Change 'Skill' tag to 'Action' and its colorScheme to purple

* Add upload description to GraphUpload. Fix type issues.

* Validate name pattern during adding and editing upload

* Add migration file to recreate foreign key constraints with on delete cascade for members-skills and members-uploads. Align relationship in models.

* Refactor and add upload cleanup in setup fixture and modify its scope to module for isolation. Insert superuser at id=1 for consistent initial state.

* Add upload tests

* Remove unused import

* Add RAG info in readme

* Fix uploads test by mocking OpenAIEmbeddings so api_key not required

* Move OpenAIEmbeddings into methods to fix upload tests

* Install fastembed, set python version range so fastembed can be installed

* Switch from openai embeddings to fastembed

* Allow customisation of embedding model

* Add qdrant dashboard url in local deployment readme

* Add customising embedding model info in readme

* Fix create_upload to remove entry from uploads table if upload to vector store fails

* Allow customisation of max upload size

* Remove cascade delete to fix inadvertent deletions

* Replace RecursiveCharacterTextSplitter with MarkdownTextSplitter

StreetLamb authored Jun 25, 2024
1 parent 3a87ec1 commit 2d7422e

Showing 52 changed files with 2,541 additions and 43 deletions.
8 changes: 8 additions & 0 deletions .env.example
@@ -18,11 +18,16 @@ SECRET_KEY=changethis
FIRST_SUPERUSER=[email protected]
FIRST_SUPERUSER_PASSWORD=changethis
USERS_OPEN_REGISTRATION=False
MAX_UPLOAD_SIZE=50_000_000

# llm provider keys. Add only to models that you want to use
OPENAI_API_KEY=
ANTHROPIC_API_KEY=

# Embedding model. See the list of supported models: https://qdrant.github.io/fastembed/examples/Supported_Models/
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5


# Langsmith: For llm observability
LANGCHAIN_TRACING_V2=
LANGCHAIN_API_KEY=
@@ -52,3 +57,6 @@ SENTRY_DSN=
# Configure these with your own Docker registry images
DOCKER_IMAGE_BACKEND=backend
DOCKER_IMAGE_FRONTEND=frontend

# Qdrant
QDRANT__SERVICE__API_KEY=changethis
19 changes: 19 additions & 0 deletions README.md
@@ -22,6 +22,8 @@
- [Skills](#skills)
- [Create a Skill Using Skill Definitions](#create-a-skill-using-skill-definitions)
- [Writing a Custom Skill using LangChain](#writing-a-custom-skill-using-langchain)
- [Retrieval Augmented Generation (RAG)](#retrieval-augmented-generation-rag)
- [Customising embedding models](#customising-embedding-models)
- [Guides](#guides)
- [Creating Your First Hierarchical Team](#creating-your-first-hierarchical-team)
- [Equipping Your Team Member with Skills](#equipping-your-team-member-with-skills)
@@ -51,6 +53,7 @@ and many many more!
- **Persistent conversations**: Save and maintain chat histories, allowing you to continue conversations.
- **Observability**: Monitor and track your agents’ performance and outputs in real-time using LangSmith to ensure they operate efficiently.
- **Tool Calling**: Enable your agents to utilize external tools and APIs.
- **Retrieval Augmented Generation**: Enable your agents to reason with your internal knowledge base.
- **Human-In-The-Loop**: Enable human approval before tool calling.
- **Easy Deployment**: Deploy Tribe effortlessly using Docker.
- **Multi-Tenancy**: Manage and support multiple users and teams.
@@ -173,6 +176,22 @@ For more intricate tasks that extend beyond simple HTTP requests, LangChain allo

After creating a new tool, restart the application to ensure the tool is properly loaded into the database. Likewise, if you need to remove a tool, simply delete it from the `managed_skills` dictionary and restart the application to ensure it is removed from the database. Do note that tools created this way are available to all users in your application.

### Retrieval Augmented Generation (RAG)

RAG is a technique for augmenting your agents' knowledge with additional data. Agents can reason about a wide range of topics, but their knowledge is limited to the public data available up to their training cutoff. If you want your agents to reason about private data, Tribe lets you upload your documents and select which of them to include in each agent's knowledge base. Your agents can then reason over the selected data, and you can create different agents with specialized knowledge.
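
To make the flow concrete, here is a minimal, self-contained sketch of the ingestion-and-retrieval pipeline this feature builds on, using the libraries added in this PR (pymupdf4llm, fastembed via LangChain, and Qdrant). It is an illustration only, not Tribe's actual `QdrantStore` implementation; the file name, collection name, and API key are placeholders.

```python
import pymupdf4llm
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain_text_splitters import MarkdownTextSplitter

# 1. Convert the uploaded document to markdown text.
markdown = pymupdf4llm.to_markdown("handbook.pdf")  # placeholder file

# 2. Split the markdown into chunks (this PR defaults to size 500, overlap 50).
splitter = MarkdownTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(markdown)

# 3. Embed the chunks with fastembed and store them in a Qdrant collection.
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-small-en-v1.5")
vector_store = Qdrant.from_texts(
    chunks,
    embeddings,
    url="http://localhost:6333",
    api_key="changethis",        # QDRANT__SERVICE__API_KEY
    collection_name="uploads",   # placeholder collection name
)

# 4. Expose the collection as a retriever that an agent can query.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
docs = retriever.invoke("What is our leave policy?")
```

In Tribe itself, the equivalent steps are handled by the upload routes and the `QdrantStore` class, and the retriever is wired into the team's graph logic so that members with access to an upload can query it.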

#### Customising embedding models

By default, Tribe uses `BAAI/bge-small-en-v1.5`, a lightweight, fast English embedding model that performs better than `OpenAI Ada-002`. If your documents are multilingual or you need image embeddings, you may want to use a different embedding model. You can do this by changing `EMBEDDING_MODEL` in your `.env` file:

```bash
# See the list of supported models: https://qdrant.github.io/fastembed/examples/Supported_Models/
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5 # Change this
```
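
If you switch models, it helps to check the new model's vector dimension before re-indexing, since it must match the dimension of your existing Qdrant collection (see the warning below). A small sketch, assuming the setting is read straight from the environment (the exact settings plumbing inside Tribe may differ):

```python
import os

from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

# Read the configured model; fastembed downloads it on first use.
model_name = os.environ.get("EMBEDDING_MODEL", "BAAI/bge-small-en-v1.5")
embeddings = FastEmbedEmbeddings(model_name=model_name)

vector = embeddings.embed_query("dimension check")
print(f"{model_name} produces {len(vector)}-dimensional vectors")  # 384 for bge-small-en-v1.5
```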

> [!WARNING]
> If your existing and new embedding models have different vector dimensions, you may need to recreate your Qdrant collection; you can delete it through the Qdrant Dashboard at [http://localhost:6333/dashboard#/collections](http://localhost:6333/dashboard#/collections). Since switching models means re-embedding your documents, it is best to decide up front which embedding model suits your workflows.
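
If you do switch to a model with a different vector size, the old collection can also be dropped programmatically rather than through the dashboard. A sketch, assuming the defaults used elsewhere in this setup (the collection name `uploads` is a placeholder, not necessarily what Tribe uses):

```python
from qdrant_client import QdrantClient

# Drop the collection whose vectors no longer match the new embedding model.
# Re-uploading your documents will recreate it with the new vector size.
client = QdrantClient(url="http://localhost:6333", api_key="changethis")
client.delete_collection(collection_name="uploads")  # placeholder name
```
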
### Guides

#### Creating Your First Hierarchical Team
@@ -0,0 +1,53 @@
"""cascade delete junction table between members-skills and members-uploads
Revision ID: 45e43cb617f2
Revises: bfb17969c4ed
Create Date: 2024-06-23 16:08:18.903068
"""
from alembic import op
import sqlalchemy as sa
import sqlmodel.sql.sqltypes


# revision identifiers, used by Alembic.
revision = '45e43cb617f2'
down_revision = 'bfb17969c4ed'
branch_labels = None
depends_on = None


def upgrade():
    # Drop existing foreign key constraints
    op.drop_constraint('memberskillslink_member_id_fkey', 'memberskillslink', type_='foreignkey')
    op.drop_constraint('memberskillslink_skill_id_fkey', 'memberskillslink', type_='foreignkey')
    op.drop_constraint('memberuploadslink_member_id_fkey', 'memberuploadslink', type_='foreignkey')
    op.drop_constraint('memberuploadslink_upload_id_fkey', 'memberuploadslink', type_='foreignkey')

    # Create new foreign key constraints with ON DELETE CASCADE
    op.create_foreign_key(
        'memberskillslink_member_id_fkey', 'memberskillslink', 'member', ['member_id'], ['id'], ondelete='CASCADE')
    op.create_foreign_key(
        'memberskillslink_skill_id_fkey', 'memberskillslink', 'skill', ['skill_id'], ['id'], ondelete='CASCADE')
    op.create_foreign_key(
        'memberuploadslink_member_id_fkey', 'memberuploadslink', 'member', ['member_id'], ['id'], ondelete='CASCADE')
    op.create_foreign_key(
        'memberuploadslink_upload_id_fkey', 'memberuploadslink', 'upload', ['upload_id'], ['id'], ondelete='CASCADE')


def downgrade():
    # Drop the foreign key constraints with ON DELETE CASCADE
    op.drop_constraint('memberskillslink_member_id_fkey', 'memberskillslink', type_='foreignkey')
    op.drop_constraint('memberskillslink_skill_id_fkey', 'memberskillslink', type_='foreignkey')
    op.drop_constraint('memberuploadslink_member_id_fkey', 'memberuploadslink', type_='foreignkey')
    op.drop_constraint('memberuploadslink_upload_id_fkey', 'memberuploadslink', type_='foreignkey')

    # Recreate the original foreign key constraints without ON DELETE CASCADE
    op.create_foreign_key(
        'memberskillslink_member_id_fkey', 'memberskillslink', 'member', ['member_id'], ['id'])
    op.create_foreign_key(
        'memberskillslink_skill_id_fkey', 'memberskillslink', 'skill', ['skill_id'], ['id'])
    op.create_foreign_key(
        'memberuploadslink_member_id_fkey', 'memberuploadslink', 'member', ['member_id'], ['id'])
    op.create_foreign_key(
        'memberuploadslink_upload_id_fkey', 'memberuploadslink', 'upload', ['upload_id'], ['id'])
@@ -0,0 +1,33 @@
"""Add description col in uploads table
Revision ID: bfb17969c4ed
Revises: c3dc42618662
Create Date: 2024-06-22 15:57:45.897306
"""
from alembic import op
import sqlalchemy as sa
import sqlmodel.sql.sqltypes


# revision identifiers, used by Alembic.
revision = 'bfb17969c4ed'
down_revision = 'c3dc42618662'
branch_labels = None
depends_on = None


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('upload', sa.Column('description', sqlmodel.sql.sqltypes.AutoString(), nullable=True))
    op.execute('UPDATE upload SET description = name WHERE description IS NULL')
    op.alter_column('upload', 'description', nullable=False)
    op.drop_column('upload', 'path')
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.add_column('upload', sa.Column('path', sa.VARCHAR(), autoincrement=False, nullable=False))
    op.drop_column('upload', 'description')
    # ### end Alembic commands ###
45 changes: 45 additions & 0 deletions backend/app/alembic/versions/c3dc42618662_add_uploads_table.py
@@ -0,0 +1,45 @@
"""add uploads table
Revision ID: c3dc42618662
Revises: c1acf65d4731
Create Date: 2024-06-19 14:03:47.288367
"""
from alembic import op
import sqlalchemy as sa
import sqlmodel.sql.sqltypes


# revision identifiers, used by Alembic.
revision = 'c3dc42618662'
down_revision = 'c1acf65d4731'
branch_labels = None
depends_on = None


def upgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.create_table('upload',
    sa.Column('name', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
    sa.Column('id', sa.Integer(), nullable=False),
    sa.Column('path', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
    sa.Column('owner_id', sa.Integer(), nullable=False),
    sa.Column('last_modified', sa.DateTime(), nullable=False),
    sa.ForeignKeyConstraint(['owner_id'], ['user.id'], ),
    sa.PrimaryKeyConstraint('id')
    )
    op.create_table('memberuploadslink',
    sa.Column('member_id', sa.Integer(), nullable=False),
    sa.Column('upload_id', sa.Integer(), nullable=False),
    sa.ForeignKeyConstraint(['member_id'], ['member.id'], ),
    sa.ForeignKeyConstraint(['upload_id'], ['upload.id'], ),
    sa.PrimaryKeyConstraint('member_id', 'upload_id')
    )
    # ### end Alembic commands ###


def downgrade():
    # ### commands auto generated by Alembic - please adjust! ###
    op.drop_table('memberuploadslink')
    op.drop_table('upload')
    # ### end Alembic commands ###
3 changes: 2 additions & 1 deletion backend/app/api/main.py
@@ -1,6 +1,6 @@
from fastapi import APIRouter

from app.api.routes import login, members, skills, teams, threads, users, utils
from app.api.routes import login, members, skills, teams, threads, uploads, users, utils

api_router = APIRouter()
api_router.include_router(login.router, tags=["login"])
@@ -14,3 +14,4 @@
api_router.include_router(
    threads.router, prefix="/teams/{team_id}/threads", tags=["threads"]
)
api_router.include_router(uploads.router, prefix="/uploads", tags=["uploads"])
9 changes: 9 additions & 0 deletions backend/app/api/routes/members.py
@@ -14,6 +14,7 @@
    Message,
    Skill,
    Team,
    Upload,
)

router = APIRouter()
@@ -194,6 +195,14 @@ def update_member(
        skills = session.exec(select(Skill).where(col(Skill.id).in_(skill_ids))).all()
        member.skills = list(skills)

    # update member's accessible uploads if required
    if member_in.uploads is not None:
        upload_ids = [upload.id for upload in member_in.uploads]
        uploads = session.exec(
            select(Upload).where(col(Upload.id).in_(upload_ids))
        ).all()
        member.uploads = list(uploads)

    update_dict = member_in.model_dump(exclude_unset=True)
    member.sqlmodel_update(update_dict)
    session.add(member)
3 changes: 2 additions & 1 deletion backend/app/api/routes/teams.py
@@ -196,10 +196,11 @@ async def stream(
            status_code=400, detail="Thread does not belong to the team"
        )

    # Populate the skills and accessible uploads for each member
    members = team.members
    for member in members:
        member.skills = member.skills
        member.uploads = member.uploads

    return StreamingResponse(
        generator(