Get up and running with our Amazon tabular semantic search engine in minutes.
Install these tools on your machine:
| Tool | Purpose | Version | Download Link | Notes |
|---|---|---|---|---|
| Python | Programming language runtime | = v3.11 | Download | Core runtime environment |
| uv | Python package installer and virtual environment manager | >= v0.4.30 | Download | Modern replacement for pip/venv/poetry |
| GNU Make | Build or task automation tool | >= v3.81 | Download | Used for running project commands |
| MongoDB Atlas CLI | Interact with MongoDB Atlas from the CLI | >= v1.33.0 | Download | Used for hosting the vector DB |
You'll need access to:
| Service | Purpose | Cost | Required Environment Variables | Setup Guide |
|---|---|---|---|---|
| OpenAI API | LLM API | Pay-per-use | `OPENAI_API_KEY`, `OPENAI_MODEL_ID` | Quick Start Guide |
| MongoDB Atlas | Vector DB | Free tier | `USE_MONGO_VECTOR_DB`, `MONGO_CLUSTER_URL`, `MONGO_DATABASE_NAME`, `MONGO_CLUSTER_NAME`, `MONGO_PROJECT_ID`, `MONGO_API_PUBLIC_KEY`, `MONGO_API_PRIVATE_KEY` | 1. Create a free MongoDB Atlas account 2. Create a Cluster 3. Add a Database User 4. Configure a Network Connection 5. Create an API Key 6. Create an empty database |
Note: Find all the required environment variables in the `.env.example` file.
Set up the project environment by running the following:
make install
Test that you have Python 3.11.8 installed in your new `uv` environment:
uv run python --version
# Output: Python 3.11.8
The `make install` command will:
- Create a virtual environment using `uv`
- Activate the virtual environment
- Install all dependencies from `pyproject.toml`
Note
Normally, `uv` picks the Python version specified in `.python-version` and installs it automatically if it is not on your system. If you run into issues, install the right Python version explicitly by running `make install-python`.
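As a complement to `uv run python --version`, here is a minimal sketch of the same check done from inside Python. The helper name `is_supported_python` is hypothetical and not part of the project; the 3.11 requirement comes from the tools table.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is a Python 3.11.x release."""
    # Only major and minor are compared, so any 3.11 patch release passes.
    return tuple(version_info[:2]) == (3, 11)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported_python() else "unsupported, expected 3.11.x"
    print(f"Python {found}: {status}")
```

Saved under any file name of your choice, it can be run with `uv run python <file>` so that it uses the interpreter of the project environment.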
Before running any components:
- Create your environment file:
cp .env.example .env
- Open `.env` and configure the required credentials following the inline comments (see superlinked_app/config.py for all options).
Important
For quick testing, set USE_MONGO_VECTOR_DB=False to use an in-memory database, otherwise follow Step 3.
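Before starting any component, it may help to check that `.env` is complete. The sketch below is not part of the project: the function names are hypothetical, the variable names are copied from the Cloud Services table, and it assumes that any `USE_MONGO_VECTOR_DB` value other than `False` (or a missing value) means the MongoDB variables are required.

```python
from pathlib import Path

# Variable names are taken from the Cloud Services table in this README.
OPENAI_VARS = ["OPENAI_API_KEY", "OPENAI_MODEL_ID"]
MONGO_VARS = [
    "MONGO_CLUSTER_URL",
    "MONGO_DATABASE_NAME",
    "MONGO_CLUSTER_NAME",
    "MONGO_PROJECT_ID",
    "MONGO_API_PUBLIC_KEY",
    "MONGO_API_PRIVATE_KEY",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(env: dict[str, str]) -> list[str]:
    """List required variables that are absent or empty."""
    required = list(OPENAI_VARS)
    # Assumption: only an explicit "False" switches to the in-memory database.
    if env.get("USE_MONGO_VECTOR_DB", "True").lower() != "false":
        required += MONGO_VARS
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("Missing variables:", missing_vars(env) or "none")
```

The parser only handles plain `KEY=VALUE` lines, which is enough for a quick sanity check but is not a full dotenv implementation.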
Follow these steps to set up MongoDB Atlas for scalable vector search and get all required environment variables.
Tip
If you are more comfortable with a UI, you can also follow the steps from 📋 Prerequisites -> Cloud Services -> MongoDB Atlas, which do the same thing.
- Create Account & Install CLI
📚 More on getting started with MongoDB Atlas
- Log in to Atlas CLI
atlas auth login
- Create Free Cluster
Create an M0 (free) cluster in the AWS EU West region:
atlas clusters create free-cluster --provider AWS --region EU_WEST_1 --tier M0
Wait for cluster creation to complete and list available clusters:
atlas clusters watch free-cluster
atlas clusters list
Set the `MONGO_CLUSTER_NAME=free-cluster` environment variable.
Important
The free M0 cluster has limitations but is sufficient for testing.
- Create Database User
Create database user:
atlas dbusers create --username <your_mongo_database_user> --password <your_mongo_database_password> --role readWriteAnyDatabase
List users:
atlas dbusers list
These credentials will be used in the `MONGO_CLUSTER_URL` env var.
- Configure Network Access
Option 1: Allow access from anywhere (ease of use for development):
atlas accessList create "0.0.0.0/0" --type ipAddress --comment "Allow access from anywhere"
Option 2: Allow only your IP (recommended):
atlas accessList create --currentIp
To list current access list entries:
atlas accessList list
Important
For production, restrict network access to specific IPs.
- Create API Keys
Create API key with required permissions:
atlas organizations apiKeys create --desc "Full Access API Key for 'tabular-semantic-search' project" --role ORG_OWNER --role ORG_MEMBER --role ORG_GROUP_CREATOR --role ORG_READ_ONLY
List keys to get the public key:
atlas organizations apiKeys list
Set:
- `MONGO_API_PUBLIC_KEY`: Public key from the created API key
- `MONGO_API_PRIVATE_KEY`: Private key shown during creation (save it immediately)
Important
Save your API private key immediately after creation - it cannot be retrieved later.
- Setting Remaining Environment Variables
Set MONGO_PROJECT_ID:
atlas projects list
Set MONGO_CLUSTER_URL:
atlas clusters connectionStrings describe free-cluster
Now set the environment variable without the `mongodb+srv://` prefix, using the host from the command output, for example `MONGO_CLUSTER_URL={YOUR_DATABASE_USER}:{YOUR_DATABASE_PASSWORD}@free-cluster.vhxy1.mongodb.net`, where the database user and password are the ones created in step 4 (Create Database User).
- Create Database
Create the database named in `MONGO_DATABASE_NAME`:
make create-mongodb-database
Important
If you are getting SSL handshake errors, turn off your VPN or firewall or try using a different network.
Now go to MongoDB Atlas, navigate to Clusters → Browse Collections to verify that your database was created successfully.
Your final .env file should have these MongoDB-related variables:
USE_MONGO_VECTOR_DB=True
MONGO_CLUSTER_URL=username:password@free-cluster.xxxxx.mongodb.net
MONGO_DATABASE_NAME=your_database_name
MONGO_CLUSTER_NAME=free-cluster
MONGO_PROJECT_ID=your_project_id
MONGO_API_PUBLIC_KEY=your_public_key
MONGO_API_PRIVATE_KEY=your_private_key
- Final Thoughts
MongoDB Atlas can also be set up locally using Docker, but Superlinked isn't yet integrated with the local version: more on the local Mongo vector DB
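The `MONGO_CLUSTER_URL` format used in this setup (`username:password@host`, without `mongodb+srv://`) can be assembled with a small helper. This is a sketch, not project code: `build_cluster_url` is a hypothetical name, and percent-encoding the credentials assumes the value ends up inside a standard MongoDB connection string, where reserved characters such as `@` or `:` in a user name or password must be escaped.

```python
from urllib.parse import quote_plus

SRV_PREFIX = "mongodb+srv://"


def build_cluster_url(connection_string: str, user: str, password: str) -> str:
    """Return user:password@host from an Atlas SRV connection string."""
    host = connection_string.strip()
    if host.startswith(SRV_PREFIX):
        host = host[len(SRV_PREFIX):]
    # Drop any credentials, path, or query string already present.
    host = host.split("@")[-1].split("/")[0].split("?")[0]
    # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
    return f"{quote_plus(user)}:{quote_plus(password)}@{host}"


if __name__ == "__main__":
    # Placeholder values; use the output of
    # `atlas clusters connectionStrings describe free-cluster` and your own user.
    print(build_cluster_url("mongodb+srv://free-cluster.xxxxx.mongodb.net", "username", "password"))
```

If your password contains only letters and digits, the result is identical to writing the value by hand.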
Download and process the dataset sample:
make download-and-process-sample-dataset
We also support the complete dataset, but you need a powerful computer, a good internet connection, and patience to run everything on it:
make download-and-process-full-dataset
You should see this structure in your data folder:
data/
├── processed_100_sample.jsonl
├── processed_300_sample.jsonl
├── processed_850_sample.jsonl
├── sample.json
└── sample.json.gz
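To confirm the download step produced the files listed above, a short check like the following can be used. It is a sketch with hypothetical helper names, and it assumes the `.jsonl` files follow the JSON Lines convention of one record per line.

```python
from pathlib import Path

# File names copied from the folder listing in this README.
EXPECTED_FILES = [
    "processed_100_sample.jsonl",
    "processed_300_sample.jsonl",
    "processed_850_sample.jsonl",
    "sample.json",
    "sample.json.gz",
]


def missing_data_files(data_dir: Path) -> list[str]:
    """Return the expected file names that are not present in data_dir."""
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


def count_records(jsonl_path: Path) -> int:
    """Count non-empty lines, assuming one JSON record per line."""
    with jsonl_path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    data_dir = Path("data")
    print("Missing files:", missing_data_files(data_dir) or "none")
    for path in sorted(data_dir.glob("processed_*_sample.jsonl")):
        print(f"{path.name}: {count_records(path)} records")
```

Run it from the project root so that the relative `data` path resolves correctly.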
| Notebook | Description |
|---|---|
| Dataset exploration | Dive into the Amazon ESCI dataset |
| Tabular semantic search with natural language queries demo | See Superlinked in action |
| Text-to-SQL examples | Try LlamaIndex queries |
- Start it up:
make start-superlinked-server
The FastAPI endpoint docs are available at http://localhost:8080/docs
- From a different terminal, load your data:
make load-data
Go to MongoDB Atlas, navigate to Clusters → Browse Collections → tabular-semantic-search to verify that your vector database was populated successfully.
Note: Wait about 5 minutes before running the queries.
- Try some queries:
make post-filter-query
make post-semantic-query
make similar-item-query
- Start the Streamlit UI:
make start-ui
Accessible at http://localhost:8501/
Important
If you are not getting any results when making queries from the CLI or Streamlit app, restart the Superlinked server.
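When queries return nothing, a first thing to rule out is that the server is not answering at all. The sketch below polls the docs URL mentioned earlier using only the standard library; the function names, retry count, and delays are assumptions for illustration, not project defaults.

```python
import http.client
import time
import urllib.error
import urllib.request

DOCS_URL = "http://localhost:8080/docs"  # URL given in this README

# Bypass any configured HTTP proxy, since the server runs on localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeouts, HTTP errors and malformed URLs all count as "down".
        return False


def wait_until_reachable(url: str, attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll the URL up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_reachable(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_reachable(DOCS_URL)` before running the query commands tells you whether the server is up. It only confirms that the HTTP server responds, not that the data has finished loading.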