The search project I've always wanted to work on.
Deeno is a search engine project that I am just beginning to undertake. The goal is to understand what it takes to build search, so I'm building everything from scratch: the UI, the microservices, and the data pipelines.
When it's complete, I picture search over the entire Wikipedia dataset, powered by microservices and Spark jobs written from scratch.
The project has three components:
- Web interface built with Angular 15.
- Microservices built with the Spring framework.
- Indexers built using Apache Spark that update a Redis cluster.
I work on this when I have time off from classes and work, so it can get quiet here at times, but it's one step at a time.
The plan:
- Get a simple inverted-index-based retrieval system deployed on the cloud.
- Graduate to ranked retrieval.
- Move to vector space retrieval using deep learning representations.
- Integrate question-answering ability using language models.
I'm at step 1 now, and once the infrastructure is up and running, things should really accelerate. Stay tuned!
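To make step 1 concrete, here is a minimal sketch of inverted-index retrieval in plain Python. It is not the Deeno code: the `tokenize`, `build_index`, and `search` names and the three sample documents are assumptions made up for illustration, and the real indexers are meant to run as Spark jobs writing to Redis rather than in memory.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get avoids creating empty entries for unseen terms in the defaultdict.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical stand-ins for Wikipedia articles.
docs = {
    1: "Apache Spark is an engine for large-scale data processing.",
    2: "Redis is an in-memory data store.",
    3: "Spark jobs can write results to a data store such as Redis.",
}

index = build_index(docs)
print(search(index, "data store"))   # [2, 3]
print(search(index, "spark redis"))  # [3]
```

This version only answers "which documents contain all of these terms"; it does no scoring, which is what step 2 (ranked retrieval) would add on top of the same term-to-documents mapping.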
I've now started configuring the infrastructure for this project on Google Cloud.
I can now build individual containers like so:
gcloud builds submit --tag [IMAGE] /Users/cksash/Documents/proj/search/api/flask-aisearch
Deploy them individually, if I wish, like so:
gcloud run deploy flask-aisearch --image [IMAGE]
And run the entire project in the correct order (defined by dependencies) like so:
gcloud run services replace service.yaml