Feature issue: #1706
Write a design proposal for the Gen AI data ingestion workflow using:
- GitLab pipeline as the data ingestion scheduler
- OpenSearch as vector DB provider
- AWS Lambda to run the ingestion script with access to the database
- AWS for infrastructure (the design may also consider GCP GKE)
- Langfuse as test dataset storage solution
- Reuse the existing Python tooling as much as possible: tock-llm-indexing-tools
- Ragas for evaluators (optional)
The design should be reviewed and approved before any development starts, to make sure we are heading in the right direction.
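
To give reviewers something concrete to discuss, here is a minimal sketch of what the Lambda side of the workflow might look like. It is not a proposed implementation. The index name `genai-documents`, the event shape (`chunks` with `id`, `text`, `embedding`), and the function names are all hypothetical, and nothing here reflects what tock-llm-indexing-tools actually exposes. The only grounded part is the payload layout, which follows the newline-delimited action/document pairs of the OpenSearch `_bulk` API.

```python
import json
from typing import Any

# Hypothetical default; the real index name is for the design to decide.
DEFAULT_INDEX = "genai-documents"


def build_bulk_body(index: str, chunks: list[dict[str, Any]]) -> str:
    """Serialize chunks into the NDJSON body expected by the OpenSearch _bulk API."""
    lines = []
    for chunk in chunks:
        # Action line: index the document under a caller-supplied id.
        lines.append(json.dumps({"index": {"_index": index, "_id": chunk["id"]}}))
        # Source line: the text plus its embedding vector (field names are assumptions).
        lines.append(json.dumps({"text": chunk["text"], "embedding": chunk["embedding"]}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def handler(event: dict[str, Any], context: Any = None) -> dict[str, Any]:
    """Lambda entry point, assumed to be invoked by the scheduled GitLab pipeline job."""
    chunks = event.get("chunks", [])
    if not chunks:
        return {"statusCode": 400, "body": "no chunks to ingest"}
    body = build_bulk_body(event.get("index", DEFAULT_INDEX), chunks)
    # A real implementation would send `body` to the OpenSearch _bulk endpoint here.
    return {"statusCode": 200, "indexed": len(chunks), "bulk_body": body}
```

Keeping the payload construction in a pure function, separate from the handler, would let the same code run from the GitLab job, from Lambda, or from a local test without network access. Whether that split fits the existing tooling is one of the points the design should settle.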