Part of gpt-rag
Follow these steps to configure Azure Cognitive Search and deploy the ingestion code from the terminal.
First, check that your environment meets the requirements:
- Azure CLI (az) to log in and run Azure commands from the command line.
- Python 3.9+ to run the setup script. Ideally use Python 3.10 (the same version used by the Function runtime).
- Azure Functions Core Tools will be needed to deploy the chunking function.
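The requirements above can be checked before starting. The following is a minimal sketch, not part of the repository: it assumes the Azure CLI and Azure Functions Core Tools are on PATH under their usual executable names, az and func, and the helper names check_prerequisites and report are hypothetical.

```python
import shutil
import sys


def check_prerequisites(tools=("az", "func"), min_python=(3, 9)):
    """Return a dict mapping each requirement to True if it is satisfied.

    Assumption: the Azure CLI and Azure Functions Core Tools are installed
    under their usual executable names, "az" and "func".
    """
    # shutil.which returns None when the executable is not found on PATH.
    status = {tool: shutil.which(tool) is not None for tool in tools}
    # Compare only (major, minor) of the interpreter running this script.
    status["python"] = sys.version_info[:2] >= tuple(min_python)
    return status


def report(status):
    """Format one line per requirement, e.g. 'az: ok' or 'func: MISSING'."""
    return [f"{name}: {'ok' if ok else 'MISSING'}" for name, ok in status.items()]
```

Calling print("\n".join(report(check_prerequisites()))) lists each requirement and whether it was found. The check only confirms that the tools exist, not that their versions are compatible with the Function runtime.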
1) Login to Azure
Run az login to log in to Azure. If you are running the setup from a VM with a managed identity, run az login -i instead.
2) Clone the repo
If you plan to customize the ingestion logic, create a new repo by clicking the Use this template button at the top of this page.
Clone the repository locally: git clone https://github.com/azure/gpt-rag-ingestion
If you created a new repository, update the repository URL before running the command.
3) Deploy function to Azure
Change into the cloned repo folder: cd gpt-rag-ingestion
Use Azure Functions Core Tools to deploy the function: func azure functionapp publish FUNCTION_APP_NAME --python
Replace FUNCTION_APP_NAME with the name of your ingestion Function App before running the command.
4) Run Azure Cognitive Search Setup
If you are not already there, change into the cloned repo folder: cd gpt-rag-ingestion
Install the Python libraries: pip install -r requirements.txt --use-deprecated=legacy-resolver
Run the setup script: python setup.py -s SUBSCRIPTION_ID -r RESOURCE_GROUP -f FUNCTION_APP_NAME
Replace SUBSCRIPTION_ID, RESOURCE_GROUP and FUNCTION_APP_NAME with the values for your environment.
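To avoid retyping the placeholders, the commands from steps 3 and 4 can be generated from one set of values. This is only a sketch: build_commands and render are hypothetical helpers, not part of the repository, and the commands themselves are exactly the ones listed above.

```python
import shlex


def build_commands(subscription_id, resource_group, function_app_name):
    """Return the step 3 and step 4 commands as argument lists, in order."""
    return [
        # Step 3: publish the function with Azure Functions Core Tools.
        ["func", "azure", "functionapp", "publish", function_app_name, "--python"],
        # Step 4: install dependencies, then run the setup script.
        ["pip", "install", "-r", "requirements.txt",
         "--use-deprecated=legacy-resolver"],
        ["python", "setup.py", "-s", subscription_id,
         "-r", resource_group, "-f", function_app_name],
    ]


def render(commands):
    """Turn argument lists into shell-safe strings for copy and paste."""
    return [shlex.join(command) for command in commands]
```

Each argument list can be passed to subprocess.run(command, check=True, cwd="gpt-rag-ingestion") so that a failing step stops the sequence, or printed with render for manual use.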
5) Add source documents to object storage
Upload your documents to the documents folder in the storage account whose name starts with strag.
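The upload can also be done from the terminal with the Azure CLI. The sketch below builds the two commands involved and rests on assumptions not stated in this README: that documents is a blob container in the storage account, that the account lives in the same resource group used for setup, and that the signed-in identity has a data role on the account that permits uploads with --auth-mode login. The helper names are hypothetical.

```python
def find_storage_account_command(resource_group, prefix="strag"):
    """Build the az command that lists storage accounts whose name starts with prefix."""
    # JMESPath filter applied by the Azure CLI to the account list.
    query = f"[?starts_with(name, '{prefix}')].name"
    return ["az", "storage", "account", "list",
            "--resource-group", resource_group,
            "--query", query, "--output", "tsv"]


def upload_documents_command(account_name, source_dir, container="documents"):
    """Build the az command that uploads every file in source_dir.

    Assumption: "documents" is a blob container in the storage account.
    """
    return ["az", "storage", "blob", "upload-batch",
            "--account-name", account_name,
            "--destination", container,
            "--source", source_dir,
            "--auth-mode", "login"]
```

Run the first command with subprocess.run(..., capture_output=True, text=True) to obtain the account name, then pass that name and your local folder to the second. Uploading through the Azure portal or Storage Explorer works equally well.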
Cognitive Search Enrichment Pipeline
Azure OpenAI Embeddings Generator
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.