# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [0.8.0] - 2024-08-19

This release completely refactors the repository's directory structure for a more seamless and intuitive developer journey. It also adds support for deploying the latest accelerated embedding and reranking models across cloud, data center, and workstation using NVIDIA NeMo Retriever NIM microservices.

### Added

### Changed

- Major restructuring and reorganisation of the assets within the repository:
  - The top-level `experimental` directory has been renamed to `community`.
  - The top-level `RetrievalAugmentedGeneration` directory has been renamed to `RAG`.
  - The Docker Compose files from the top-level `deploy` directory have been migrated to example-specific directories under `RAG/examples`. The vector database and on-prem NIM microservice deployment files are under `RAG/examples/local_deploy`.
  - The top-level `models` directory has been renamed to `finetuning`.
  - The top-level `notebooks` directory has been moved to `RAG/notebooks` and organised by framework.
  - The top-level `tools` directory has been migrated to `RAG/tools`.
  - The top-level `integrations` directory has been moved into `RAG/src`.
  - `RetrievalAugmentedGeneration/common` now resides under `RAG/src/chain_server`.
  - `RetrievalAugmentedGeneration/frontend` now resides under `RAG/src/rag_playground/default`.
  - The 5 mins RAG No GPU example, previously under the top-level `examples` directory, is now under `community`.

### Deprecated

## [0.7.0] - 2024-06-18

This release switches all examples to use cloud-hosted, GPU-accelerated LLM and embedding models from the NVIDIA API Catalog by default. It also deprecates support for deploying on-prem models using the NeMo Inference Framework Container and adds support for deploying accelerated generative AI models across cloud, data center, and workstation using the latest NVIDIA NIM-LLM.

### Added

### Changed

- All examples now use Llama 3 models from the NVIDIA API Catalog by default. A summary of the updated examples and the models they use is available here.
- Switched the default embedding model of all examples to the Snowflake `arctic-embed-l` model.
- Added more verbose logs and support for configuring the chain-server log level using the `LOG_LEVEL` environment variable.
- Bumped the versions of the `langchain-nvidia-ai-endpoints` and `sentence-transformers` packages and the Milvus containers.
- Updated base containers to the Ubuntu 22.04 image `nvcr.io/nvidia/base/ubuntu:22.04_20240212`.
- Added `llama-index-readers-file` as a dependency to avoid runtime package installation within the chain server.
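The new `LOG_LEVEL` switch is a plain environment variable, so it can be exported before bringing up the stack. A minimal sketch; the compose file path in the comment is a hypothetical example, not taken from this changelog:

```shell
# Set chain-server log verbosity before starting the stack.
# LOG_LEVEL comes from this release; the compose path below is
# illustrative and differs per RAG example.
export LOG_LEVEL=DEBUG

# docker compose -f RAG/examples/<example>/docker-compose.yaml up -d
echo "Chain server will log at: $LOG_LEVEL"
```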

### Deprecated

## [0.6.0] - 2024-05-07

### Added

### Changed

- Renamed the `csv_rag` example to `structured_data_rag`.
- Model engine name updates:
  - The `nv-ai-foundation` and `nv-api-catalog` LLM engines are renamed to `nvidia-ai-endpoints`.
  - The `nv-ai-foundation` embedding engine is renamed to `nvidia-ai-endpoints`.
- Embedding model updates:
  - The `developer_rag` example now uses the `UAE-Large-V1` embedding model.
  - API Catalog examples now use `ai-embed-qa-4` instead of `nvolveqa_40k` as the embedding model.
- Ingested data now persists across multiple sessions.
- Updated `langchain-nvidia-ai-endpoints` to version 0.0.11, enabling support for models like Llama 3.
- Added file-extension-based validation that throws an error for unsupported files.
- Increased the default output token length in the UI from 250 to 1024 for more comprehensive responses.
- Added stricter chain-server API validation to enhance API security.
- Updated the versions of `llama-index` and `pymilvus`.
- Updated the pgvector container to `pgvector/pgvector:pg16`.
- LLM model updates.

## [0.5.0] - 2024-03-19

This release adds new dedicated RAG examples showcasing state-of-the-art use cases, switches to the latest NVIDIA API Catalog endpoints, and refactors the API interface of the chain server. It also improves the developer experience by adding GitHub Pages based documentation and streamlining the example deployment flow with dedicated compose files.

### Added

### Changed

- Switched from NVIDIA AI Foundation to NVIDIA API Catalog endpoints for accessing cloud-hosted LLM models.
- Refactored the API schema of the chain-server component to support runtime configuration of LLM parameters such as temperature, max tokens, and chat history.
- Renamed the `llm-playground` service in compose files to `rag-playground`.
- Switched base containers for all components from PyTorch to Ubuntu and optimized both container build time and container size.
- Deprecated YAML-based configuration to avoid confusion; all configuration is now environment-variable based.
- Removed the requirement to hardcode `NVIDIA_API_KEY` in the `compose.env` file.
- Upgraded all Python dependencies for the chain-server and rag-playground services.
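With the hardcoding requirement removed, the key can be exported in the shell instead of being written into `compose.env`. A hedged sketch; the key value is a placeholder:

```shell
# Export the API Catalog key in the current shell rather than
# editing compose.env; "nvapi-your-key-here" is a placeholder.
export NVIDIA_API_KEY="nvapi-your-key-here"

# docker compose up -d   # compose reads the key from the environment
[ -n "$NVIDIA_API_KEY" ] && echo "NVIDIA_API_KEY is set"
```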

### Fixed

- Fixed a bug that caused hallucinated answers when the retriever fails to return any documents.
- Fixed some accuracy issues across all examples.

## [0.4.0] - 2024-02-23

### Added

- New dedicated notebooks showcasing cloud-hosted NVIDIA AI Playground models via LangChain connectors as well as local model deployment using Hugging Face.
- Upgraded the Milvus container version to enable GPU-accelerated vector search.
- Added support for interacting with models behind NeMo Inference Microservices using the new model engines `nemo-embed` and `nemo-infer`.
- Added support for providing an example-specific collection name for vector databases using an environment variable named `COLLECTION_NAME`.
- Added FAISS as a generic vector database solution behind `utils.py`.
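The per-example collection name is supplied the same way as other settings. A minimal sketch, assuming the chain server reads the variable at startup; the collection name value is arbitrary:

```shell
# Give each example its own vector-database collection.
# COLLECTION_NAME is from this release; the value is illustrative.
export COLLECTION_NAME=developer_rag_docs
echo "Using collection: $COLLECTION_NAME"
```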

### Changed

- Upgraded and changed base containers for all components to PyTorch `23.12-py3`.
- Added a LangChain-specific vector database connector in `utils.py`.
- Changed speech support to use a single channel for Riva ASR and TTS.
- Changed the `get_llm` utility in `utils.py` to return LangChain wrappers instead of LlamaIndex wrappers.

### Fixed

- Fixed a bug causing empty ratings in the evaluation notebook.
- Fixed the document search implementation of the query decomposition example.

## [0.3.0] - 2024-01-22

### Added

### Changed

- Upgraded LangChain and LlamaIndex dependencies for all containers.
- Restructured README files for better intuitiveness.
- Added a provision to plug in multiple examples using a common base class.
- Changed the minio service's port from 9000 to 9010 in Docker-based deployments.
- Moved the `evaluation` directory from the top level to under `tools` and created a dedicated compose file.
- Added an `experimental` directory for plugging in experimental features.
- Modified notebooks to use TRT-LLM and NVIDIA AI Foundation based connectors from LangChain.
- Changed the `ai-playground` model engine name to `nv-ai-foundation` in configurations.

### Fixed

## [0.2.0] - 2023-12-15

### Added

### Changed

- Restructured the repository to allow better open-source contributions.
- Upgraded dependencies for the chain-server container.
- Upgraded the NeMo Inference Framework container version; no separate sign-up is needed for access.
- The main README now provides more details.
- Documentation improvements.
- Better error handling and reporting for corner cases.
- Renamed the `triton-inference-server` container to `llm-inference-server`.

### Fixed