
Generative AI Infrastructure v0.9 Release Notes

Released by @kevinintel on 27 Aug 03:11 · 115 commits to main since this release

OPEA Release Notes v0.9

What’s New in OPEA v0.9

  • Broadened functionality

    • Provide telemetry for metrics and tracing using Prometheus, Grafana, and Jaeger (see the metrics query sketch after this list)
    • Initialize two Agent examples: AgentQnA and DocIndexRetriever
    • Support for authentication and authorization
    • Add Nginx Component to strengthen backend security
    • Provide a Toxicity Detection microservice (see the toxicity request sketch after this list)
    • Support the experimental Fine-tuning microservice (see the fine-tuning job sketch after this list)
  • Enhancements

    • Align the microservice APIs with the OpenAI standards (Chat Completions, Fine-tuning, etc.); see the Chat Completions sketch after this list
    • Enhance performance benchmarking and evaluation for the GenAI Examples (e.g., TGI, resource allocation)
    • Enable support for launching container images as a non-root user
    • Use Llama-Guard-2-8B as the default Guardrails model, bge-large-zh-v1.5 as the default embedding model, and mistral-7b-grok as the default CodeTrans model
    • Add ProductivitySuite to provide access management and maintain user context
  • Deployment

    • Support Red Hat OpenShift Container Platform (RHOCP)
    • GenAI Microservices Connector (GMC) successfully tested on Nvidia GPUs
    • Add Kubernetes support for AudioQnA and VisualQnA examples
  • OPEA Docker Hub: https://hub.docker.com/u/opea

  • GitHub IO: https://opea-project.github.io/latest/index.html

  • Thanks to the external contributors Sharan Shirodkar, Aishwarya Ramasethu, Michal Nicpon, and Jacob Mansdorfer
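
Telemetry sketch: the metrics half of the new telemetry stack is scraped by Prometheus, so service metrics can be pulled through the standard Prometheus HTTP API. A minimal sketch, assuming Prometheus is reachable at localhost:9090; the metric name used here is illustrative rather than one defined by this release:

```python
import requests

# Query the Prometheus HTTP API for a metric exported by the OPEA microservices.
# The Prometheus address and the metric name are placeholders for illustration.
PROMETHEUS_URL = "http://localhost:9090"
QUERY = "http_requests_total"  # illustrative metric name

resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"], result["value"])
```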
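
Toxicity request sketch: the new microservice screens a piece of text over a plain HTTP call. A minimal sketch in which the port, route, and response shape are assumptions; check the microservice README for the actual interface:

```python
import requests

# Send text to the toxicity detection microservice. The address and route are
# placeholders; the response is expected to carry a toxicity verdict or score.
ENDPOINT = "http://localhost:9091/v1/toxicity"

resp = requests.post(ENDPOINT, json={"text": "You are wonderful."}, timeout=30)
resp.raise_for_status()
print(resp.json())
```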
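
Fine-tuning job sketch: the experimental fine-tuning microservice mirrors the OpenAI fine-tuning job API shape (create, list, cancel, retrieve). A minimal sketch of creating and then listing jobs; the base address, training-file name, and model id are placeholders:

```python
import requests

BASE = "http://localhost:8015"  # placeholder address of the fine-tuning microservice

# Create a fine-tuning job with an OpenAI-style request body (placeholder values).
job = requests.post(
    f"{BASE}/v1/fine_tuning/jobs",
    json={"training_file": "my_dataset.jsonl", "model": "meta-llama/Llama-2-7b-chat-hf"},
    timeout=30,
).json()
print("created:", job)

# List fine-tuning jobs (mirrors the OpenAI list endpoint).
print("jobs:", requests.get(f"{BASE}/v1/fine_tuning/jobs", timeout=30).json())
```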
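
Chat Completions sketch: with the microservice APIs aligned to the OpenAI format, an LLM microservice can be driven with a standard Chat Completions payload and the response read back in the usual OpenAI shape. A minimal sketch, assuming a service at localhost:9000 exposing an OpenAI-style /v1/chat/completions route; host, port, route, and model id are placeholders:

```python
import requests

# OpenAI-style Chat Completions request against an OPEA LLM microservice.
# The endpoint and model id are placeholders; adjust them to your deployment.
ENDPOINT = "http://localhost:9000/v1/chat/completions"

payload = {
    "model": "Intel/neural-chat-7b-v3-3",  # placeholder model id
    "messages": [{"role": "user", "content": "What is OPEA?"}],
    "max_tokens": 128,
    "stream": False,
}

resp = requests.post(ENDPOINT, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```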

Details

GenAIExamples
  • ChatQnA

    • Update port in set_env.sh(040d2b7)
    • Fix minor issue in ChatQnA Gaudi docker README(a5ed223)
    • update chatqna dataprep-redis port(02a1536)
    • Add support for .md file in file upload in the chatqna-ui(7a67298)
    • Added the ChatQnA delete feature, and updated the corresponding README(09a3196)
    • fixed ISSUE-528(45cf553)
    • Fix vLLM and vLLM-on-Ray UT bug(cfcac3f)
    • set OLLAMA_MODEL env to docker container(c297155)
    • Update guardrail docker file path(06c4484)
    • remove ray serve(c71bc68)
    • Refine docker_compose for dataprep param settings(3913c7b)
    • fix chatqna guardrails(db2d2bd)
    • Support ChatQnA pipeline without rerank microservice(a54ffd2)
    • Update the number of microservice replicas for OPEA v0.9(e6b4fff)
    • Update set_env.sh(9657f7b)
    • add env for chatqna vllm(f78aa9e)
  • Deployment

    • update manifests for v0.9(ba78b4c)
    • Update K8S manifest for ChatQnA/CodeGen/CodeTrans/DocSum(01c1b75)
    • Update benchmark manifest to fix errors(4fd3517)
    • Update env for manifest(4fa37e7)
    • update manifests for v0.9(08f57fa)
    • Add AudioQnA example via GMC(c86cf85)
    • add k8s support for audioqna(0a6bad0)
    • Update mainifest for FaqGen(80e3e2a)
    • Add kubernetes support for VisualQnA(4f7fc39)
    • Add dataprep microservice to chatQnA example and the e2e test(1c23d87)
  • Documentation

    • [doc] Update README.md(c73e4e0)
    • doc fix: Update README.md to remove specific dicscription of paragraph-1(5a9c109)
    • doc: fix markdown in docker_image_list.md(9277fe6)
    • doc: fix markdown in Translation/README.md(d645305)
    • doc: fix markdown in SearchQnA/README.md(c461b60)
    • doc: fix FaqGen/README.md markdown(704ec92)
    • doc: fix markdown in DocSum/README.md(83712b9)
    • doc: fix markdown in CodeTrans/README.md(076bca3)
    • doc: fix CodeGen/README.md markdown(33f8329)
    • doc: fix markdown in ChatQnA/README.md(015a2b1)
    • doc: fix headings in markdown files(21fab71)
    • doc: missed an H1 in the middle of a doc(4259240)
    • doc: remove use of HTML for table in README(e81e0e5)
    • Update ChatQnA readme with OpenShift instructions(ed48371)
    • Convert HTML to markdown format.(14621f8)
    • Fix typo {your_ip} to {host_ip}(ad8ca88)
    • README fix typo(abc02e1)
    • fix script issues in MD file(acdd712)
    • Minor documentation improvements in the CodeGen README(17b9676)
    • Refine Main README(08eb269)
    • [Doc]Add a micro/mega service WorkFlow for DocSum(343d614)
    • Update README for k8s deployment(fbb81b6)
  • Other examples

    • Clean deprecated VisualQnA code(87617e7)
    • Using TGI official release docker image for intel cpu(b2771ad)
    • Add VisualQnA UI(923cf69)
    • fix container name(5ac77f7)
    • Add VisualQnA docker for both Gaudi and Xeon using TGI serving(2390920)
    • Remove LangSmith from Examples(88eeb0d)
    • Modify the language variable to match language highlight.(f08d411)
    • Remove deprecated folder.(7dd9952)
    • AgentQnA example(67df280)
    • fix tgi xeon tag(6674832)
    • Add new DocIndexRetriever example(566cf93)
    • Add env params for chatqna xeon test(5d3950)
    • ProductivitySuite Combo Application with REACT UI and Keycloak Authen(947cbe3)
    • change codegen tgi model(06cb308)
    • change searchqna prompt(acbaaf8)
    • minor fix mismatched hf token(ac324a9)
    • fix translation gaudi env(4f3be23)
    • Minor fixes for CodeGen Xeon and Gaudi Kubernetes codegen.yaml (c25063f)
  • CI/CD/UT

    • update deploy_gmc logical in cd workflow(c016d82)
    • fix ghcr.io/huggingface/text-generation-inference tag(503a1a9)
    • Add GMC e2e in CD workflow(f45e4c6)
    • Fix CI test changed file detect issue(5dcadf3)
    • update cd workflow name(3363a37)
    • Change microservice tags in CD workflow(71363a6)
    • Fix manual freeze images workflow(c327972)
    • open chatqna guardrails test(db2d2bd)
    • Add gmc build, scan and deploy workflow(a39f23a)
    • Enhance CI/CD infrastructure(c26d0f6)
    • Fix typo in CI workflow(e12baca)
    • Fix ChatQnA Qdrant CI issues(e71aba0)
    • remove continue-on-error: true to stop the test when image build failed(6296e9f)
    • Fix CD workflow typos(039014f)
    • Freeze base images(c9f9aca)
    • support multiple test cases for ChatQnA(939502d)
    • set action back to pull_request_target(1c07a38)
    • Add BoM collect workflow and image publish workflow(e93146b)
    • Fix left issues in CI/CD structure refactor(a6385bc)
    • Add composable manifest e2e test for cd workflow(d68be05)
    • Add secrets for CI test(3c9e2aa)
    • Build up docker images CD workflow(8c384e0)
    • fix corner issue in CI test(64bfea9)
    • Rename github workflow files(ebc165a)
    • Improve manifest chaqna test(a072441)
    • Refactor build image workflows with common action.yml(e22d413)
    • Automatic create issue to GenAIInfra when docker compose files changed(8bdb598)
    • Add components owner(ab98795)
    • Fix code scan warning(ac89855)
    • Check url of docker image list.(cf021ee)
    • change namespace surfix to random string (46af6f3)
    • chatqna k8s manifest: Fixed retriever-redis v0.9 image issue(7719755)
    • Adding Trivy and SBOM actions(f3ffcd5)
    • optimize CI log format(dfaf479)
GenAIComps
  • Cores

    • Refine parameter in api_protocol.py(0584b45)
    • Revert the default value of max_new_tokens to 1024(f2497c5)
    • Fixed Orchestrator schedule method(76877c1)
    • fix wrong indent(9b0edf2)
    • Allow downstream of streaming nodes(90e367e)
    • Add Retrieval gateway in core to support IndexRetrivel Megaservice(56daf95)
    • add telemetry doc(2a2a93)
  • LLM/embedding/reranking/retrieval

    • Using habana docker 1.16.1 everywhere(5deb383)
    • adding entrypoint.sh to faq-generation comp (4a7b8f4)
    • Fix image in docker compose yaml to use the built docker image tag from the README(72a2553)
    • Refine LLM Native Microservice(b16b14a)
    • Fix Retriever qdrant issue(7aee7e4)
    • Change /root/ to /home/user/.(4a67d42)
    • Fix embeddings_langchain-mosec issue.(87905ad)
    • fix HuggingFaceEmbedding deprecated in favor of HuggingFaceInferenceAPIEmbedding(2891cc6)
    • align vllm-ray response format to tgi response format(ac4a777)
    • build new images for llms(ed99d47)
    • LLM micro service input data does not have input model name(761f7e0)
    • Fix OpenVINO vLLM build scripts and update unit test case(91d825c)
    • Refine the instructions to run the retriever example with qdrant(eb51018)
    • Add cmds to restart ollama service and add proxy settings while launching docker(8eb8b6a)
    • Vllm and vllm-ray bug fix (add opea for vllm, update setuptools version)(0614fc2)
    • remove deprecated langchain imports and switch to langchain-huggingface(055404a)
    • [Enhence] Increase mosec_embedding forward timeout to support high concurrency cases(b61f61b)
    • Fix issues in updating embedding & reranking model to bge-large-zh-v1.5(da19c5d)
    • refact embedding/ranking/llm request/response by referring to openai format(7287caa)
    • align VLLM micro-service output format with UI(c1887ed)
    • fix vllm docker command(c1a5883)
    • Update Embedding Mosec Dockerfile to use BAAI/bge-large-zh-v1.5(bbdc1f0)
    • remove length limitation of embedding(edcd1e8)
    • Support SearchedDoc input type in LLM for No Rerank Pipeline (3c29fb4)
    • Add local_embedding return 768 length to align with chatqna example(a234db)
    • Refine LLM for No Rerank(fe8ef3)
    • Remove redundant dependency from 'vllm-ray' comps(068527d)
  • LVM/TTS/ASR

    • Revise TTS, SpeechT5Model to end the last audio chunk at the correct punctuation mark location(20fc8ca)
    • Support llava-next using TGI(e156101)
    • whisper: Fix container build failure(d5b8cdf)
    • support whisper long-form generation (daec680)
    • Support multiple image sources for LVM microservice(ed776ac)
    • fix ffmpeg build on hpu(ac3909d)
    • Support streaming output for LVM microservice(c5a0344)
    • Add video-llama LVM microservice under lvms(db8c893)
    • add torchvision into requirements(1566047)
    • Use Gaudi base images from Dockerhub(33db504)
    • update the requirements.txt for tts and asr(5ba2561)
  • DataPrep

    • Fix Dataprep qdrant issues and add Test Script(a851abf)
    • Refine robustness of Dataprep Redis(04986c1)
    • Address testcase failure(075e84f)
    • Added support for Unified Port, GET/DELETE endpoints in pgvector Dataprep(8a62bac)
    • Update dataprep default mosec embedding model in config.py(8f0f2b0)
    • unify port in one microservice.(f8d45e5)
    • Pinecone update to OPEA(7c9f77b)
    • Refine Dataprep Code & UT(867e9d7)
    • Support delete for Milvus vector db in Dataprep(767a14c)
    • Redis-dataprep: Make Redis connection consistent(cfaf5f0)
    • Update Dataprep with Parameter Settings(55b457b)
    • Fix Dataprep Potential Error in get_file(04ff8bf)
    • Add dependency for pdf2image and OCR processing(9397522)
    • Fix the data load issue for structured files (40f1463)
    • Fix deps #568(c541d1d)
  • Other Components

    • Remove 'langsmith' per code review(dcf68a0)
    • Refine Nginx Component(69f9895)
    • Add logging for unified debug(fab1fbd)
    • Add Nginx Component for Service Forwarding(60cc0b0)
    • Fix line endings to LF(fecf4ac)
    • Add Assistant API for agent(f3a8935)
    • doc: remove use of unknown highlight language(5bd8bda)
    • Update README.md(b271739)
    • doc: fix multiple H1 headings(77e0e7b)
    • Add RagAgentDocGrader to agent comp(368c833)
    • Update Milvus docker-compose.yaml(d3eefea)
    • prompt_registry: Unifying API endpoint port(27a01ee)
    • Minor SPDX header update(4712545)
    • Modification to toxicity plugin PR (63650d0)
    • Optional container build instructions(be4833f)
    • Add Uvicorn dependency(b2e2b1a)
    • Support launch as Non-Root user in all published container images.(1eaf6b7)
    • Update readme and remove empty readme(a61e434)
    • Refine Guardrails README and update model(7749ce3)
    • Add codeowner(fb0ea3d)
    • Remove unnecessary langsmith dependency(cc8cd70)
    • doc: add .gitignore(d39fee9)
    • Add output evaluation for guardrails(62ca5bc)
    • Add ML detection strategy to PII detection guardrail(de27e6b)
    • Add finetuning list job, cancel job, retrieve finetuning job feature(7bbbdaf)
    • update finetuning api with openai format.(1ff81da)
    • Add finetuning component (ad0bb7c)
    • Add toxicity detection microservice(97fdf54)
    • fix searchqna readme(66cbbf3)
    • Fix typos and add definitions for toxicity detection microservice(9b8798a)
  • CI/CD/UT

    • Fix tts image build error(8b9dcdd)
    • Add CD workflow.(5dedd04)
    • Fix CI test changed file detect issue(cd83854)
    • add sudo in wf remove(1043336)
    • adapt GenAIExample test structure refine(7ffaf24)
    • Freeze base images(61dba72)
    • Fix image build check waring.(2b14c63)
    • Modify validate result check.(8a6079d)
    • Fix requirement actions(2207503)
    • Add validate result detection.(cf15b91)
    • Check build fail and change port 8008 to 5025/5026.(5159aac)
    • Freeze requirements(5d9a855)
    • Fix vllm-ray issue(0bd8215)
    • Standardize image build.(a56a847)
    • clean local images before test(f36629a)
    • update test files(ab8ebc4)
    • Fix validation failure without exit.(f46f1f3)
    • Update Microservice CI trigger path(3ffcff4)
    • Add E2E example test(ec4143e)
    • Added unified ports for Chat History Microservice.(2098b91)
    • add secrets for test(cafcf1b)
    • [tests] normalize embedding and reranking endpoint docker image name(e3f29c3)
    • fix asr ut on hpu(9580298)
    • update image build list(7185d6b)
    • Add path check for dockerfiles in compose.yaml and change workflow name.(c45f8f0)
    • enhance docker image build(75d6bc9)
    • refactor build image with common action.yml(ee5b0f6)
    • Fix '=' miss issues.(eb5cc8a)
    • fix freeze workflow(945b9e4)
GenAIEvals
  • remove useless code.(1004d5b)
  • Unify benchmark tool based on stresscli library(71637c0)
  • Fixed query list id out-of-range issue(7b719de)
  • Add GMC chatqna benchmark script(6a390da)
  • Add test example prompts for codegen(ebee50c)
  • doc: fix language on codeblock in README(85aef83)
  • Fix metrics issue of CRUD(82c1654)
  • Add benchmark stresscli scripts(9998cd7)
  • enhance multihop dataset accuracy(dfc2c1e)
  • doc: add Kubernetes platform-optimization README(7600db4)
  • doc: fix platform optimization README based on PR#73 feedback(8c7eb1b)
  • update for faq benchmark(d754a84)
  • Support e2e and first token P90 statistics(b07cd12)
GenAIInfra
  • GMC

    • update GMC e2e and Doc(8a85364)
    • Fixed some bugs for GMC yaml files(112295a)
    • Set up CD workflow for GMC(3d94844)
    • GMC: Add GPU support for GMC.(119941e)
    • authN-authZ: add oauth2-proxy support for authentication and authorization together with GMC(488a1ca)
    • Output streaming support for the whole pipeline in GMC router(c412aa3)
    • re-org k8s manifests files for GMC and examples(d39b315)
    • GMC: resource management(81060ab)
    • Enable GMC helm installation test in CI(497ff61)
    • Add helm chart for deploying GMC itself(a76c90f)
    • Add multiple endpoints for GMC pipeline via gmcrouter(da4f091)
    • GMC: fix unsafe quoting(aa2730a)
    • fix: update doc for authN-authZ with oauth(54cd66f)
    • Troubleshooting guide for the validating webhook.(b47ec0c)
    • Fix router bugs on max_new_tokens and dataprep gaudi yaml file(5735dd3)
    • Add dataprep microservice to chatQnA example(d9a0271)
    • Add HPA support to ChatQnA(cab7a88)
  • HelmChart

    • Add manual helm e2e test flow(3b5f62e)
    • Add script to generate manifests from helm charts(273cb1d)
    • ui: update chatqna helm chart readme and env name(a1d6d70)
    • Update helm chart readme(656dcc6)
    • helm: fix tei/tgi/docsum(a270726)
    • helm: update data-prep to latest changes(625899b)
    • helm: Update helm manifest to address user raised issues(4319660)
    • helm: Support local embedding(73b5b65)
    • ui: add helm chart/manifests for conversational UI(9dbe550)
    • helm: Add K8S probes to retriever-usvc(af47b3c)
    • Enable google secrets in helm chart e2e workflow(7079049)
    • Helm/Manifest: Add K8S probe(d3fc939)
    • Enable helm/common tests in CI(fa8ef35)
    • Helm: Add Nvidia GPU support for ChatQnA(868103b)
    • misc changes(b1182c4)
    • tgi: Update tgi version on xeon to latest-intel-cpu(c06bcea)
    • Fix typos in README(faa976b)
    • Support HF_ENDPOINT(cf28da4)
    • Set model-volume default to tmp volume(b5c14cd)
    • Enable using PV as model cache directory(c0d2ba6)
    • helm/manifest: Update to release v0.9(182183e)
  • Others

    • Rename workflows to get better readable(cb31d05)
    • Add manual job to freeze image tags and versions after code freeze(c0f5e2f)
    • tgi: revert xeon version to 2.2.0(076e81e)
    • Initial commit for Intel Gaudi Base Operator(c2a13d1)
    • Add AudioQnA example and e2e test(1b50b73)
    • Reorg and rename CI workflows to follow the rules(2bf648c)
    • Fix errors in ci workflow(779e526)
    • Add e2e test for chatqna with switch mode enable(7b20273)
    • Validating webhook implementation(df5f6f3)
    • Enhance manually run image build workflow(e983c32)
    • Add image build process on manual event(833dcec)
    • CI: change chart e2e to support tag replacing(739788a)
    • Add e2e test for chatQnA with dataprep microservice(c1fd27f)
    • Fix a bug of chart e2e workflow(86dd739)
    • Improve chart e2e test workflow and scripts(70205e5)
    • Correct TGI image tag for NV platform(629033b)
    • authN-authZ: change folder and split support(0c39b7b)
    • fix errors of manual helm workflow(bd46dfd)
    • update freeze tag manual workflow(c565909)
    • Update README(9480afc)
    • improve cd workflows and add release document (a4398b0)
    • Add some NVIDIA platform support docs and scripts(cad2fc3)