Generative AI Infrastructure v0.9 Release Notes
OPEA Release Notes v0.9
What’s New in OPEA v0.9
Broaden functionality
- Provide telemetry functionality for metrics and tracing using Prometheus, Grafana, and Jaeger
- Initialize two Agent examples: AgentQnA and DocIndexRetriever
- Support for authentication and authorization
- Add Nginx Component to strengthen backend security
- Provide Toxicity Detection Microservice
- Support the experimental Fine-tuning microservice
Enhancement
- Align the microservice format with OpenAI API standards (Chat Completions, Fine-tuning, etc.)
- Enhance performance benchmarking and evaluation for GenAI Examples (e.g., TGI, resource allocation)
- Enable support for launching container images as a non-root user
- Use Llama-Guard-2-8B as the default Guardrails model, bge-large-zh-v1.5 as the default embedding model, and mistral-7b-grok as the default CodeTrans model
- Add ProductivitySuite to provide access management and maintain user context
Deployment
- Support Red Hat OpenShift Container Platform (RHOCP)
- GenAI Microservices Connector (GMC) successfully tested on NVIDIA GPUs
- Add Kubernetes support for AudioQnA and VisualQnA examples
OPEA Docker Hub: https://hub.docker.com/u/opea
Thanks to Sharan Shirodkar, Aishwarya Ramasethu, Michal Nicpon, and Jacob Mansdorfer for their external contributions.
Details
GenAIExamples
ChatQnA
- Update port in set_env.sh(040d2b7)
- Fix minor issue in ChatQnA Gaudi docker README(a5ed223)
- update chatqna dataprep-redis port(02a1536)
- Add support for .md file in file upload in the chatqna-ui(7a67298)
- Added the ChatQnA delete feature, and updated the corresponding README(09a3196)
- fixed ISSUE-528(45cf553)
- Fix vLLM and vLLM-on-Ray UT bug(cfcac3f)
- set OLLAMA_MODEL env to docker container(c297155)
- Update guardrail docker file path(06c4484)
- remove ray serve(c71bc68)
- Refine docker_compose for dataprep param settings(3913c7b)
- fix chatqna guardrails(db2d2bd)
- Support ChatQnA pipeline without rerank microservice(a54ffd2)
- Update the number of microservice replicas for OPEA v0.9(e6b4fff)
- Update set_env.sh(9657f7b)
- add env for chatqna vllm(f78aa9e)
Deployment
- update manifests for v0.9(ba78b4c)
- Update K8S manifest for ChatQnA/CodeGen/CodeTrans/DocSum(01c1b75)
- Update benchmark manifest to fix errors(4fd3517)
- Update env for manifest(4fa37e7)
- update manifests for v0.9(08f57fa)
- Add AudioQnA example via GMC(c86cf85)
- add k8s support for audioqna(0a6bad0)
- Update manifest for FaqGen(80e3e2a)
- Add kubernetes support for VisualQnA(4f7fc39)
- Add dataprep microservice to chatQnA example and the e2e test(1c23d87)
Documentation
- [doc] Update README.md(c73e4e0)
- doc fix: Update README.md to remove specific description of paragraph-1(5a9c109)
- doc: fix markdown in docker_image_list.md(9277fe6)
- doc: fix markdown in Translation/README.md(d645305)
- doc: fix markdown in SearchQnA/README.md(c461b60)
- doc: fix FaqGen/README.md markdown(704ec92)
- doc: fix markdown in DocSum/README.md(83712b9)
- doc: fix markdown in CodeTrans/README.md(076bca3)
- doc: fix CodeGen/README.md markdown(33f8329)
- doc: fix markdown in ChatQnA/README.md(015a2b1)
- doc: fix headings in markdown files(21fab71)
- doc: missed an H1 in the middle of a doc(4259240)
- doc: remove use of HTML for table in README(e81e0e5)
- Update ChatQnA readme with OpenShift instructions(ed48371)
- Convert HTML to markdown format.(14621f8)
- Fix typo {your_ip} to {host_ip}(ad8ca88)
- README fix typo(abc02e1)
- fix script issues in MD file(acdd712)
- Minor documentation improvements in the CodeGen README(17b9676)
- Refine Main README(08eb269)
- [Doc]Add a micro/mega service WorkFlow for DocSum(343d614)
- Update README for k8s deployment(fbb81b6)
Other examples
- Clean deprecated VisualQnA code(87617e7)
- Using TGI official release docker image for intel cpu(b2771ad)
- Add VisualQnA UI(923cf69)
- fix container name(5ac77f7)
- Add VisualQnA docker for both Gaudi and Xeon using TGI serving(2390920)
- Remove LangSmith from Examples(88eeb0d)
- Modify the language variable to match language highlight.(f08d411)
- Remove deprecated folder.(7dd9952)
- AgentQnA example(67df280)
- fix tgi xeon tag(6674832)
- Add new DocIndexRetriever example(566cf93)
- Add env params for chatqna xeon test(5d3950)
- ProductivitySuite Combo Application with REACT UI and Keycloak Authen(947cbe3)
- change codegen tgi model(06cb308)
- change searchqna prompt(acbaaf8)
- minor fix mismatched hf token(ac324a9)
- fix translation gaudi env(4f3be23)
- Minor fixes for CodeGen Xeon and Gaudi Kubernetes codegen.yaml (c25063f)
CI/CD/UT
- update deploy_gmc logic in cd workflow(c016d82)
- fix ghcr.io/huggingface/text-generation-inference tag(503a1a9)
- Add GMC e2e in CD workflow(f45e4c6)
- Fix CI test changed file detect issue(5dcadf3)
- update cd workflow name(3363a37)
- Change microservice tags in CD workflow(71363a6)
- Fix manual freeze images workflow(c327972)
- open chatqna guardrails test(db2d2bd)
- Add gmc build, scan and deploy workflow(a39f23a)
- Enhance CI/CD infrastructure(c26d0f6)
- Fix typo in CI workflow(e12baca)
- Fix ChatQnA Qdrant CI issues(e71aba0)
- remove continue-on-error: true to stop the test when image build failed(6296e9f)
- Fix CD workflow typos(039014f)
- Freeze base images(c9f9aca)
- support multiple test cases for ChatQnA(939502d)
- set action back to pull_request_target(1c07a38)
- Add BoM collect workflow and image publish workflow(e93146b)
- Fix remaining issues in CI/CD structure refactor(a6385bc)
- Add composable manifest e2e test for cd workflow(d68be05)
- Add secrets for CI test(3c9e2aa)
- Build up docker images CD workflow(8c384e0)
- fix corner issue in CI test(64bfea9)
- Rename github workflow files(ebc165a)
- Improve manifest chatqna test(a072441)
- Refactor build image workflows with common action.yml(e22d413)
- Automatically create an issue in GenAIInfra when docker compose files change(8bdb598)
- Add components owner(ab98795)
- Fix code scan warning(ac89855)
- Check url of docker image list.(cf021ee)
- change namespace suffix to random string(46af6f3)
- chatqna k8s manifest: Fixed retriever-redis v0.9 image issue(7719755)
- Adding Trivy and SBOM actions(f3ffcd5)
- optimize CI log format(dfaf479)
GenAIComps
Cores
- Refine parameter in api_protocol.py(0584b45)
- Revert the default value of max_new_tokens to 1024(f2497c5)
- Fixed Orchestrator schedule method(76877c1)
- fix wrong indent(9b0edf2)
- Allow downstream of streaming nodes(90e367e)
- Add Retrieval gateway in core to support IndexRetrieval Megaservice(56daf95)
- add telemetry doc(2a2a93)
LLM/embedding/reranking/retrieval
- Using habana docker 1.16.1 everywhere(5deb383)
- adding entrypoint.sh to faq-generation comp (4a7b8f4)
- Fix image in docker compose yaml to use the built docker image tag from the README(72a2553)
- Refine LLM Native Microservice(b16b14a)
- Fix Retriever qdrant issue(7aee7e4)
- Change /root/ to /home/user/.(4a67d42)
- Fix embeddings_langchain-mosec issue.(87905ad)
- fix HuggingFaceEmbedding deprecated in favor of HuggingFaceInferenceAPIEmbedding(2891cc6)
- align vllm-ray response format to tgi response format(ac4a777)
- build new images for llms(ed99d47)
- LLM micro service input data does not have input model name(761f7e0)
- Fix OpenVINO vLLM build scripts and update unit test case(91d825c)
- Refine the instructions to run the retriever example with qdrant(eb51018)
- Add cmds to restart ollama service and add proxy settings while launching docker(8eb8b6a)
- Vllm and vllm-ray bug fix (add opea for vllm, update setuptools version)(0614fc2)
- remove deprecated langchain imports and switch to langchain-huggingface(055404a)
- [Enhance] Increase mosec_embedding forward timeout to support high concurrency cases(b61f61b)
- Fix issues in updating embedding & reranking model to bge-large-zh-v1.5(da19c5d)
- refactor embedding/ranking/llm request/response to follow the OpenAI format(7287caa)
- align VLLM micro-service output format with UI(c1887ed)
- fix vllm docker command(c1a5883)
- Update Embedding Mosec Dockerfile to use BAAI/bge-large-zh-v1.5(bbdc1f0)
- remove length limitation of embedding(edcd1e8)
- Support SearchedDoc input type in LLM for No Rerank Pipeline (3c29fb4)
- Add local_embedding return 768 length to align with chatqna example(a234db)
- Refine LLM for No Rerank(fe8ef3)
- Remove redundant dependency from 'vllm-ray' comps(068527d)
LVM/TTS/ASR
- Revise TTS, SpeechT5Model to end the last audio chunk at the correct punctuation mark location(20fc8ca)
- Support llava-next using TGI(e156101)
- whisper: Fix container build failure(d5b8cdf)
- support whisper long-form generation (daec680)
- Support multiple image sources for LVM microservice(ed776ac)
- fix ffmpeg build on hpu(ac3909d)
- Support streaming output for LVM microservice(c5a0344)
- Add video-llama LVM microservice under lvms(db8c893)
- add torchvision into requirements(1566047)
- Use Gaudi base images from Dockerhub(33db504)
- update the requirements.txt for tts and asr(5ba2561)
DataPrep
- Fix Dataprep qdrant issues and add Test Script(a851abf)
- Refine robustness of Dataprep Redis(04986c1)
- Address testcase failure(075e84f)
- Added support for Unified Port, GET/DELETE endpoints in pgvector Dataprep(8a62bac)
- Update dataprep default mosec embedding model in config.py(8f0f2b0)
- unify port in one microservice.(f8d45e5)
- Pinecone update to OPEA(7c9f77b)
- Refine Dataprep Code & UT(867e9d7)
- Support delete for Milvus vector db in Dataprep(767a14c)
- Redis-dataprep: Make Redis connection consistent(cfaf5f0)
- Update Dataprep with Parameter Settings(55b457b)
- Fix Dataprep Potential Error in get_file(04ff8bf)
- Add dependency for pdf2image and OCR processing(9397522)
- Fix the data load issue for structured files (40f1463)
- Fix deps #568(c541d1d)
Other Components
- Remove 'langsmith' per code review(dcf68a0)
- Refine Nginx Component(69f9895)
- Add logging for unified debug(fab1fbd)
- Add Nginx Component for Service Forwarding(60cc0b0)
- Fix line endings to LF(fecf4ac)
- Add Assistant API for agent(f3a8935)
- doc: remove use of unknown highlight language(5bd8bda)
- Update README.md(b271739)
- doc: fix multiple H1 headings(77e0e7b)
- Add RagAgentDocGrader to agent comp(368c833)
- Update Milvus docker-compose.yaml(d3eefea)
- prompt_registry: Unifying API endpoint port(27a01ee)
- Minor SPDX header update(4712545)
- Modification to toxicity plugin PR (63650d0)
- Optional container build instructions(be4833f)
- Add Uvicorn dependency(b2e2b1a)
- Support launch as Non-Root user in all published container images.(1eaf6b7)
- Update readme and remove empty readme(a61e434)
- Refine Guardrails README and update model(7749ce3)
- Add codeowner(fb0ea3d)
- Remove unnecessary langsmith dependency(cc8cd70)
- doc: add .gitignore(d39fee9)
- Add output evaluation for guardrails(62ca5bc)
- Add ML detection strategy to PII detection guardrail(de27e6b)
- Add finetuning list job, cancel job, retrieve finetuning job feature(7bbbdaf)
- update finetuning api with openai format.(1ff81da)
- Add finetuning component (ad0bb7c)
- Add toxicity detection microservice(97fdf54)
- fix searchqna readme(66cbbf3)
- Fix typos and add definitions for toxicity detection microservice(9b8798a)
CI/CD/UT
- Fix tts image build error(8b9dcdd)
- Add CD workflow.(5dedd04)
- Fix CI test changed file detect issue(cd83854)
- add sudo in wf remove(1043336)
- Adapt to the refined GenAIExample test structure(7ffaf24)
- Freeze base images(61dba72)
- Fix image build check warning.(2b14c63)
- Modify validate result check.(8a6079d)
- Fix requirement actions(2207503)
- Add validate result detection.(cf15b91)
- Check build fail and change port 8008 to 5025/5026.(5159aac)
- Freeze requirements(5d9a855)
- Fix vllm-ray issue(0bd8215)
- Standardize image build.(a56a847)
- clean local images before test(f36629a)
- update test files(ab8ebc4)
- Fix validation failure without exit.(f46f1f3)
- Update Microservice CI trigger path(3ffcff4)
- Add E2E example test(ec4143e)
- Added unified ports for Chat History Microservice.(2098b91)
- add secrets for test(cafcf1b)
- [tests] normalize embedding and reranking endpoint docker image name(e3f29c3)
- fix asr ut on hpu(9580298)
- update image build list(7185d6b)
- Add path check for dockerfiles in compose.yaml and change workflow name.(c45f8f0)
- enhance docker image build(75d6bc9)
- refactor build image with common action.yml(ee5b0f6)
- Fix missing '=' issues.(eb5cc8a)
- fix freeze workflow(945b9e4)
GenAIEvals
- remove useless code.(1004d5b)
- Unify benchmark tool based on stresscli library(71637c0)
- Fixed query list id out-of-range issue(7b719de)
- Add GMC chatqna benchmark script(6a390da)
- Add test example prompts for codegen(ebee50c)
- doc: fix language on codeblock in README(85aef83)
- Fix metrics issue of CRUD(82c1654)
- Add benchmark stresscli scripts(9998cd7)
- enhance multihop dataset accuracy(dfc2c1e)
- doc: add Kubernetes platform-optimization README(7600db4)
- doc: fix platform optimization README based on PR#73 feedback(8c7eb1b)
- update for faq benchmark(d754a84)
- Support e2e and first token P90 statistics(b07cd12)
GenAIInfra
GMC
- update GMC e2e and Doc(8a85364)
- Fixed some bugs for GMC yaml files(112295a)
- Set up CD workflow for GMC(3d94844)
- GMC: Add GPU support for GMC.(119941e)
- authN-authZ: add oauth2-proxy support for authentication and authorization together with GMC(488a1ca)
- Output streaming support for the whole pipeline in GMC router(c412aa3)
- re-org k8s manifests files for GMC and examples(d39b315)
- GMC: resource management(81060ab)
- Enable GMC helm installation test in CI(497ff61)
- Add helm chart for deploying GMC itself(a76c90f)
- Add multiple endpoints for GMC pipeline via gmcrouter(da4f091)
- GMC: fix unsafe quoting(aa2730a)
- fix: update doc for authN-authZ with oauth(54cd66f)
- Troubleshooting guide for the validating webhook.(b47ec0c)
- Fix router bugs on max_new_tokens and dataprep gaudi yaml file(5735dd3)
- Add dataprep microservice to chatQnA example(d9a0271)
- Add HPA support to ChatQnA(cab7a88)
HelmChart
- Add manual helm e2e test flow(3b5f62e)
- Add script to generate manifests from helm charts(273cb1d)
- ui: update chatqna helm chart readme and env name(a1d6d70)
- Update helm chart readme(656dcc6)
- helm: fix tei/tgi/docsum(a270726)
- helm: update data-prep to latest changes(625899b)
- helm: Update helm manifest to address user raised issues(4319660)
- helm: Support local embedding(73b5b65)
- ui: add helm chart/manifests for conversational UI(9dbe550)
- helm: Add K8S probes to retriever-usvc(af47b3c)
- Enable google secrets in helm chart e2e workflow(7079049)
- Helm/Manifest: Add K8S probe(d3fc939)
- Enable helm/common tests in CI(fa8ef35)
- Helm: Add Nvidia GPU support for ChatQnA(868103b)
- misc changes(b1182c4)
- tgi: Update tgi version on xeon to latest-intel-cpu(c06bcea)
- Fix typos in README(faa976b)
- Support HF_ENDPOINT(cf28da4)
- Set model-volume default to tmp volume(b5c14cd)
- Enable using PV as model cache directory(c0d2ba6)
- helm/manifest: Update to release v0.9(182183e)
Others
- Rename workflows for better readability(cb31d05)
- Add manual job to freeze image tags and versions after code freeze(c0f5e2f)
- tgi: revert xeon version to 2.2.0(076e81e)
- Initial commit for Intel Gaudi Base Operator(c2a13d1)
- Add AudioQnA example and e2e test(1b50b73)
- Reorg and rename CI workflows to follow the rules(2bf648c)
- Fix errors in ci workflow(779e526)
- Add e2e test for chatqna with switch mode enable(7b20273)
- Validating webhook implementation(df5f6f3)
- Enhance manually run image build workflow(e983c32)
- Add image build process on manual event(833dcec)
- CI: change chart e2e to support tag replacing(739788a)
- Add e2e test for chatQnA with dataprep microservice(c1fd27f)
- Fix a bug of chart e2e workflow(86dd739)
- Improve chart e2e test workflow and scripts(70205e5)
- Correct TGI image tag for NV platform(629033b)
- authN-authZ: change folder and split support(0c39b7b)
- fix errors of manual helm workflow(bd46dfd)
- update freeze tag manual workflow(c565909)
- Update README(9480afc)
- improve cd workflows and add release document (a4398b0)
- Add some NVIDIA platform support docs and scripts(cad2fc3)