add dynamic batching embedding/reranking #774

Merged
merged 43 commits into from
Nov 6, 2024

@Spycsh (Member) commented Oct 9, 2024

Description

Multiple "small" GenAI services currently cannot be launched on a single Gaudi card. To address this, this PR drafts an implementation that replaces the TEI logic with local inference using Optimum Habana plus dynamic batching. It requires the following functionality:

  • request buffering
  • HPU batch inference on embedding/reranking
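
To illustrate the request-buffering idea, here is a minimal sketch of a dynamic batcher. It is not the code merged in this PR: `DynamicBatcher`, `fake_embed`, `max_batch_size`, and `max_wait_s` are hypothetical names, and the model call is a plain Python stand-in rather than HPU inference. Requests are queued, and a worker flushes a batch once it is full or once a short wait window has elapsed.

```python
import asyncio
from typing import Callable, List


class DynamicBatcher:
    """Buffers single requests and serves them with one batched call.

    Hypothetical sketch: names and defaults are assumptions, not the PR's API.
    """

    def __init__(
        self,
        batch_fn: Callable[[List[str]], List[List[float]]],
        max_batch_size: int = 8,
        max_wait_s: float = 0.01,
    ) -> None:
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue: asyncio.Queue = asyncio.Queue()
        self.batch_sizes: List[int] = []  # recorded only for inspection

    async def submit(self, text: str) -> List[float]:
        """Enqueue one request and wait for its individual result."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((text, fut))
        return await fut

    async def run(self) -> None:
        """Worker loop: collect requests until the batch is full or the wait expires."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until the first request arrives
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            self.batch_sizes.append(len(batch))
            try:
                results = self.batch_fn([text for text, _ in batch])
                for (_, fut), result in zip(batch, results):
                    if not fut.done():
                        fut.set_result(result)
            except Exception as exc:  # propagate a batch failure to every caller
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)


def fake_embed(texts: List[str]) -> List[List[float]]:
    """Stand-in for batched model inference: a one-element 'embedding' per text."""
    return [[float(len(t))] for t in texts]


async def demo():
    batcher = DynamicBatcher(fake_embed, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    # Six concurrent callers; each receives only its own result.
    results = await asyncio.gather(*(batcher.submit("x" * n) for n in range(1, 7)))
    worker.cancel()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the batch size cap bounds the work done per inference call, while the wait window bounds the extra latency a lone request pays for waiting on others. A real service would likely run the model call off the event loop so that new requests keep being accepted during inference.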

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

Under implementation

Tests

Under implementation

codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 68.29268% with 13 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| comps/cores/mega/micro_service.py | 68.29% | 13 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| comps/cores/mega/micro_service.py | 82.90% <68.29%> (-8.24%) ⬇️ |

@Spycsh Spycsh marked this pull request as ready for review October 11, 2024 02:29
@Spycsh Spycsh marked this pull request as draft October 11, 2024 07:28
@Spycsh Spycsh changed the title from "add static batching embedding/reranking" to "add dynamic batching embedding/reranking" Oct 11, 2024
@Spycsh (Member, Author) commented Oct 23, 2024

@lvliang-intel lvliang-intel merged commit 518cdfb into opea-project:main Nov 6, 2024
14 checks passed
5 participants