Open
Description
Problem & Motivation
- Customers have observed significant speedups from batched generation when they need to generate from multiple prompts.
- Currently the FIR/IIR state is maintained without a batch index, so to support batching we would need to introduce a batch index in inference_context.fir_state etc. in the inference kernels.
BioNeMo Framework Version
Category
Inference
Proposed Solution
- Add a batch index to the FIR/IIR state (inference_context.fir_state etc.) in NeMo, which is currently maintained without one.
- Add test coverage for batched inference.
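To make the proposal concrete, here is a minimal sketch of what "state with a batch index" could look like, using only the Python standard library. It is not the NeMo implementation: `InferenceContext` and `fir_step` are hypothetical stand-ins, and only the `fir_state` attribute name is borrowed from the issue text. The idea is that each batch element owns its own filter history, so sequences generated together do not overwrite each other's state.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InferenceContext:
    """Hypothetical stand-in for the real inference context (not the NeMo class)."""

    # fir_state[b] holds the most recent inputs for batch element b.
    fir_state: dict = field(default_factory=dict)


def fir_step(ctx, batch_inputs, taps):
    """Advance a causal FIR filter by one token for every sequence in the batch.

    batch_inputs: one new input value per batch element.
    taps: filter coefficients; taps[0] applies to the newest input.
    """
    outputs = []
    for b, x in enumerate(batch_inputs):
        # Lazily create a zero-initialised history for this batch index.
        hist = ctx.fir_state.setdefault(
            b, deque([0.0] * len(taps), maxlen=len(taps))
        )
        hist.appendleft(x)  # newest first; the oldest value drops off the end
        outputs.append(sum(h * v for h, v in zip(taps, hist)))
    return outputs
```

A natural test for batched inference, in the spirit of the second bullet, is to check that running several sequences in one batch gives the same per-sequence outputs as running each sequence alone with its own context.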
Expected Benefits
- Significant (10x+) performance gains for many shorter generations.
Code Example
cclough
Metadata
Assignees
Labels
No labels