Question about batch request optimization in BentoML #5098

oogou11 · 2024-11-22T06:16:08Z


I'm currently working with BentoML's batchable API and have some questions regarding the optimization parameters used during batch request processing.

I'm using the following decorator for a batchable API:

```python
@bentoml.api(batchable=True, max_batch_size=32, max_latency_ms=1000)
```
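For context, the two knobs in this decorator bound a batch from two sides: how many requests it holds and how long they wait. The sketch below is a generic, fixed size-or-timeout policy for illustration only. The function `group_requests` is hypothetical, and treating `max_latency_ms` as a cap on how long the oldest request in a batch may wait is a simplification. As far as I can tell, BentoML's dispatcher instead adapts its waiting time using the parameters below.

```python
def group_requests(arrivals_ms, max_batch_size=32, max_latency_ms=1000):
    """Greedily group request arrival times (in ms) into batches.

    Hypothetical, simplified policy: a batch is closed once it holds
    max_batch_size requests, or once the next arrival is more than
    max_latency_ms after the batch's first (oldest) request.
    """
    batches = []
    current = []
    for t in sorted(arrivals_ms):
        too_full = len(current) >= max_batch_size
        too_old = bool(current) and t - current[0] > max_latency_ms
        if too_full or too_old:
            batches.append(current)  # close the batch before adding t
            current = []
        current.append(t)
    if current:
        batches.append(current)  # flush whatever is left at the end
    return batches


# Offline simulation only: a live server would flush on a timer instead of
# waiting for the next arrival to notice that the oldest request is too old.
print(group_requests([0, 10, 20, 1500, 1510], max_batch_size=2, max_latency_ms=1000))
# [[0, 10], [20], [1500, 1510]]
```

A fixed policy like this ignores how long the model actually takes per batch, which is why an adaptive scheme needs the extra parameters asked about here.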

I noticed the following constants in the implementation:

```python
N_KEPT_SAMPLE = 50
N_SKIPPED_SAMPLE = 2
INTERVAL_REFRESH_PARAMS = 5
```
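My own reading of these three names, which is an assumption I'd like confirmed, is: keep a sliding window of the most recent 50 (batch size, duration) measurements, ignore the first 2 measurements as warm-up, and refit at most once every 5 seconds (the unit is also assumed). The `SampleWindow` class below is a hypothetical stand-alone sketch of that reading, not BentoML's code:

```python
from collections import deque

# Constant names mirror the question; their meaning here is an assumption.
N_KEPT_SAMPLE = 50           # keep only the most recent 50 samples
N_SKIPPED_SAMPLE = 2         # discard the first 2 samples (warm-up noise)
INTERVAL_REFRESH_PARAMS = 5  # refit at most once every 5 seconds (assumed unit)


class SampleWindow:
    """Hypothetical sliding window of (batch_size, duration) measurements."""

    def __init__(self):
        # deque(maxlen=...) silently drops the oldest sample when full
        self.samples = deque(maxlen=N_KEPT_SAMPLE)
        self.seen = 0
        self.last_refresh = None

    def log(self, batch_size, duration_s, now_s):
        """Record one sample; return True if a parameter refit is due."""
        self.seen += 1
        if self.seen <= N_SKIPPED_SAMPLE:
            return False  # early calls often include one-off warm-up cost
        self.samples.append((batch_size, duration_s))
        if self.last_refresh is None or now_s - self.last_refresh >= INTERVAL_REFRESH_PARAMS:
            self.last_refresh = now_s
            return True
        return False


w = SampleWindow()
print([w.log(1, 0.5, 0.0), w.log(1, 0.5, 0.1), w.log(4, 0.2, 0.2)])
# [False, False, True]
```

Under this reading, a small window lets the fit track recent load, while the skip count keeps cold-start outliers out of it.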

Additionally, in the linear-regression (least-squares) optimization, the initial parameter values are set as follows:

```python
self.o_a = min(2, max_latency * 2.0 / 30)
self.o_b = min(1, max_latency * 1.0 / 30)
```
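To make the arithmetic concrete, here is a sketch assuming `o_a` and `o_b` are the slope and intercept of a linear model of batch processing time, duration ≈ o_a · n + o_b, where n is the batch size and `max_latency` is in seconds. Both are assumptions on my part. `initial_params` just evaluates the two formulas quoted above; `fit_linear` is a hypothetical stdlib-only closed-form least-squares fit, which for two parameters gives the same answer a general solver such as `numpy.linalg.lstsq` would:

```python
def initial_params(max_latency_s):
    """Evaluate the initial-value formulas quoted above (unit assumed: seconds)."""
    o_a = min(2, max_latency_s * 2.0 / 30)  # assumed: seconds per item
    o_b = min(1, max_latency_s * 1.0 / 30)  # assumed: fixed seconds per batch
    return o_a, o_b


def fit_linear(samples):
    """Ordinary least squares for duration = o_a * n + o_b.

    samples: iterable of (n, duration) pairs with at least two distinct n.
    """
    pts = list(samples)
    k = len(pts)
    mean_n = sum(n for n, _ in pts) / k
    mean_d = sum(d for _, d in pts) / k
    var_n = sum((n - mean_n) ** 2 for n, _ in pts)
    if var_n == 0:
        raise ValueError("need at least two distinct batch sizes to fit a slope")
    cov = sum((n - mean_n) * (d - mean_d) for n, d in pts)
    o_a = cov / var_n            # slope
    o_b = mean_d - o_a * mean_n  # intercept
    return o_a, o_b


print(initial_params(1.0))  # roughly (0.067, 0.033)
print(fit_linear([(1, 0.15), (2, 0.20), (4, 0.30), (8, 0.50)]))  # approximately (0.05, 0.1)
```

If `max_latency_ms=1000` is converted to 1.0 second, the starting guess is about 0.067 s per item plus 0.033 s per batch, so a single-item batch is estimated at 0.1 s (one tenth of the latency budget). Both `min(...)` caps only take effect once `max_latency` reaches 30 s.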

Lastly, a TokenBucket is used to control how often the refresh happens:

```python
self._refresh_tb = TokenBucket(2)
```
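For reference, a token bucket holds up to `capacity` tokens and refills at a fixed rate; each allowed action spends one token, so actions are limited to the refill rate on average, with bursts of up to `capacity`. The class below is a minimal hypothetical stand-in, not BentoML's `TokenBucket` (whose signature may differ). It assumes the `2` in `TokenBucket(2)` is the capacity and pairs it with a refill rate of one token per 5 seconds to match `INTERVAL_REFRESH_PARAMS`, which is also an assumption:

```python
import time


class TokenBucket:
    """Minimal token bucket (hypothetical stand-in, not BentoML's class)."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity      # maximum burst size
        self.rate = rate              # tokens added per second
        self.clock = clock            # injectable for testing
        self.tokens = float(capacity)  # start full
        self.last = clock()

    def consume(self, n=1):
        """Take n tokens if available; return whether the action is allowed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(2, 1 / 5, clock=lambda: fake_now[0])
print([bucket.consume(), bucket.consume(), bucket.consume()])
# [True, True, False]
```

With these assumed settings, two refreshes can happen back to back at startup, after which refreshes settle to roughly one every 5 seconds.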

I would like to understand how the values of these parameters (such as 50, 2, 5, 2, 2.0, 30, etc.) are derived and how they relate to the batch processing mechanism. What role do they play in optimizing performance, and why were these specific values chosen?

Any insights would be greatly appreciated. Thanks!
