Model Initialisation happening per request - JAVA GPU inferencing #11668

@suryakant261

Description

We are observing high latency when running real-time batch inference on GPUs using the xgboost4j-gpu Java library. Our investigation suggests that a model initialization step may be occurring on every predict call, even when Booster objects are reused.
Please see the attached flame graph, which highlights the time spent in the model initialization phase during a predict call.

[Flame graph: time spent in model initialization during a predict call]

The prediction code we are using:

Booster booster = null;
DMatrix dmat = null;
try {
    dmat = new DMatrix(features, nRows, nCols, Float.NaN);
    booster = boosters.borrowObject();  // take a pre-initialised Booster object from the pool
    Timer.Context context = MetricUtils.addTimer("xgb_true_time", this.getClass()).time();

    float[][] modelScores = booster.predict(dmat);
    context.stop();

} catch (XGBoostError e) {
    e.printStackTrace();
} finally {
    if (booster != null) {
        boosters.returnObject(booster); // return the Booster to the pool
    }
    if (dmat != null) {
        dmat.dispose(); // free the native memory held by the DMatrix
    }
}

Question
Is this per-batch model initialization expected behavior for the GPU predictor, or could it indicate a bug or a misconfiguration on our part? Any guidance on how to avoid this per-batch setup cost would be greatly appreciated.

(I was referring to this section of the source code, from which I also understood that Init is called on every predict call: https://github.com/dmlc/xgboost/blob/release_2.0.0/src/predictor/gpu_predictor.cu#L890)
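
To separate a one-off warm-up from genuine per-call initialization, we are also running a small standalone check along the lines of the sketch below (the model path and matrix shape are placeholders):

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;

public class PredictWarmupCheck {
    public static void main(String[] args) throws Exception {
        Booster booster = XGBoost.loadModel("/path/to/model.json");  // placeholder path
        int nRows = 1024;                                            // placeholder shape
        int nCols = 32;
        float[] features = new float[nRows * nCols];                 // dummy feature values
        DMatrix dmat = new DMatrix(features, nRows, nCols, Float.NaN);

        // Time consecutive predict calls on the same Booster and DMatrix.
        // If only the first call is slow, the setup cost is a one-off warm-up;
        // if every call is equally slow, initialization is happening per predict.
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            booster.predict(dmat);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("predict #" + i + ": " + elapsedMs + " ms");
        }
        dmat.dispose();
    }
}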
