
[Feature Request] Reimplement Load Model of Triton and MLServer #53

Open
@WaterKnight1998

Description


Good afternoon,

Thank you very much for creating this amazing framework.

I see a potentially very valuable improvement for inference with GPU models. The Triton and MLServer adapter implementations use the CalcMemCapacity method to return the model size.

This method computes the model size from its size on disk. However, for models executed on a GPU, it would be better to return the increase in VRAM usage instead. Do you think this is doable? @tjohnson31415 @rafvasq @njhill @pvaneck

I am glad to help if you think it is feasible. I don't have experience in Go, but I can learn.

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)
