feat: enable `ModelLoaderHuggerFace` to support loading models in fp16 for inference #555

0x404 · 2024-09-22T09:09:34Z

ModelLoaderHuggerFace currently only supports reading tensors from a checkpoint and loading them into the model, while keeping the tensor dtype as it is.

This PR adds an fp16_inference option, allowing ModelLoaderHuggerFace to load fp16 models for fp16 inference.

When `fp16_inference` is enabled, the model's parameters are loaded as fp16 for inference.
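The behaviour described above can be sketched roughly as follows. This is not the PR's actual implementation: NumPy arrays stand in for framework tensors, and the `load_state_dict` helper and the checkpoint keys are hypothetical names chosen for illustration. Only the `fp16_inference` flag and its default of `False` come from the PR.

```python
import numpy as np


def load_state_dict(checkpoint, fp16_inference=False):
    """Copy checkpoint arrays, optionally casting floating-point ones to fp16.

    Hypothetical helper: a stand-in for the loader's tensor-copy step.
    """
    loaded = {}
    for name, tensor in checkpoint.items():
        if fp16_inference and np.issubdtype(tensor.dtype, np.floating):
            # Cast each floating-point tensor as it is loaded.
            loaded[name] = tensor.astype(np.float16)
        else:
            # Default path: keep the dtype stored in the checkpoint.
            # Integer buffers (e.g. position ids) are never cast.
            loaded[name] = tensor.copy()
    return loaded


# Toy checkpoint with one float weight and one integer buffer.
checkpoint = {
    "linear.weight": np.ones((4, 4), dtype=np.float32),
    "position_ids": np.arange(4, dtype=np.int64),
}

default = load_state_dict(checkpoint)
half = load_state_dict(checkpoint, fp16_inference=True)
```

With the flag off, every tensor keeps its checkpoint dtype. With it on, only floating-point tensors are converted, which is the usual choice because integer buffers would lose meaning if cast.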


ShawnXuan · 2024-09-23T05:52:34Z

If the model is converted to fp16 only after it has been loaded, memory usage drops sharply all at once.
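One reading of this comment is that the weights occupy full-precision memory until the conversion and roughly half of that afterwards. A back-of-the-envelope sketch of the weight memory in each dtype, assuming a hypothetical parameter count of 6 billion and counting weights only (no activations or framework overhead):

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gib(num_params, dtype):
    """Return the memory taken by the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


num_params = 6_000_000_000  # hypothetical model size, not taken from the PR

fp32_gib = weight_memory_gib(num_params, "fp32")
fp16_gib = weight_memory_gib(num_params, "fp16")

print(f"fp32 weights: {fp32_gib:.2f} GiB")
print(f"fp16 weights: {fp16_gib:.2f} GiB")
```

Under these assumptions the fp16 copy needs half the memory of the fp32 copy, so a load-then-convert flow shows a sudden drop once the full-precision tensors are released.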

0x404 and others added 6 commits September 22, 2024 08:38

feat: add fp16_inference option to support fp16 infer

2198f6f

When `fp16_inference` is enabled, the model's parameters are loaded as fp16 for inference.

solve conflicts

4a58ede

update

49fc21e

support chatglm with fp16 inference

4edb33a

Revert "update"

ec1a81a

This reverts commit 49fc21e.

set defaults to False

c2a8ef6

0x404 requested review from fpzh2011, ShawnXuan and Flowingsun007 September 23, 2024 02:00
