This repository was archived by the owner on Aug 7, 2025. It is now read-only.

Commit a07b7d9

Llama.cpp example for cpp backend (#2904)
* Version 1 of LLM inference with cpp backend: llm handler with loadmodel, preprocess, and inference methods; fixed an infinite lock by adding request ids to the preprocess method; added a test script for measuring tokens per second (llama-7b-chat, ggml version); GGUF compatibility; unit test fixes; typo fix; use folly to read the config path; removed debug couts; process all items in the batch; adopted llama.cpp API changes. Signed-off-by: Shrinath Suresh <[email protected]>
* Adapt to removal of TS backend
* Re-add test for llama.cpp example
* Add llama.cpp as a submodule
* Point to correct llama.cpp installation
* Build llama.cpp in build.sh
* Skip llama.cpp example test if model weights are not available
* Renamed torchscript_model folder into examples
* Adjust to new base_handler interface
* Remove debug statement
* Rename llamacpp class + remove dummy.pt file
* Move llamacpp config.json
* Moved and created prompt file
* Reset context for multiple batch entries
* Add doc for llamacpp example
* Fix spell check
* Replace output example in llamacpp example
* Move cpp example src into main examples folder
* Convert cerr/cout into logs

Co-authored-by: Shrinath Suresh <[email protected]>
1 parent 3ecaf0b · commit a07b7d9

File tree: 40 files changed (+564, −67 lines)

.gitmodules

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,3 +1,6 @@
 [submodule "third_party/google/rpc"]
 	path = third_party/google/rpc
 	url = https://github.com/googleapis/googleapis.git
+[submodule "cpp/third-party/llama.cpp"]
+	path = cpp/third-party/llama.cpp
+	url = https://github.com/ggerganov/llama.cpp.git
```
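Because llama.cpp now comes in as a git submodule, a plain clone leaves cpp/third-party/llama.cpp empty. A minimal sketch of fetching it by hand (cpp/build.sh below now runs the recursive variant of this automatically):

```bash
# Fetch the newly added llama.cpp submodule before building.
git submodule update --init cpp/third-party/llama.cpp
```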

cpp/README.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -49,23 +49,23 @@ By default, TorchServe cpp provides a handler for TorchScript [src/backends/hand
 ```
 torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler TorchScriptHandler --runtime LSP
 ```
-Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/base_handler) of unzipped model mar file.
+Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/examples/mnist/base_handler) of unzipped model mar file.
 ##### Using Custom Handler
 * build customized handler shared lib. For example [Mnist handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/examples/image_classifier/mnist).
 * set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
 * set handler as "libmnist_handler:MnistHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
 ```
 torch-model-archiver --model-name mnist_handler --version 1.0 --serialized-file mnist_script.pt --handler libmnist_handler:MnistHandler --runtime LSP
 ```
-Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/mnist_handler) of unzipped model mar file.
+Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/examples/mnist/mnist_handler) of unzipped model mar file.
 ##### BabyLLama Example
 The babyllama example can be found [here](https://github.com/pytorch/serve/blob/master/cpp/src/examples/babyllama/).
 To run the example we need to download the weights as well as tokenizer files:
 ```bash
 wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
 wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
 ```
-Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json).
+Subsequently, we need to adjust the paths according to our local file structure in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/examples/babyllama/babyllama_handler/config.json).
 ```bash
 {
 "checkpoint_path" : "/home/ubuntu/serve/cpp/stories15M.bin",
@@ -74,7 +74,7 @@ Subsequently, we need to adjust the paths according to our local file structure
 ```
 Then we can create the mar file and deploy it with:
 ```bash
-cd serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler
+cd serve/cpp/test/resources/examples/babyllama/babyllama_handler
 torch-model-archiver --model-name llm --version 1.0 --handler libbabyllama_handler:BabyLlamaHandler --runtime LSP --extra-files config.json
 mkdir model_store && mv llm.mar model_store/
 torchserve --ncs --start --model-store model_store
@@ -85,7 +85,7 @@ The handler name `libbabyllama_handler:BabyLlamaHandler` consists of our shared
 
 To test the model we can run:
 ```bash
-cd serve/cpp/test/resources/torchscript_model/babyllama/
+cd serve/cpp/test/resources/examples/babyllama/
 curl http://localhost:8080/predictions/llm -T prompt.txt
 ```
 ##### Mnist example
````
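The README changes above only re-point the babyllama paths, but the commit also ships a new llamacpp example. Below is a hedged sketch of deploying it by analogy with the babyllama flow; the handler name `libllamacpp_handler:LlamacppHandler` and the use of `--extra-files config.json` are assumptions based on the renamed LlamacppHandler class and the babyllama commands above, not commands documented in this commit:

```bash
# Hypothetical deployment of the llama.cpp example, mirroring the babyllama
# flow. Handler name and extra-files are assumptions, not from this commit.
cd serve/cpp/test/resources/examples/llamacpp/llamacpp_handler
torch-model-archiver --model-name llamacpp --version 1.0 \
    --handler libllamacpp_handler:LlamacppHandler --runtime LSP \
    --extra-files config.json
mkdir model_store && mv llamacpp.mar model_store/
torchserve --ncs --start --model-store model_store
# prompt.txt sits one level up in the test resources, next to the handler dir
curl http://localhost:8080/predictions/llamacpp -T ../prompt.txt
```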

cpp/build.sh

Lines changed: 11 additions & 10 deletions
```diff
@@ -136,6 +136,14 @@ function install_yaml_cpp() {
   cd "$BWD" || exit
 }
 
+function build_llama_cpp() {
+  BWD=$(pwd)
+  LLAMA_CPP_SRC_DIR=$BASE_DIR/third-party/llama.cpp
+  cd "${LLAMA_CPP_SRC_DIR}"
+  make
+  cd "$BWD" || exit
+}
+
 function build() {
   MAYBE_BUILD_QUIC=""
   if [ "$WITH_QUIC" == true ] ; then
@@ -206,16 +214,6 @@ function build() {
   echo -e "${COLOR_GREEN}torchserve_cpp build is complete. To run unit test: \
     ./_build/test/torchserve_cpp_test ${COLOR_OFF}"
 
-  if [ -f "$DEPS_DIR/../src/examples/libmnist_handler.dylib" ]; then
-    mv $DEPS_DIR/../src/examples/libmnist_handler.dylib $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.dylib
-  elif [ -f "$DEPS_DIR/../src/examples/libmnist_handler.so" ]; then
-    mv $DEPS_DIR/../src/examples/libmnist_handler.so $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.so
-  fi
-
-  if [ -f "$DEPS_DIR/../src/examples/libbabyllama_handler.so" ]; then
-    mv $DEPS_DIR/../src/examples/libbabyllama_handler.so $DEPS_DIR/../../test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
-  fi
-
   cd $DEPS_DIR/../..
   if [ -f "$DEPS_DIR/../test/torchserve_cpp_test" ]; then
     $DEPS_DIR/../test/torchserve_cpp_test
@@ -311,10 +309,13 @@ mkdir -p "$LIBS_DIR"
 # Must execute from the directory containing this script
 cd $BASE_DIR
 
+git submodule update --init --recursive
+
 install_folly
 install_kineto
 install_libtorch
 install_yaml_cpp
+build_llama_cpp
 build
 symlink_torch_libs
 symlink_yaml_cpp_lib
```
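The new `build_llama_cpp` step is just llama.cpp's stock Makefile build run inside the submodule. The equivalent manual invocation, assuming the submodule has been checked out under cpp/third-party:

```bash
# Manual equivalent of the new build_llama_cpp step: llama.cpp ships its own
# Makefile, so a plain `make` inside the submodule is all build.sh runs.
cd cpp/third-party/llama.cpp
make
cd -
```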

cpp/src/examples/CMakeLists.txt

Lines changed: 3 additions & 13 deletions
```diff
@@ -1,16 +1,6 @@
-set(MNIST_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/image_classifier/mnist")
 
-set(MNIST_SOURCE_FILES "")
-list(APPEND MNIST_SOURCE_FILES ${MNIST_SRC_DIR}/mnist_handler.cc)
-add_library(mnist_handler SHARED ${MNIST_SOURCE_FILES})
-target_include_directories(mnist_handler PUBLIC ${MNIST_SRC_DIR})
-target_link_libraries(mnist_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
+add_subdirectory("../../../examples/cpp/babyllama/" "../../../test/resources/examples/babyllama/babyllama_handler/")
 
+add_subdirectory("../../../examples/cpp/llamacpp/" "../../../test/resources/examples/llamacpp/llamacpp_handler/")
 
-set(BABYLLAMA_SRC_DIR "${torchserve_cpp_SOURCE_DIR}/src/examples/babyllama")
-set(BABYLLAMA_SOURCE_FILES "")
-list(APPEND BABYLLAMA_SOURCE_FILES ${BABYLLAMA_SRC_DIR}/baby_llama_handler.cc)
-add_library(babyllama_handler SHARED ${BABYLLAMA_SOURCE_FILES})
-target_include_directories(babyllama_handler PUBLIC ${BABYLLAMA_SRC_DIR})
-target_link_libraries(babyllama_handler PRIVATE ts_backends_core ts_utils ${TORCH_LIBRARIES})
-target_compile_options(babyllama_handler PRIVATE -Wall -Wextra -Ofast)
+add_subdirectory("../../../examples/cpp/mnist/" "../../../test/resources/examples/mnist/mnist_handler/")
```

cpp/test/backends/otf_protocol_and_handler_test.cc

Lines changed: 6 additions & 7 deletions
```diff
@@ -24,7 +24,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
       // model_name length
       .WillOnce(::testing::Return(5))
       // model_path length
-      .WillOnce(::testing::Return(51))
+      .WillOnce(::testing::Return(42))
       // batch_size
       .WillOnce(::testing::Return(1))
       // handler length
@@ -44,9 +44,8 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
         strncpy(data, "mnist", length);
       }))
       .WillOnce(testing::Invoke([=](size_t length, char* data) {
-        ASSERT_EQ(length, 51);
-        strncpy(data, "test/resources/torchscript_model/mnist/base_handler",
-                length);
+        ASSERT_EQ(length, 42);
+        strncpy(data, "test/resources/examples/mnist/base_handler", length);
       }))
       .WillOnce(testing::Invoke([=](size_t length, char* data) {
         ASSERT_EQ(length, 11);
@@ -60,7 +59,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
   EXPECT_CALL(*client_socket, SendAll(testing::_, testing::_)).Times(1);
   auto load_model_request = OTFMessage::RetrieveLoadMsg(*client_socket);
   ASSERT_EQ(load_model_request->model_dir,
-            "test/resources/torchscript_model/mnist/base_handler");
+            "test/resources/examples/mnist/base_handler");
   ASSERT_EQ(load_model_request->model_name, "mnist");
   ASSERT_EQ(load_model_request->envelope, "");
   ASSERT_EQ(load_model_request->model_name, "mnist");
@@ -71,7 +70,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
   auto backend = std::make_shared<torchserve::Backend>();
   MetricsRegistry::Initialize("test/resources/metrics/default_config.yaml",
                               MetricsContext::BACKEND);
-  backend->Initialize("test/resources/torchscript_model/mnist/base_handler");
+  backend->Initialize("test/resources/examples/mnist/base_handler");
 
   // load the model
   auto load_model_response = backend->LoadModel(load_model_request);
@@ -126,7 +125,7 @@ TEST(BackendIntegTest, TestOTFProtocolAndHandler) {
       .WillOnce(testing::Invoke([=](size_t length, char* data) {
         ASSERT_EQ(length, 3883);
         // strncpy(data, "valu", length);
-        std::ifstream input("test/resources/torchscript_model/mnist/0_png.pt",
+        std::ifstream input("test/resources/examples/mnist/0_png.pt",
                             std::ios::in | std::ios::binary);
         std::vector<char> image((std::istreambuf_iterator<char>(input)),
                                 (std::istreambuf_iterator<char>()));
```
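The updated magic numbers in this test are simply the byte lengths of the old and new model_path strings, which changed when torchscript_model was renamed to examples. A quick sanity check:

```bash
# 51 and 42 are just the lengths of the old and new model_path strings:
echo -n "test/resources/torchscript_model/mnist/base_handler" | wc -c  # 51
echo -n "test/resources/examples/mnist/base_handler" | wc -c           # 42
```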

cpp/test/examples/examples_test.cc

Lines changed: 32 additions & 4 deletions
```diff
@@ -1,10 +1,38 @@
+#include <fstream>
+
 #include "test/utils/common.hh"
 
 TEST_F(ModelPredictTest, TestLoadPredictBabyLlamaHandler) {
+  std::string base_dir = "test/resources/examples/babyllama/";
+  std::string file1 = base_dir + "babyllama_handler/stories15M.bin";
+  std::string file2 = base_dir + "babyllama_handler/tokenizer.bin";
+
+  std::ifstream f1(file1);
+  std::ifstream f2(file2);
+
+  if (!f1.good() && !f2.good())
+    GTEST_SKIP()
+        << "Skipping TestLoadPredictBabyLlamaHandler because of missing files: "
+        << file1 << " or " << file2;
+
+  this->LoadPredict(
+      std::make_shared<torchserve::LoadModelRequest>(
+          base_dir + "babyllama_handler", "llm", -1, "", "", 1, false),
+      base_dir + "babyllama_handler", base_dir + "prompt.txt", "llm_ts", 200);
+}
+
+TEST_F(ModelPredictTest, TestLoadPredictLlmHandler) {
+  std::string base_dir = "test/resources/examples/llamacpp/";
+  std::string file1 = base_dir + "llamacpp_handler/llama-2-7b-chat.Q5_0.gguf";
+  std::ifstream f(file1);
+
+  if (!f.good())
+    GTEST_SKIP()
+        << "Skipping TestLoadPredictLlmHandler because of missing file: "
+        << file1;
+
   this->LoadPredict(
       std::make_shared<torchserve::LoadModelRequest>(
-          "test/resources/torchscript_model/babyllama/babyllama_handler", "llm",
-          -1, "", "", 1, false),
-      "test/resources/torchscript_model/babyllama/babyllama_handler",
-      "test/resources/torchscript_model/babyllama/prompt.txt", "llm_ts", 200);
+          base_dir + "llamacpp_handler", "llamacpp", -1, "", "", 1, false),
+      base_dir + "llamacpp_handler", base_dir + "prompt.txt", "llm_ts", 200);
 }
```
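Both tests now skip when the model weights are absent. To exercise them locally, the weights have to be placed in the new test resource folders. A sketch: the tinyllamas URLs are the ones from the README above, while the source of the quantized llama-2 GGUF file is not specified in this commit, so that step is left as a placeholder:

```bash
# Place the weights so the tests run instead of skipping.
cd cpp/test/resources/examples
wget -P babyllama/babyllama_handler https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget -P babyllama/babyllama_handler https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
# The GGUF source is not given in this commit; supply your own copy:
# cp /path/to/llama-2-7b-chat.Q5_0.gguf llamacpp/llamacpp_handler/
```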

cpp/test/resources/torchscript_model/babyllama/babyllama_handler/MAR-INF/MANIFEST.json renamed to cpp/test/resources/examples/babyllama/babyllama_handler/MAR-INF/MANIFEST.json

File renamed without changes.
cpp/test/resources/examples/babyllama/babyllama_handler/config.json

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+{
+"checkpoint_path" : "test/resources/examples/babyllama/babyllama_handler/stories15M.bin",
+"tokenizer_path" : "test/resources/examples/babyllama/babyllama_handler/tokenizer.bin"
+}
```

cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.properties renamed to cpp/test/resources/examples/babyllama/babyllama_handler/config.properties

File renamed without changes.
