server : (refactoring) do not rely on JSON internally #10643

ngxson · 2024-12-03T22:50:04Z

Motivation

Currently, the internal code of server.cpp depends too much on the json type, to the point that we're kinda abusing JSON to avoid defining proper structs in the code.

Here is how the server currently processes input/output data:

The HTTP thread receives JSON input
The HTTP thread does some post-formatting (i.e. converts the OAI format to the "internal" format), then passes the formatted JSON to the task queue
The inference thread picks up the task and runs it through launch_slot_with_task to put the correct data into the slot
The slot decodes, then gives back JSON as the result
The JSON is passed to the HTTP layer, (optionally) formatted again, then sent out to the client

graph TD;
    user-- JSON -->http[http thread];
    http-- JSON -->launch_slot_with_task;
    launch_slot_with_task-- struct -->update_slots;
    update_slots-- JSON -->http;
    http-- JSON -->user;

Proposal

In this PR, I propose that we only handle JSON in the HTTP thread:

Before putting a task into the task queue, we should have already converted the JSON into a struct
The inference thread should never call anything related to json
Results from the inference thread are always structs
Before sending out the response to the client, the HTTP thread is responsible for converting the struct to JSON

graph TD;
    user-- JSON -->http[http thread];
    http-- struct -->inf[inference thread];
    inf-- struct -->http;
    http-- JSON -->user;
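
A minimal sketch of the proposed flow, for illustration only. The type and function names below (completion_task, completion_result, task_from_request, run_inference, result_to_json) are hypothetical and are not taken from server.cpp, and JSON is built by hand here instead of with a JSON library:

```cpp
#include <string>

// Hypothetical task and result types; names and fields are illustrative only.
struct completion_task {
    std::string prompt;
    int         n_predict = 16;
};

struct completion_result {
    std::string content;
    int         n_decoded = 0;
};

// HTTP side: the only place that knows about the wire format.
// A real server would parse the JSON request body here.
completion_task task_from_request(const std::string & prompt, int n_predict) {
    completion_task task;
    task.prompt    = prompt;
    task.n_predict = n_predict;
    return task;
}

// Inference side: struct in, struct out, no JSON anywhere.
completion_result run_inference(const completion_task & task) {
    completion_result res;
    res.content   = "echo: " + task.prompt; // placeholder for actual decoding
    res.n_decoded = task.n_predict;
    return res;
}

// HTTP side again: serialize the result struct just before responding.
// No string escaping is done; a real implementation would use a JSON library.
std::string result_to_json(const completion_result & res) {
    return "{\"content\":\"" + res.content + "\",\"n_decoded\":" +
           std::to_string(res.n_decoded) + "}";
}
```

The point of the split is that only task_from_request and result_to_json would need to change if the wire format changes.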

Changes to the API

/slots and /completions:

stopped_eos, stopped_word, stopped_limit are replaced by an enum string stop_type

/chat/completions:

Fixed finish_reason returning an incorrect value. If generation is stopped by a stop word or EOS, finish_reason="stop". Otherwise, finish_reason="length".
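
The two API changes can be sketched roughly as follows. The enumerator names and the string spellings ("none", "eos", "word", "limit") are assumptions made for illustration; the PR description only states that the three booleans become one enum string called stop_type, and gives the finish_reason rule:

```cpp
#include <string>

// Assumed enumerators: the PR only says the three booleans are replaced
// by a single enum string named stop_type.
enum stop_type {
    STOP_TYPE_NONE,
    STOP_TYPE_EOS,
    STOP_TYPE_WORD,
    STOP_TYPE_LIMIT,
};

// Assumed string spellings for the stop_type field of /slots and /completions.
std::string stop_type_to_str(stop_type type) {
    switch (type) {
        case STOP_TYPE_EOS:   return "eos";
        case STOP_TYPE_WORD:  return "word";
        case STOP_TYPE_LIMIT: return "limit";
        default:              return "none";
    }
}

// finish_reason for /chat/completions, following the rule stated above:
// a stop word or EOS yields "stop", anything else yields "length".
std::string finish_reason_from(stop_type type) {
    if (type == STOP_TYPE_WORD || type == STOP_TYPE_EOS) {
        return "stop";
    }
    return "length";
}
```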

TODO:

fix log probs result
add test result stream vs non-stream
make sure model alias is still correct
update docs

ngxson · 2024-12-04T19:15:14Z

@ggerganov Because this refactoring changes quite a lot of code, I think it would be better to split it into 2 parts:

This PR: structs for sending results from inference thread --> http
Next PR: structs for sending tasks from http --> inference thread

So the current inference thread --> http part in this PR is ready to be reviewed. Could you please have a look when you have time?

I'm also tagging @slaren in case you want to leave some suggestions for my approach of polymorphism in this PR. Thank you!

slaren · 2024-12-05T00:04:34Z

I don't have a good overall view of how the server is implemented and what this is doing, but there are several red flags that don't look right to me.

server_task being a base class without any virtual members for the derived classes to implement
Having both the derived classes and an enum result_type
The unsafe casts in the from_ptr function that most of these classes implement. On a side note, do not pass unique_ptr by reference unless you plan to change or take ownership of the pointer; pass a pointer instead.
The send function that is implemented as a template instead of as a function that takes a ptr to the base type

Again, I do not have a good overall view of the server implementation to make specific recommendations, but that's not what I would expect from a class hierarchy. Generally, you should look into abstracting the interface into a few functions, and implement these in the derived classes. Casts from the base class to the derived class should never be necessary.
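
A rough sketch of the kind of hierarchy described here, under stated assumptions: server_task_result is the name used in this thread, while result_cmpl_final, result_error, format_response and result_queue are hypothetical names invented for the example. JSON is represented as a plain string to keep the sketch self-contained:

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Base class exposes the interface; callers never need to know the derived type.
struct server_task_result {
    int id = -1;
    virtual ~server_task_result() = default;
    virtual std::string to_json() const = 0;
};

// Hypothetical derived results.
struct result_cmpl_final : server_task_result {
    std::string content;
    std::string to_json() const override {
        return "{\"content\":\"" + content + "\"}";
    }
};

struct result_error : server_task_result {
    std::string message;
    std::string to_json() const override {
        return "{\"error\":\"" + message + "\"}";
    }
};

// Read-only use: take a plain pointer, not a reference to a unique_ptr.
std::string format_response(const server_task_result * res) {
    return res->to_json();
}

// Ownership transfer: take the unique_ptr by value and move it into storage.
// No template is needed because the parameter is a pointer to the base type.
struct result_queue {
    std::vector<std::unique_ptr<server_task_result>> items;

    void send(std::unique_ptr<server_task_result> res) {
        items.push_back(std::move(res));
    }
};
```

With this shape, neither send nor format_response casts from the base class to a derived class.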

ggerganov · 2024-12-05T09:05:12Z

The goal to limit the use of JSON object in the server implementation is good, but the proposed implementation has some deficiencies. @slaren highlighted most of the issues.

IMO the server-result polymorphism is not warranted in this case and introduces unnecessary complexity. I would recommend to have a single struct server_task_result with all members from all results merged into it with proper naming (prefixes) or nested structs. The result type should be carried only by result_type type (btw, rename enum result_type to enum server_result_type). The server_task_result::to_json() method should use a switch (type) and construct the json based on that.
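
For comparison, the single-struct alternative suggested in this comment might look roughly like the sketch below. The names server_task_result and server_result_type come from the comment; the enumerators, the nested structs and their fields are assumptions made for the example, and JSON is again a hand-built string:

```cpp
#include <string>

// Hypothetical enumerators for the result type.
enum server_result_type {
    SERVER_RESULT_TYPE_COMPLETION,
    SERVER_RESULT_TYPE_EMBEDDING,
    SERVER_RESULT_TYPE_ERROR,
};

// One struct holds the members of every result kind, grouped in nested structs.
struct server_task_result {
    server_result_type type = SERVER_RESULT_TYPE_COMPLETION;

    struct completion_data { std::string content; };
    struct embedding_data  { int n_tokens = 0; };
    struct error_data      { std::string message; };

    completion_data completion;
    embedding_data  embedding;
    error_data      error;

    // The result type alone decides which members are serialized.
    std::string to_json() const {
        switch (type) {
            case SERVER_RESULT_TYPE_COMPLETION:
                return "{\"content\":\"" + completion.content + "\"}";
            case SERVER_RESULT_TYPE_EMBEDDING:
                return "{\"n_tokens\":" + std::to_string(embedding.n_tokens) + "}";
            case SERVER_RESULT_TYPE_ERROR:
                return "{\"error\":\"" + error.message + "\"}";
        }
        return "{}"; // not reached for valid enum values
    }
};
```

The trade-off is that every result carries the members of all result kinds, in exchange for having no class hierarchy at all.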

ngxson · 2024-12-05T09:34:17Z

Honestly I'm pretty new to cpp polymorphism and thanks to the points that @slaren highlighted, I understand it more clearly now.

> IMO the server-result polymorphism is not warranted in this case and introduces unnecessary complexity. I would recommend to have a single struct server_task_result with all members from all results merged into it with proper naming (prefixes) or nested structs.

I think having prefixes may be harder to manage than the current JSON approach. Nested structs can be cleaner, but I think that's kind of polymorphism anyway, which is better done with proper cpp virtual functions, as @slaren pointed out.

Anw, I'll try to implement virtual and dynamic_cast for a more clean implementation. Let's see how it goes.

ggerganov · 2024-12-05T09:45:15Z

> Anw, I'll try to implement virtual and dynamic_cast for a more clean implementation. Let's see how it goes.

Keep in mind that if you end up needing dynamic_cast then something is not OK in the implementation. There should be no need to cast from base class to derived class in this case.

ngxson · 2024-12-05T13:47:43Z

So I've been able to refactor all JSON-related functions into a virtual to_json(), which eliminates the need for enum result_type and unsafe type casting.

I do still use dynamic_cast in some places, for 3 reasons:

GGML_ASSERT to check if the result is the expected derived class. This is not very important and can be removed if we don't want it
To check if a given server_task_result is an error response or not
To read members of server_task_result_metrics --> this is because I haven't refactored the metrics-related functions, but we can get rid of the dynamic_cast after they are refactored
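
As one possible way to drop the second use of dynamic_cast listed above, the error check could itself be a virtual function on the base class. This is only a sketch: is_error(), result_error, result_ok and http_status_for() are hypothetical names, and the status codes are placeholders:

```cpp
#include <string>

struct server_task_result {
    virtual ~server_task_result() = default;
    // Default: a result is not an error unless a derived class says so.
    virtual bool is_error() const { return false; }
};

// Hypothetical error result overriding the check.
struct result_error : server_task_result {
    std::string message;
    bool is_error() const override { return true; }
};

// Hypothetical non-error result; inherits the default is_error().
struct result_ok : server_task_result {
    std::string content;
};

// The caller asks the base interface instead of probing the dynamic type
// with dynamic_cast<result_error *>(...).
int http_status_for(const server_task_result & res) {
    return res.is_error() ? 500 : 200; // placeholder status codes
}
```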

examples/server/server.cpp

ngxson · 2024-12-05T22:15:43Z

examples/server/tests/README.md

+Hint: You can compile and run test in single command, useful for local developement:
+
+```shell
+cmake --build build -j --target llama-server && ./examples/server/tests/tests.sh


@ggerganov FYI, the change in tests.sh should allow you to run the test script from anywhere, without needing to cd into tests first

ngxson · 2024-12-05T22:33:44Z

examples/server/server.cpp

+    }
+
+    json to_json_oai_compat() {
+        std::string finish_reason = "length";


@Nero7991 I'm gonna merge this PR soon. You can adapt your PR #10645 to take advantage of this to_json_oaicompat(). Please note that for /completion endpoint, oaicompat_chat will be false.

Since this is a private function, you can even refactor this function into to_json_oaicompat and to_json_oaicompat_chat for these 2 different cases, then have an if..else branch in to_json to select the correct one
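
The suggested split could look roughly like this. The function names to_json_oaicompat and to_json_oaicompat_chat and the oaicompat_chat flag come from the comment above; the struct name, the oaicompat flag, to_json_native and the simplified JSON shapes are assumptions made for the example:

```cpp
#include <string>

// Hypothetical result struct; only the dispatch logic is the point here.
struct completion_final_result {
    std::string content;
    bool oaicompat      = false; // assumed flag: respond in OAI-compatible format
    bool oaicompat_chat = false; // false for the /completion endpoint

    // Public entry point: pick the right formatter with an if..else branch.
    std::string to_json() const {
        if (!oaicompat) {
            return to_json_native();
        }
        if (oaicompat_chat) {
            return to_json_oaicompat_chat();
        } else {
            return to_json_oaicompat();
        }
    }

private:
    // Simplified output shapes, not the full response objects.
    std::string to_json_native() const {
        return "{\"content\":\"" + content + "\"}";
    }

    std::string to_json_oaicompat() const {
        return "{\"choices\":[{\"text\":\"" + content + "\"}]}";
    }

    std::string to_json_oaicompat_chat() const {
        return "{\"choices\":[{\"message\":{\"content\":\"" + content + "\"}}]}";
    }
};
```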

* server : (refactoring) reduce usage of json internally
* move all response types to struct
* wip [no ci]
* many fixes
* add virtual function
* fix index
* minor style fix
* add std::move
* refactor handle_completions_generic
* add virtual functions
* remove server.hpp
* clarify server_sent_event RFC specs
* apply review comments
* fix model_alias and completion_probabilities
* small clean up
* remove virtual for to_json_oai_compat()
* naming oai_compat --> oaicompat
* fix unwanted recursive call
* update docs

kaetemi · 2024-12-07T09:29:30Z

n_ctx on /props is always returning 0 here since this commit

ngxson · 2024-12-07T10:31:16Z

hmm ok seems like /slots and /props are currently broken (missing tests for them, I was too confident!)

will fix that as soon as I get home

server : (refactoring) reduce usage of json internally

b7d38ee

github-actions bot added examples server labels Dec 3, 2024

ngxson mentioned this pull request Dec 3, 2024

server: add OpenAI compatible response format for /completions #10627

Closed

ngxson changed the title ~~server : (refactoring) reduce usage of json internally~~ server : (refactoring) do not rely on JSON internally Dec 3, 2024

Nero7991 mentioned this pull request Dec 4, 2024

server: add OpenAI compatible response format for legacy /completions with b… #10645

Open

ngxson added 3 commits December 4, 2024 14:16

move all response types to struct

1011a51

wip [no ci]

0d6485f

many fixes

d2419b3

github-actions bot added the python python script changes label Dec 4, 2024

ngxson added 4 commits December 4, 2024 19:18

add virtual function

ea1be7f

fix index

3b41ad5

minor style fix

1261086

add std::move

eaa1288

ngxson marked this pull request as ready for review December 4, 2024 19:10

ngxson requested review from ggerganov and slaren December 4, 2024 19:15

refactor handle_completions_generic

cb66671

add virtual functions

8ab173c

ngxson added 2 commits December 5, 2024 16:04

remove server.hpp

1cf769b

clarify server_sent_event RFC specs

2e560f9

ggerganov approved these changes Dec 5, 2024

ngxson added 2 commits December 5, 2024 22:35

apply review comments

a43e1dc

fix model_alias and completion_probabilities

fb4b9be

ngxson commented Dec 5, 2024

ngxson added 2 commits December 5, 2024 23:16

small clean up

4c3d258

remove virtual for to_json_oai_compat()

ffc4441

ngxson commented Dec 5, 2024

ngxson added 2 commits December 5, 2024 23:34

naming oai_compat --> oaicompat

db66153

fix unwanted recursive call

dfa59b9

ngxson added the breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. label Dec 6, 2024

update docs

25be4cc

This was referenced Dec 6, 2024

changelog : libllama API #9289

Open

server: Fix the status of finish_reason if the stream value is False #10382

Closed

ngxson merged commit 6c5bc06 into ggerganov:master Dec 6, 2024
46 checks passed

ngxson mentioned this pull request Dec 6, 2024

server : (refactor) no more json in server_task input #10691

Merged

m18coppola mentioned this pull request Dec 6, 2024

server : bugfix - stop server from sending empty json during oai chat completions #10694

Closed

ggerganov mentioned this pull request Dec 7, 2024

server : various fixes #10704

Merged

abc-nix mentioned this pull request Dec 7, 2024

Misc. bug: server - GET /props model value no longer works after commit 6c5bc06 #10705

Closed

JeroenAdam mentioned this pull request Dec 20, 2024

OpenAI compatible response for /models has empty id and empty name #10924

Closed
