Llama-proxy-server serves as a proxy server for LLM APIs.
> **Note**
> The project is still under active development. Existing features still need to be improved, and more features will be added in the future.
- Download llama-proxy-server

  ```bash
  curl -LO https://github.com/LlamaEdge/llama-proxy-server/releases/latest/download/llama-proxy-server.wasm
  ```
- Start llama-proxy-server

  ```bash
  wasmedge llama-proxy-server.wasm --port 10086
  ```

  `llama-proxy-server` listens on port `8080` by default. You can change the port by adding `--port <port>`.
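  Without the flag, a bare invocation (a minimal sketch) starts the proxy on the default port `8080`:

  ```bash
  wasmedge llama-proxy-server.wasm
  ```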
- Register a downstream server

  ```bash
  curl -X POST http://localhost:8080/admin/register/{type} -d "http://localhost:{port}"
  ```

  The `{type}` can be `chat`, `whisper`, or `image`, and the request body is the address of the downstream server to register.

  For example, register a whisper server:

  ```bash
  curl -X POST http://localhost:8080/admin/register/whisper -d "http://localhost:12306"
  ```
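  Likewise, a chat server can be registered (a sketch; `http://localhost:8000` assumes a `llama-api-server` instance is listening there):

  ```bash
  curl -X POST http://localhost:8080/admin/register/chat -d "http://localhost:8000"
  ```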
- Unregister a downstream server

  ```bash
  curl -X POST http://localhost:8080/admin/unregister/{type} -d "http://localhost:{port}"
  ```

  The `{type}` can be `chat`, `whisper`, or `image`.

  For example, unregister a whisper server:

  ```bash
  curl -X POST http://localhost:8080/admin/unregister/whisper -d "http://localhost:12306"
  ```
- List available downstream servers

  To list all the registered downstream servers and their types, you can use the following command:

  ```bash
  curl -X POST http://localhost:8080/admin/servers
  ```

  If downstream servers are registered, the response will look like:

  ```json
  {
    "chat": [],
    "image": [],
    "whisper": [
      "http://0.0.0.0:12306/"
    ]
  }
  ```
Currently, `llama-proxy-server` supports the following three types of business endpoints:

- `chat` endpoints (corresponds to `llama-api-server`)
  - `/v1/chat/completions`
  - `/v1/completions`
  - `/v1/models`
  - `/v1/embeddings`
  - `/v1/files`
  - `/v1/chunks`
- `whisper` endpoints (corresponds to `whisper-api-server`)
  - `/v1/audio/transcriptions`
  - `/v1/audio/translations`
- `image` endpoints (corresponds to `sd-api-server`)
  - `/v1/images/generations`
  - `/v1/images/edits`
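For instance, once a chat server has been registered, an OpenAI-style chat completion request can be sent to the proxy, which routes it to that server (a sketch; the model name `llama-3-8b-chat` is hypothetical):

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3-8b-chat", "messages": [{"role": "user", "content": "Hello!"}]}'
```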