Local API Server
The GPT4All Chat Desktop Application comes with a built-in server mode allowing you to programmatically interact with any supported local LLM through a familiar HTTP API. Namely, the server implements a subset of the OpenAI API specification.
Note
The server exposes an API, not a Web GUI client. However, existing web clients which support this API should be able to use it.
You can enable the webserver via GPT4All Chat > Settings > Application > Enable Local Server. By default, it listens on port 4891 (the reverse of 1984).
The documentation has short descriptions of the settings.
The quickest way to check that connections are allowed is to open the path `/v1/models` in your browser, as it is a `GET` endpoint. Try the first endpoint link below. If this doesn't work, you probably have to adjust your firewall settings.
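The same check can also be scripted. The following is a minimal sketch using only the Python standard library. It assumes the server is enabled on the default port and that the response follows the OpenAI list format (a JSON object with a `data` array whose entries carry an `id`), which this page does not spell out. The helper names `extract_model_ids` and `list_model_ids` are illustrative and not part of GPT4All.

```python
import json
import urllib.request

BASE_URL = "http://localhost:4891/v1"  # default port; adjust if you changed it


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model ids out of an OpenAI-style list response (assumed shape)."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def list_model_ids(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """GET /v1/models and return the ids of the models the server reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```

With the server running, `list_model_ids()` should return the model names you can pass as `model` in later requests. If the call raises a `URLError`, the server is probably disabled or blocked by a firewall.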
Important
- It can only be accessed through the `http` protocol, never `https`.
- The server only listens on the local machine, that is `localhost` or `127.0.0.1`.
- It does not listen on the IPv6 localhost address `::1`.
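Given these restrictions, a client should target the IPv4 loopback address. As a rough sketch, the function below checks whether anything accepts TCP connections on `127.0.0.1:4891`. It only shows that a listener exists on that port, not that the listener is GPT4All, and the function name `is_listening` is hypothetical.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 4891, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # Connecting to the IPv4 loopback address explicitly avoids any
        # attempt to reach ::1, where the server does not listen.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `is_listening()` returns `False` while the server is enabled, check the configured port and your firewall settings.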
Four endpoints are currently implemented:
Method | Path | Example Link to Full URL (Default Port) |
---|---|---|
GET | /v1/models | http://localhost:4891/v1/models |
GET | /v1/models/<name> | http://localhost:4891/v1/models/Phi-3%20Mini%20Instruct |
POST | /v1/completions | http://localhost:4891/v1/completions |
POST | /v1/chat/completions | http://localhost:4891/v1/chat/completions |
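Model names that contain spaces must be percent-encoded when they appear in the path, as in the `Phi-3%20Mini%20Instruct` example link. A small sketch of building such a URL with the Python standard library follows; `model_url` is an illustrative helper, not something the server or GPT4All provides.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:4891/v1"  # default port


def model_url(name: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /v1/models/<name>, percent-encoding the name."""
    return f"{base_url}/models/{quote(name, safe='')}"


print(model_url("Phi-3 Mini Instruct"))
# prints http://localhost:4891/v1/models/Phi-3%20Mini%20Instruct
```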
The local API server has its own chat window. If you already have many saved conversations it may not be obvious, but it is the last entry in the chat history sidebar, so you need to scroll all the way to the bottom.
It has a different background color so you know you're picking the right one.
You can activate LocalDocs from within the GUI. Follow these steps:
- Open the Chats view and open both sidebars.
- Scroll down to the bottom in the left sidebar (chat history); the last entry will be for the server itself. Activate that chat.
- Activate one or more LocalDocs collections in the right sidebar.
- Use a client and make server requests now.
Note
It is not possible to activate LocalDocs or select a collection from a client through the API.
This example uses the old-style (pre-1.0) `openai` Python library, preferably in a virtual environment. That is, create a virtual environment and install the library with `pip install "openai ~= 0.28"`.
import openai
#openai.api_base = "https://api.openai.com/v1"
openai.api_base = "http://localhost:4891/v1"
openai.api_key = "not needed for a local LLM"
# Set up the prompt and other parameters for the API request
prompt = "Who is Michael Jordan?"
#model = "gpt-3.5-turbo"
model = "Phi-3 Mini Instruct"
# Make the API request
response = openai.Completion.create(
    model=model,
    prompt=prompt,
    max_tokens=50,
    temperature=0.28,
    top_p=0.95,
    n=1,
    echo=True,
    stream=False,
)
# Print the generated completion
print(response)
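Printing the whole response shows the raw JSON structure. To get only the generated text, something like the sketch below may help. It assumes the server returns the OpenAI completion shape, with the text under `choices[0].text`, which this page does not confirm. The helper `first_completion_text` is hypothetical.

```python
def first_completion_text(response) -> str:
    """Return the text of the first choice (assumed OpenAI completion shape)."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["text"]
```

In the OpenAI specification, `echo=True` asks for the prompt to be returned together with the completion, so the extracted text may begin with the prompt itself.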
cURL is a command-line tool for talking to servers through various protocols.
curl -X POST http://localhost:4891/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Phi-3 Mini Instruct",
"messages": [{"role":"user","content":"hi, who are you?"}],
"max_tokens": 2048,
"temperature": 0.7
}'
On Windows, PowerShell is now the preferred CLI for scripting. It provides its own cmdlet, Invoke-WebRequest, for talking to an HTTP server.
Invoke-WebRequest -URI http://localhost:4891/v1/chat/completions -Method POST -ContentType application/json -Body '{
"model": "Phi-3 Mini Instruct",
"messages": [{"role":"user","content":"hi, who are you?"}],
"max_tokens": 2048,
"temperature": 0.7
}'
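The same chat request can be sent from Python without any third-party package. This is a sketch under stated assumptions: it reuses the URL, model name, and parameters from the examples above, and it assumes the reply follows the OpenAI chat completion shape (`choices[0].message.content`). The function names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

CHAT_URL = "http://localhost:4891/v1/chat/completions"  # default port


def build_chat_request(content: str, model: str = "Phi-3 Mini Instruct",
                       max_tokens: int = 2048,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST request carrying a single user message as JSON."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(content), timeout=timeout) as resp:
        reply = json.load(resp)
    # Assumed OpenAI chat completion shape: choices[0].message.content
    return reply["choices"][0]["message"]["content"]
```

Calling `chat("hi, who are you?")` with the server running should return the model's answer as a string. The generous timeout is a guess, since local generation can be slow.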
- GPT4All issues tagged with the `local-server` label and discussions, some of which are also tagged with `local-server`
- OpenAI general documentation:
- Python API client library:
- OpenAI API specification