2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -29,7 +29,7 @@
exclude: build|stubs|^bot/templates/$|openassistant/templates|docs/docs/api/openapi.json|scripts/postprocessing/regex_pii_detector.py

default_language_version:
python: python3
python: python3.10
Contributor Author:
@olliestanley This is what was causing pre-commit to fail on my machine: `python3` is interpreted as `python3.7`, which is not new enough for some of the syntax, or for isort. When I am more specific, as here, it works. Should this be examined some more?

Collaborator:

I suspect this wouldn't be an issue if you ran the commands from inside a Python 3.10 virtual environment (I guess that's why others haven't had similar issues), but I don't see any reason we can't make this config change if it helps make things easier.

Contributor Author:

I'm on a rolling-release Linux system with Python 3.10.10. It worked correctly inside an Ubuntu Docker container with the same Python version, so it's probably some other dependency that broke this.
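As a quick sanity check for threads like this one, a minimal sketch (not part of the PR) of verifying that whatever interpreter `python3` resolves to is new enough for the hooks; the `(3, 10)` minimum is an assumption taken from the `default_language_version` change above, and the helper name `meets_minimum` is mine:

```python
import sys


def meets_minimum(version_info, minimum=(3, 10)) -> bool:
    """True if the interpreter's (major, minor) is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# e.g. guard a script that relies on 3.10-only syntax:
# assert meets_minimum(sys.version_info), "python3 resolves to an older interpreter"
```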


ci:
autofix_prs: true
32 changes: 32 additions & 0 deletions .vscode/launch.json
@@ -106,6 +106,38 @@
"CUDA_VISIBLE_DEVICES": "1,2,3,4,5",
"OMP_NUM_THREADS": "1"
}
},
{
"name": "Debug: Inference Server",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5678
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/inference/server",
"remoteRoot": "/opt/inference/server"
}
],
"justMyCode": false
},
{
"name": "Debug: Worker",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5679
Contributor Author: Note the different ports for server and worker.
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}",
"remoteRoot": "/opt"
}
],
"justMyCode": false
}
]
}
5 changes: 5 additions & 0 deletions docker-compose.yaml
@@ -231,12 +231,14 @@ services:
TRUSTED_CLIENT_KEYS: "6969"
ALLOW_DEBUG_AUTH: "True"
API_ROOT: "http://localhost:8000"
DEBUG: "True"
Contributor Author:
If I understand correctly, the compose file is only meant for local development, so setting this here shouldn't be a problem?

volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/server:/opt/inference/server"
restart: unless-stopped
ports:
- "8000:8000"
- "5678:5678" # Port to attach debugger
depends_on:
inference-redis:
condition: service_healthy
@@ -254,9 +256,12 @@
MODEL_CONFIG_NAME: ${MODEL_CONFIG_NAME:-distilgpt2}
BACKEND_URL: "ws://inference-server:8000"
PARALLELISM: 2
DEBUG: "True"
volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/worker:/opt/inference/worker"
ports:
- "5679:5679" # Port to attach debugger
deploy:
replicas: 1
profiles: ["inference"]
4 changes: 2 additions & 2 deletions docker/inference/Dockerfile.server
@@ -78,8 +78,8 @@ USER ${APP_USER}
VOLUME [ "${APP_BASE}/lib/oasst-shared" ]
VOLUME [ "${APP_BASE}/lib/oasst-data" ]


CMD uvicorn main:app --reload --host 0.0.0.0 --port "${PORT}"
# In the dev image, we start uvicorn from Python so that we can attach the debugger
CMD python main.py



20 changes: 20 additions & 0 deletions inference/README.md
@@ -60,6 +60,26 @@ python __main__.py
# You'll soon see a `User:` prompt, where you can type your prompts.
```

## Debugging

The inference server and the worker allow attaching a Python debugger. To do
this from VS Code, start the inference server & worker using docker compose as
described above (e.g. with `docker compose --profile inference up --build`),
then simply pick one of the following launch profiles, depending on what you
would like to debug:

- Debug: Inference Server
- Debug: Worker

### Waiting for Debugger on Startup

It can be helpful to wait for the debugger before starting the application. This
can be achieved by uncommenting `debugpy.wait_for_client()` in the appropriate
location:

- `inference/server/main.py` for the inference server
- `inference/worker/__main__.py` for the worker

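A quick way to confirm the debug listeners are actually reachable before attaching is to probe the TCP ports from the README above (5678 for the server, 5679 for the worker). This is a sketch of mine, not part of the PR; the helper name `debug_port_open` is an assumption:

```python
import socket


def debug_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something (e.g. a debugpy listener) accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# After `docker compose --profile inference up --build`, one would expect
# debug_port_open("localhost", 5678) for the server and
# debug_port_open("localhost", 5679) for the worker to return True.
```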
## Distributed Testing

We run distributed load tests using the
18 changes: 18 additions & 0 deletions inference/server/main.py
@@ -148,3 +148,21 @@ async def maybe_add_debug_api_keys():
async def welcome_message():
logger.warning("Inference server started")
logger.warning("To stop the server, press Ctrl+C")


if __name__ == "__main__":
    import os

    import uvicorn

    port = int(os.getenv("PORT", "8000"))
    # bool("False") is truthy, so parse the flag string explicitly
    is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")

    if is_debug:
        import debugpy

        debugpy.listen(("0.0.0.0", 5678))
        # Uncomment to wait here until a debugger is attached
        # debugpy.wait_for_client()

    uvicorn.run("main:app", host="0.0.0.0", port=port, reload=is_debug)
Contributor Author (@0xfacade, Jul 16, 2023):
This method of starting the server is only used for development; the production Docker image still invokes the uvicorn command. I could change that to also use `python main.py` for consistency, if desired.
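One pitfall worth flagging in this pattern: in Python, `bool("False")` is `True`, so reading a boolean from the environment needs an explicit string comparison. A minimal sketch of such a parser; the helper name `env_flag` and its accepted spellings are my assumptions, not part of the PR:

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable explicitly.

    Any non-empty string is truthy under bool(), including "False",
    so naive bool(os.getenv(...)) would enable debug mode unconditionally.
    """
    value = os.getenv(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")
```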

1 change: 1 addition & 0 deletions inference/server/requirements.txt
@@ -4,6 +4,7 @@ asyncpg
authlib
beautifulsoup4 # web_retriever plugin
cryptography==39.0.0
debugpy
fastapi-limiter
fastapi[all]==0.88.0
google-api-python-client
10 changes: 10 additions & 0 deletions inference/worker/__main__.py
@@ -1,4 +1,5 @@
import concurrent.futures
import os
import signal
import sys
import time
@@ -130,4 +131,13 @@ def main():


if __name__ == "__main__":
    # bool("False") is truthy, so parse the flag string explicitly
    is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")

    if is_debug:
        import debugpy

        debugpy.listen(("0.0.0.0", 5679))
        # Uncomment to wait here until a debugger is attached
        # debugpy.wait_for_client()

    main()
1 change: 1 addition & 0 deletions inference/worker/requirements.txt
@@ -1,4 +1,5 @@
aiohttp
debugpy
hf_transfer
huggingface_hub
langchain==0.0.142