
Commit aeb8ffb

Merge pull request #6 from stratosphereips/harpo_llm_rpi5_mem_cpu_bench
add scripts for testing mem/tokens/s for rpi5 llm models
2 parents eb3c200 + 24e0f05

File tree: 4 files changed, +230 -0 lines changed

benchmark_models/README.md

Lines changed: 79 additions & 0 deletions

# Ollama Model Benchmark Script

This directory contains a Bash script that automates benchmarking of the models served by a remote Ollama instance. It fetches the list of available models, runs a specified prompt against each model using a local Python script, and measures each model's performance. For each model, it collects:

* Quantization level
* Disk size (from `/api/tags`)
* RAM usage while loaded (from `/api/ps`)
* Tokens per second (TPS) from the Python output

The data is printed in a formatted table and also saved to a CSV file (`results.csv`) for further analysis.

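The disk, quantization, and RAM figures come from the two Ollama endpoints listed above. A minimal sketch of those lookups, using the same `jq` paths the script relies on (host and port are placeholders taken from the script's defaults):

```bash
# Installed models: name, quantization level, and size on disk (bytes)
curl -s "http://10.147.20.102:11434/api/tags" \
  | jq -r '.models[] | "\(.name)\t\(.details.quantization_level)\t\(.size)"'

# Models currently loaded in memory: name and loaded size (bytes)
curl -s "http://10.147.20.102:11434/api/ps" \
  | jq -r '.models[] | "\(.name)\t\(.size)"'
```
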
## Requirements

* Bash
* `curl`
* [`jq`](https://stedolan.github.io/jq/) for JSON parsing
* Python script: `stream_query_llm.py` that supports:
  * `--model`
  * `--prompt`
  * `--base_url`
  * `--stats_only` flag

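The benchmark script invokes `stream_query_llm.py` with exactly these flags; a standalone run against the same Ollama endpoint might look like this (host, model, and prompt are illustrative, taken from the defaults in this repository):

```bash
python3 stream_query_llm.py \
  --base_url "http://10.147.20.102:11434/v1" \
  --model "qwen2.5:1.5b" \
  --prompt "generate a zeek script for detecting Suspicious HTTP User-Agents. Be concise." \
  --stats_only
```
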
## Files

* `benchmark_ollama_models.sh` — Main benchmarking script
* `results.csv` — Output file containing the benchmark results
* `stream_query_llm.py` — Your local script that streams responses and prints usage stats

## Collected Metrics

The script gathers and logs the following for each model:

| Metric | Source | Description |
| ----------------- | ------------- | --------------------------------------------- |
| Model name | `/api/tags` | Name of the model (e.g., `llama3:8b`) |
| Quantization | `/api/tags` | Quantization level (e.g., `Q4_K_M`) |
| Disk size (MB) | `/api/tags` | Size of the GGUF model on disk |
| RAM size (MB) | `/api/ps` | Actual loaded model size in memory |
| Tokens per second | Python script | Measured performance (completion tokens/time) |

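The last row is parsed from the stats block that `stream_query_llm.py` prints when called with `--stats_only`; the benchmark script greps the `Tokens per second` line from output shaped like this (numbers are illustrative):

```
🧠 Stats:
 Prompt tokens: 28
 Completion tokens: 256
 Total tokens: 284
 Time taken: 25.93 sec
 Tokens per second: 9.87 TPS
```
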
## Usage

1. Make the script executable:

   ```bash
   chmod +x benchmark_ollama_models.sh
   ```

2. Run it:

   ```bash
   ./benchmark_ollama_models.sh
   ```

This will:

* Call each available model
* Run a prompt
* Log and print performance data
* Save results to `results.csv`

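Each line of `results.csv` follows the header the script writes; the copy committed with this change begins:

```
model,quantization,disk_size_mb,ram_size_mb,tokens_per_second
granite3.1-dense:2b,Q4_K_M,1497.0,2615.0,5.96
llama3.2:3b,Q4_K_M,1925.8,3331.3,4.82
```
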
## Configuration

Inside the script, you can customize:

* `OLLAMA_HOST`: IP or hostname of your Ollama server
* `PORT`: Ollama server port (default is `11434`)
* `PROMPT`: Prompt string to be used for benchmarking

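These map directly to the variables defined at the top of `benchmark_ollama_models.sh`; the defaults committed here are:

```bash
OLLAMA_HOST="10.147.20.102"
PORT="11434"
PROMPT="generate a zeek script for detecting Suspicious HTTP User-Agents. Be concise."
```
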
## Notes

* Only models that were successfully loaded and used will report a RAM size
* If TPS extraction fails, the script will mark the entry as `ERROR`
* The script assumes Ollama's REST API is accessible remotely

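The `ERROR` value comes from the TPS extraction step: the script pulls the number out of the Python output with the `grep`/`awk` pipeline below and falls back to `ERROR` when nothing matches:

```bash
# Take the numeric field just before "TPS" in the stats line
tps=$(echo "$output" | grep "Tokens per second" | awk '{print $(NF-1)}')
display_tps="${tps:-ERROR}"
```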

benchmark_models/benchmark_ollama_models.sh

Lines changed: 75 additions & 0 deletions

#!/bin/bash

# --- Configuration ---
OLLAMA_HOST="10.147.20.102"
PORT="11434"
BASE_URL="http://$OLLAMA_HOST:$PORT/v1"
API_TAGS_URL="http://$OLLAMA_HOST:$PORT/api/tags"
API_PS_URL="http://$OLLAMA_HOST:$PORT/api/ps"
PYTHON_SCRIPT="stream_query_llm.py"
PROMPT="generate a zeek script for detecting Suspicious HTTP User-Agents. Be concise."
CSV_FILE="results.csv"

# --- Check prerequisites ---
if [[ ! -f "$PYTHON_SCRIPT" ]]; then
  echo "❌ Python script '$PYTHON_SCRIPT' not found!"
  exit 1
fi

if ! command -v jq &>/dev/null; then
  echo "❌ 'jq' is required but not installed."
  exit 1
fi

# --- Fetch model list ---
tags_response=$(curl -s "$API_TAGS_URL")
if [[ -z "$tags_response" ]]; then
  echo "❌ Failed to fetch models from $API_TAGS_URL"
  exit 1
fi

# --- CSV Output File ---
echo "model,quantization,disk_size_mb,ram_size_mb,tokens_per_second" > "$CSV_FILE"

# --- Table Header ---
echo -e "\n📊 Remote Model Benchmark Summary:"
printf "%-25s %-12s %14s %14s %10s\n" "Model" "Quantization" "Disk Size (MB)" "RAM Size (MB)" "TPS"
printf "%-25s %-12s %14s %14s %10s\n" "-----" "------------" "--------------" "-------------" "----"

# --- Loop through models ---
models=$(echo "$tags_response" | jq -c '.models[]')

for model_json in $models; do
  model=$(echo "$model_json" | jq -r '.name')
  quant=$(echo "$model_json" | jq -r '.details.quantization_level // "N/A"')
  disk_bytes=$(echo "$model_json" | jq -r '.size // 0')
  disk_mb=$(awk "BEGIN {printf \"%.1f\", $disk_bytes/1024/1024}")

  # Run benchmark
  output=$(python3 "$PYTHON_SCRIPT" \
    --prompt "$PROMPT" \
    --base_url "$BASE_URL" \
    --model "$model" \
    --stats_only 2>/dev/null)

  # Extract TPS
  tps=$(echo "$output" | grep "Tokens per second" | awk '{print $(NF-1)}')
  display_tps="${tps:-ERROR}"
  csv_tps="${tps:-}"

  # Query /api/ps for live RAM usage
  ps_response=$(curl -s "$API_PS_URL")
  ps_entry=$(echo "$ps_response" | jq -c --arg name "$model" '.models[] | select(.name == $name)')

  ram_bytes=$(echo "$ps_entry" | jq -r '.size // 0')
  ram_mb=$(awk "BEGIN {printf \"%.1f\", $ram_bytes/1024/1024}")

  # Print to terminal
  printf "%-25s %-12s %14s %14s %10s\n" "$model" "$quant" "$disk_mb" "$ram_mb" "$display_tps"

  # Append to CSV
  echo "$model,$quant,$disk_mb,$ram_mb,$csv_tps" >> "$CSV_FILE"
done

echo -e "\n✅ Results saved to: $CSV_FILE"

benchmark_models/results.csv

Lines changed: 11 additions & 0 deletions

model,quantization,disk_size_mb,ram_size_mb,tokens_per_second
granite3.1-dense:2b,Q4_K_M,1497.0,2615.0,5.96
llama3.2:3b,Q4_K_M,1925.8,3331.3,4.82
smollm2:1.7b,Q8_0,1736.1,3946.4,5.13
qwen2.5:1.5b,Q4_K_M,940.4,1495.0,9.87
phi4-mini:latest,Q4_K_M,2376.4,3998.7,3.98
gemma3:4b,Q4_K_M,3184.1,5231.4,3.90
qwen2.5:3b,Q4_K_M,1840.5,2478.6,5.11
gemma3:1b,Q4_K_M,777.5,1361.8,11.49
deepseek-r1:1.5b,Q4_K_M,1065.6,1495.0,9.69
llama3.2:1b,Q8_0,1259.9,2130.1,7.45

benchmark_models/stream_query_llm.py

Lines changed: 65 additions & 0 deletions

import openai
import time
import argparse

def stream_chat_with_usage(prompt, base_url, model, stats_only):
    # Custom OpenAI-compatible API endpoint
    client = openai.OpenAI(
        api_key="ollama",  # Leave blank or use a value if your API requires it
        base_url=base_url
    )

    messages = [{"role": "user", "content": prompt}]
    #model = "gpt-4"  # Change this if your local model uses a different name

    if not stats_only:
        print("AI:", end=" ", flush=True)
    full_reply = ""
    usage_info = None

    response = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
        stream_options={"include_usage": True}
    )

    start_time = time.time()
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            part = chunk.choices[0].delta.content
            if not stats_only:
                print(part, end="", flush=True)
            full_reply += part
        if hasattr(chunk, "usage") and chunk.usage:
            usage_info = chunk.usage

    end_time = time.time()
    duration = end_time - start_time

    print("\n\n🧠 Stats:")
    if usage_info:
        prompt_tokens = usage_info.prompt_tokens
        completion_tokens = usage_info.completion_tokens
        total_tokens = usage_info.total_tokens
        tps = completion_tokens / duration if duration > 0 else 0

        print(f" Prompt tokens: {prompt_tokens}")
        print(f" Completion tokens: {completion_tokens}")
        print(f" Total tokens: {total_tokens}")
        print(f" Time taken: {duration:.2f} sec")
        print(f" Tokens per second: {tps:.2f} TPS")
    else:
        print(" Usage information not available.")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Stream chat completions with usage metrics.")
    parser.add_argument("--base_url", default="http://rpi5:18080/v1", help="Base URL of the OpenAI-compatible API")
    parser.add_argument("--prompt", required=True, help="Prompt to send to the model")
    parser.add_argument("--model", default="qwen2.5:1.5b", help="The model to use")
    parser.add_argument("--stats_only", action="store_true", help="Print only the stats, no prompt nor completion")

    args = parser.parse_args()

    stream_chat_with_usage(args.prompt, args.base_url, args.model, args.stats_only)
