Commit a9fbf0d

Merge pull request #3 from stratosphereips/promptfoo-json-evaluation
Promptfoo json evaluation
2 parents 102297f + 3590442 commit a9fbf0d
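
For orientation: every test config touched by this commit drops its inline provider list and points at `file://providers/providers.yaml` instead. That shared file is not shown in this part of the diff, so the following is only a sketch of what it might contain, assuming it simply mirrors the entries that were removed. The real file may use different ids, a different prefix, or different options.

```yaml
# providers/providers.yaml (hypothetical contents, assumed to mirror the removed inline list)
- id: ollama:smollm2:1.7b
  config:
    num_predict: 2048
- id: ollama:gemma3:1b
  config:
    num_predict: 2048
- id: ollama:llama3.2:1b
  config:
    num_predict: 2048
# ...one entry per remaining model from the removed list
```

Keeping the list in one place means a model can be added or retired once rather than in each of the eight test files.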

18 files changed, +1038 -2100 lines changed

llm-unittest/01_test_action_json_parsing.yaml

Lines changed: 1 addition & 25 deletions
@@ -1,31 +1,7 @@
 description: Field Extraction from Networking JSON
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
-
+  - file://providers/providers.yaml
 
 prompts:
   - |

llm-unittest/02_test_action_json_understanding.yaml

Lines changed: 3 additions & 26 deletions
@@ -1,31 +1,8 @@
 description: Summarize Networking Actions from JSON
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
-
+  - file://providers/providers.yaml
+
 prompts:
   - |
     Given the following JSON, describe the networking action in simple English:
@@ -67,5 +44,5 @@ tests:
 
 defaultTest:
   options:
-    provider: ollama:qwen2.5-coder:latest
+    provider: openai:chat:qwen2.5-coder:latest
 

llm-unittest/03_test_action_json_w_parameters.yaml

Lines changed: 1 addition & 27 deletions
@@ -1,33 +1,7 @@
 description: Generate Structured Networking JSON
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
-
-
-
+  - file://providers/providers.yaml
 
 prompts:
   - "Output a raw JSON object (without backticks or formatting) that represents the following networking action: {{action}}. The JSON should contain the keys `action` and `parameters`, and no additional text or formatting. Respect the following format: {{example}}"

llm-unittest/04_test_action_json.yaml

Lines changed: 1 addition & 25 deletions
@@ -1,31 +1,7 @@
 description: Test the generation of valid JSON actions with correct structure.
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
-
+  - file://providers/providers.yaml
 prompts:
   - "Output a raw JSON object (without backticks or formatting) that represents the following networking action: {{action}}. The JSON should contain the keys `action` and `parameters`, and no additional text or formatting. Respect the following format: {{example}}"
 

llm-unittest/05_test_zeek_analysis.yaml

Lines changed: 2 additions & 25 deletions
@@ -1,30 +1,7 @@
 description: Interpret Zeek Log Entries
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
+  - file://providers/providers.yaml
 
 
 prompts:
@@ -96,5 +73,5 @@ tests:
 
 defaultTest:
   options:
-    provider: ollama:qwen2.5-coder:latest
+    provider: openai:chat:qwen2.5-coder:latest
 

llm-unittest/06_test_zeek_generation.yaml

Lines changed: 1 addition & 25 deletions
@@ -1,31 +1,7 @@
 description: Generate Valid Zeek Log Line
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
-
+  - file://providers/providers.yaml
 
 prompts:
   - 'Generate a simple Zeek log trace for a short network session. Include fields like ts, uid, id.orig_h, id.resp_h, proto, and service. The log should represent a single {{service}} request from {{src_ip}} to {{dst_ip}} using {{proto}}. No extra formatting or explanation.

Lines changed: 2 additions & 25 deletions
@@ -1,30 +1,7 @@
 description: Summarize Zeek Logs and Make Classifications
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
+  - file://providers/providers.yaml
 prompts:
   - file://prompts/zeek_summary_personality.json
 
@@ -56,4 +33,4 @@ tests:
 
 defaultTest:
   options:
-    provider: ollama:qwen2.5-coder:latest
+    provider: openai:chat:qwen2.5-coder:latest

llm-unittest/08_test_tool_use.yaml

Lines changed: 2 additions & 25 deletions
@@ -1,30 +1,7 @@
 description: Generate Function Call from Prompt
 
 providers:
-  - id: ollama:smollm2:1.7b
-    config:
-      num_predict: 2048
-  - id: ollama:gemma3:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:1b
-    config:
-      num_predict: 2048
-  - id: ollama:llama3.2:3b
-    config:
-      num_predict: 2048
-  - id: ollama:phi4-mini:latest
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:1.5b
-    config:
-      num_predict: 2048
-  - id: ollama:qwen2.5:3b
-    config:
-      num_predict: 2048
-  - id: ollama:granite3.1-dense:2b
-    config:
-      num_predict: 2048
+  - file://providers/providers.yaml
 prompts:
   - file://prompts/tool_use_personality.json
 tests:
@@ -42,4 +19,4 @@ tests:
 
 defaultTest:
   options:
-    provider: ollama:qwen2.5-coder:latest
+    provider: openai:chat:qwen2.5-coder:latest
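
The grading provider in `defaultTest` moves from the `ollama:` prefix to `openai:chat:`, which suggests the judge model is now reached through an OpenAI-compatible endpoint. The commit does not show how that endpoint is configured (it may well be set through environment variables instead), so the following is only a sketch under assumptions: a local server exposing an OpenAI-compatible API at the URL shown, and the provider written in object form with an explicit base URL.

```yaml
defaultTest:
  options:
    provider:
      id: openai:chat:qwen2.5-coder:latest
      config:
        # Assumption: a local OpenAI-compatible server (for example Ollama's /v1 API).
        apiBaseUrl: http://localhost:11434/v1
        # Assumption: the local server ignores the key, but a non-empty value is supplied.
        apiKey: placeholder
```

With a setup like this the same model name can be served by any OpenAI-compatible backend without touching the test files again.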
