
Commit 8ac2664

Merge pull request #16 from stratosphereips/harpo_datasets

New dataset for finetuning LLMs for risk analysis and decision making

2 parents 41727fe + 177c7d0

25 files changed: +4863 -13843 lines

.gitignore

Lines changed: 11 additions & 1 deletion
```diff
@@ -56,7 +56,7 @@ coverage.xml
 *.pot
 
 # Django stuff:
-*.log
+#*.log
 local_settings.py
 db.sqlite3
 db.sqlite3-journal
@@ -127,3 +127,13 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
+
+# Intermediate LLM analysis files (regeneratable)
+alert_summary/datasets/*.cause_risk.*.json
+alert_summary/datasets/*.llm.*.json
+!alert_summary/datasets/*.llm.*.json.gz
+alert_summary/datasets/final_dataset_*.json
+alert_summary/my_dataset_*.llm.*.json
+alert_summary/results/
+.attic/
+alert_summary/.attic/
```

CLAUDE.md

Lines changed: 116 additions & 0 deletions
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Repository Overview

Slips-tools is a collection of tools and scripts for testing and evaluating Slips (network security analysis). The repository contains five main components:

1. **Alert Summary Tools** (`alert_summary/`) - Slips Evidence Log DAG Generator with dual analysis modes (IP-based and per-analysis)
2. **LLM Unit Testing** (`llm-unittest/`) - Promptfoo-based test suite for evaluating small language models on security-related tasks
3. **Model Benchmarking** (`benchmark_models/`) - Performance benchmarking for Ollama-served models
4. **Data Visualization** (`multi_line_chart_plotter/`) - CSV plotting utility for performance metrics
5. **System Monitoring** (`rpi_temperature_logger/`) - Raspberry Pi temperature logging

## Key Commands

### Alert Summary Tools
```bash
# IP-based analysis (traditional mode)
cd alert_summary/
python3 slips_dag_generator.py sample_logs/test_data.log --all-ips --minimal --include-threat-level

# Per-analysis mode (alert-focused)
python3 slips_dag_generator.py sample_logs/slips.log --per-analysis --compact

# LLM-enhanced analysis
./analyze_slips_with_llm.sh sample_logs/slips.log --per-analysis --format minimal

# Dataset generation - Summarization workflow
./sample_dataset.sh 100 my_dataset --seed 42
./generate_dag_analysis.sh datasets/my_dataset.jsonl
./generate_llm_analysis.sh datasets/my_dataset.jsonl --model gpt-4o-mini --group-events --behavior-analysis
python3 correlate_incidents.py datasets/my_dataset.*.json --jsonl datasets/my_dataset.jsonl -o final_dataset.json

# Dataset generation - Cause & Risk workflow
./generate_cause_risk_analysis.sh datasets/my_dataset.jsonl --model gpt-4o-mini --group-events
python3 correlate_risks.py datasets/my_dataset.*.json --jsonl datasets/my_dataset.jsonl -o final_dataset_risk.json
```

### LLM Unit Testing
```bash
# Run all test cases with Ollama backend
cd llm-unittest/
./run_tests.sh

# Run individual test case
promptfoo eval -c 01_test_action_json_parsing.yaml --max-concurrency 3 --no-cache --providers file://providers/ex_provider.yaml

# View results
promptfoo view
```

### Model Benchmarking
```bash
# Benchmark all available Ollama models
cd benchmark_models/
./benchmark_ollama_models.sh

# Test single OpenAI-compatible endpoint
./test_openai.sh
```

### Data Visualization
```bash
# Install dependencies
cd multi_line_chart_plotter/
pip install -r requirements.txt

# Generate multi-line plot
./plotter.py file1.csv file2.csv "Title" "X Label" "Y Label" output.png
```

### Temperature Monitoring
```bash
# Log Raspberry Pi temperature (requires RPi)
cd rpi_temperature_logger/
python3 rpi_temperature_logger.py
```

## Architecture

### LLM Testing Framework
- **Test Cases**: YAML files defining prompts and expected outputs for various security tasks (JSON parsing, Zeek analysis, tool use)
- **Providers**: Configuration for different model endpoints (Ollama, OpenAI-compatible APIs)
- **Evaluation**: Uses the Promptfoo framework for systematic model evaluation

### Benchmarking System
- **stream_query_llm.py**: Core Python script for querying models and measuring performance metrics (see the illustrative sketch below)
- **benchmark_ollama_models.sh**: Orchestrates benchmarking across multiple models, collecting disk usage, RAM usage, and tokens-per-second
- **Results**: Outputs structured CSV data for analysis
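
A minimal sketch of the kind of measurement this script performs, assuming an OpenAI-compatible endpoint; the URL, model name, and prompt are placeholders, not the script's actual defaults:

```python
# Illustrative sketch only -- not the actual stream_query_llm.py implementation.
# Assumes an OpenAI-compatible endpoint (e.g., an Ollama server); URL, model, and prompt are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

start = time.time()
chunks = 0
stream = client.chat.completions.create(
    model="qwen2.5:3b",
    messages=[{"role": "user", "content": "Summarize this Zeek log line: ..."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # rough proxy: one streamed chunk is roughly one token
elapsed = time.time() - start
print(f"~{chunks / elapsed:.1f} tokens/second over {elapsed:.1f}s")
```
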
### Provider Configuration
Models are configured in `llm-unittest/providers/`, with endpoints typically pointing to:
- Ollama servers (e.g., `http://10.147.20.101:11434/v1`)
- Custom model endpoints for specialized models like BitNet

## Test Categories

The LLM unit tests focus on security-relevant capabilities:
- **Action JSON**: Parsing and understanding structured security actions
- **Zeek Analysis**: Network traffic log analysis and signature generation
- **Tool Use**: Integration with security tools and workflows
- **Summarization**: Converting technical data into actionable insights

## Development Notes

- Promptfoo requires `npm install -g promptfoo`
- Python dependencies are minimal (openai, pandas, matplotlib)
- Shell scripts expect `jq` and `curl` for JSON processing and HTTP requests
- Default configurations point to specific IP addresses that may need updating for different environments

## Conda Environment Setup

- Always use the conda environment for running projects
- Activation command:
  - `source $HOME/miniconda3/etc/profile.d/conda.sh && conda activate agents`

Lines changed: 155 additions & 0 deletions
# Network Event Cause & Risk Analysis Dataset for Slips IDS

## Table of Contents

- [1. Task Description](#1-task-description)
- [2. Relationship to Summarization Workflow](#2-relationship-to-summarization-workflow)
- [3. Dataset Generation Workflow](#3-dataset-generation-workflow)
  - [Workflow Overview](#workflow-overview)
  - [Stage 3: Multi-Model Cause & Risk Analysis](#stage-3-multi-model-cause--risk-analysis)
  - [Stage 4: Dataset Correlation](#stage-4-dataset-correlation)
  - [Dataset Structure](#dataset-structure)
- [4. Use Cases and Applications](#4-use-cases-and-applications)

## 1. Task Description

Develop a dataset for **root cause analysis and risk assessment** of network security incidents from Slips IDS alerts. This complementary workflow focuses on structured security analysis rather than event summarization, providing:

1. **Cause Analysis** - Categorized incident attribution (Malicious Activity / Legitimate Activity / Misconfigurations)
2. **Risk Assessment** - Structured evaluation (Risk Level / Business Impact / Investigation Priority)

**Target Deployment**: Same hardware constraints as the [summarization workflow](DATASET_REPORT.md#2-limitations) (Raspberry Pi 5, 1.5B-3B parameter models).

## 2. Relationship to Summarization Workflow

Both workflows share identical **Stages 1-2** (incident sampling and DAG generation) but diverge in their LLM analysis approach:

| Aspect | Summarization Workflow | Risk Analysis Workflow |
|--------|------------------------|------------------------|
| **Documentation** | [DATASET_REPORT.md](DATASET_REPORT.md) | This document |
| **Detailed Guide** | [README_dataset_summary_workflow.md](README_dataset_summary_workflow.md) | [README_dataset_risk_workflow.md](README_dataset_risk_workflow.md) |
| **Analysis Script** | `generate_llm_analysis.sh` | `generate_cause_risk_analysis.sh` |
| **Correlation Script** | `correlate_incidents.py` | `correlate_risks.py` |
| **Output Fields** | `summary` + `behavior_analysis` | `cause_analysis` + `risk_assessment` |
| **LLM Prompts** | 2 per incident (event summarization + behavior patterns) | 2 per incident (cause attribution + risk scoring) |
| **Primary Use Case** | Incident timeline reconstruction, behavior pattern identification | Root cause analysis, threat prioritization, SOC decision support |

**Recommendation**: Generate both datasets from the same sampled incidents to enable comparative analysis and multi-task model training.

## 3. Dataset Generation Workflow

### Workflow Overview

**Stages 1-2** (Sampling + DAG): See [DATASET_REPORT.md §3](DATASET_REPORT.md#3-dataset-generation-workflow) - identical to the summarization workflow.

**Quick commands:**
```bash
# Stage 1: Sample 100 incidents
./sample_dataset.sh 100 my_dataset --seed 42

# Stage 2: Generate DAG analysis
./generate_dag_analysis.sh datasets/my_dataset.jsonl
```
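
For reference, Stage 1 amounts to reproducible random sampling of incidents. The sketch below only illustrates that idea under assumptions; it is not the actual `sample_dataset.sh`, and the source pool path is hypothetical:

```python
# Hedged sketch of seeded incident sampling; not the actual sample_dataset.sh.
# The source pool path is an assumed placeholder.
import json
import random

random.seed(42)  # same seed as the --seed 42 flag, for reproducibility

with open("datasets/all_incidents.jsonl") as fh:        # assumed pool of candidate incidents
    pool = [json.loads(line) for line in fh]

sample = random.sample(pool, k=min(100, len(pool)))     # 100 incidents

with open("datasets/my_dataset.jsonl", "w") as out:
    for incident in sample:
        out.write(json.dumps(incident) + "\n")
```
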
### Stage 3: Multi-Model Cause & Risk Analysis

Query LLMs with dual prompts for cause attribution and risk assessment:

```bash
# GPT-4o-mini (recommended baseline)
./generate_cause_risk_analysis.sh datasets/my_dataset.jsonl \
  --model gpt-4o-mini --group-events

# Qwen2.5:3b (target deployment model)
./generate_cause_risk_analysis.sh datasets/my_dataset.jsonl \
  --model qwen2.5:3b \
  --base-url http://10.147.20.102:11434/v1 --group-events
```

**Output Structure** (per incident):
```json
{
  "cause_analysis": "**Possible Causes:**\n\n**1. Malicious Activity:**\n• Port scanning indicates reconnaissance...\n\n**2. Legitimate Activity:**\n• Could be network monitoring tools...\n\n**3. Misconfigurations:**\n• Firewall allowing unrestricted scanning...\n\n**Conclusion:** Most likely malicious reconnaissance activity.",

  "risk_assessment": "**Risk Level:** High\n\n**Justification:** Active scanning + C2 connections...\n\n**Business Impact:** Potential data breach or service disruption...\n\n**Likelihood of Malicious Activity:** High - Systematic attack pattern...\n\n**Investigation Priority:** Immediate - Block source IP and investigate."
}
```
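
For illustration, the dual-prompt pattern behind this stage could look like the following; the prompt wording, client setup, and model defaults are assumptions for the sketch, not the internals of `generate_cause_risk_analysis.sh`:

```python
# Hedged sketch of the two-prompt cause/risk pattern; prompts, model, and
# endpoint configuration are placeholders, not the actual script's values.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; pass base_url=... to target an Ollama server instead

def analyze_incident(dag_analysis: str, model: str = "gpt-4o-mini") -> dict:
    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": f"{prompt}\n\n{dag_analysis}"}],
        )
        return resp.choices[0].message.content

    return {
        "cause_analysis": ask(
            "Attribute the most likely cause of these Slips events as "
            "Malicious Activity, Legitimate Activity, or Misconfiguration, with reasoning:"
        ),
        "risk_assessment": ask(
            "Assess the risk level, business impact, likelihood of malicious "
            "activity, and investigation priority for these events:"
        ),
    }

print(analyze_incident("16:53 - 222 horizontal port scans [HIGH]"))
```
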
### Stage 4: Dataset Correlation

Merge all analyses (DAG + LLM cause/risk assessments) by incident ID:

```bash
python3 correlate_risks.py datasets/my_dataset.*.json \
  --jsonl datasets/my_dataset.jsonl \
  -o datasets/final_dataset_risk.json
```
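
Conceptually, this step is a join on the incident ID across the sampled JSONL and every per-model analysis file. A minimal sketch of that idea, with assumed file layout and record fields rather than the actual `correlate_risks.py` logic:

```python
# Hedged sketch of merging per-model cause/risk analyses by incident ID.
# File naming and record fields ("model", etc.) are assumptions for illustration.
import glob
import json

merged = {}

# Base incident metadata from the sampled JSONL (one incident per line).
with open("datasets/my_dataset.jsonl") as fh:
    for line in fh:
        incident = json.loads(line)
        merged[incident["incident_id"]] = incident

# Fold in each model's cause/risk output under a model-specific key.
for path in glob.glob("datasets/my_dataset.cause_risk.*.json"):
    with open(path) as fh:
        for record in json.load(fh):  # assumed: a list of per-incident records
            key = "cause_risk_" + record["model"].replace(".", "_").replace("-", "_").replace(":", "_")
            entry = merged.setdefault(record["incident_id"], {"incident_id": record["incident_id"]})
            entry[key] = {
                "cause_analysis": record.get("cause_analysis"),
                "risk_assessment": record.get("risk_assessment"),
            }

with open("datasets/final_dataset_risk.json", "w") as fh:
    json.dump({"total_incidents": len(merged), "incidents": list(merged.values())}, fh, indent=2)
```
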
### Dataset Structure

Final output contains merged analyses with model-specific risk assessments:

```json
{
  "total_incidents": 100,
  "incidents": [
    {
      "incident_id": "uuid",
      "category": "Malware",
      "source_ip": "192.168.1.113",
      "timewindow": "5",
      "timeline": "2024-04-05 16:53:07 to 16:53:50",
      "threat_level": 15.36,
      "event_count": 4604,
      "dag_analysis": "• 16:53 - 222 horizontal port scans [HIGH]\n...",
      "cause_risk_gpt_4o_mini": {
        "cause_analysis": "**1. Malicious Activity:** Reconnaissance scanning...",
        "risk_assessment": "**Risk Level:** High\n**Justification:**..."
      },
      "cause_risk_gpt_4o": { ... },
      "cause_risk_qwen2_5": { ... }
    }
  ]
}
```

**Key differences from summarization dataset**:
- `cause_risk_*` fields replace `llm_*` fields
- Structured 3-category cause analysis (vs. free-form summary)
- 5-field risk assessment framework (vs. behavior flow description)
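
To use the merged file for fine-tuning, one reasonable mapping is DAG analysis in, cause/risk text out. A hedged sketch; the chat format, instruction wording, and the choice of `cause_risk_gpt_4o_mini` as the target are illustrative assumptions, not a prescribed recipe:

```python
# Hedged sketch: convert final_dataset_risk.json into chat-style fine-tuning records.
# Target field choice and instruction wording are assumptions.
import json

with open("datasets/final_dataset_risk.json") as fh:
    dataset = json.load(fh)

with open("datasets/risk_sft.jsonl", "w") as out:
    for incident in dataset["incidents"]:
        target = incident.get("cause_risk_gpt_4o_mini")
        if not target:
            continue
        example = {
            "messages": [
                {"role": "user",
                 "content": "Analyze the cause and risk of these Slips events:\n" + incident["dag_analysis"]},
                {"role": "assistant",
                 "content": target["cause_analysis"] + "\n\n" + target["risk_assessment"]},
            ]
        }
        out.write(json.dumps(example) + "\n")
```
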
## 4. Use Cases and Applications

### Security Operations Center (SOC)
- **Automated Triage**: Risk level + investigation priority for alert queue sorting (see the sketch below)
- **Incident Attribution**: Distinguish malicious attacks from misconfigurations
- **Resource Allocation**: Business impact assessment for team assignments
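
As a rough illustration of the automated triage item above, the risk level embedded in the `risk_assessment` text can be parsed out and used to order the alert queue. The regex, level ordering, and model field choice are assumptions based on the output structure shown earlier:

```python
# Hedged sketch: sort incidents by the risk level embedded in risk_assessment text.
# Assumes the "**Risk Level:** <level>" convention from the output structure above.
import json
import re

ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def risk_level(incident, field="cause_risk_gpt_4o_mini"):
    text = incident.get(field, {}).get("risk_assessment", "")
    match = re.search(r"\*\*Risk Level:\*\*\s*(\w+)", text)
    return match.group(1) if match else "Low"

with open("datasets/final_dataset_risk.json") as fh:
    incidents = json.load(fh)["incidents"]

queue = sorted(incidents, key=lambda i: ORDER.get(risk_level(i), 99))
for incident in queue[:10]:
    print(risk_level(incident), incident["source_ip"], incident["category"])
```
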
### Model Training Applications
- **Classification Tasks**: Train models to categorize incidents (malicious/legitimate/misconfiguration)
- **Risk Scoring**: Fine-tune models for threat level prediction
- **Decision Support**: Generate actionable recommendations (block/monitor/investigate)

### Dataset Comparison
Use both workflows together:
- **Summarization**: "What happened?" (temporal sequences, behavior patterns)
- **Risk Analysis**: "Why did it happen?" + "How urgent?" (attribution, prioritization)

**Combined Training Strategy**:
```bash
# Generate both datasets from same incidents
./generate_llm_analysis.sh datasets/my_dataset.jsonl --model qwen2.5:3b --group-events --behavior-analysis
./generate_cause_risk_analysis.sh datasets/my_dataset.jsonl --model qwen2.5:3b --group-events

# Correlate separately
python3 correlate_incidents.py datasets/my_dataset.*.json --jsonl datasets/my_dataset.jsonl -o summary_dataset.json
python3 correlate_risks.py datasets/my_dataset.*.json --jsonl datasets/my_dataset.jsonl -o risk_dataset.json

# Multi-task training: Merge datasets and train single model on both tasks
```
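
A sketch of the merge step named in the last comment above, assuming both correlated files expose `incident_id` per incident; the `cause_risk_*` field handling and output layout are illustrative:

```python
# Hedged sketch: combine the summarization and risk datasets into one multi-task file.
# Field names other than incident_id are assumptions based on the workflow descriptions.
import json

def by_id(path):
    with open(path) as fh:
        return {i["incident_id"]: i for i in json.load(fh)["incidents"]}

summary = by_id("summary_dataset.json")
risk = by_id("risk_dataset.json")

combined = []
for incident_id, record in summary.items():
    merged = dict(record)
    merged.update(risk.get(incident_id, {}))  # adds cause_risk_* fields where present
    combined.append(merged)

with open("multitask_dataset.json", "w") as fh:
    json.dump({"total_incidents": len(combined), "incidents": combined}, fh, indent=2)
```
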
---

**For detailed implementation**: See [README_dataset_risk_workflow.md](README_dataset_risk_workflow.md)
**For workflow comparison**: See [WORKFLOWS_OVERVIEW.md](WORKFLOWS_OVERVIEW.md) (if available)
**For evaluation methods**: See [LLM_EVALUATION_GUIDE.md](LLM_EVALUATION_GUIDE.md)
