FVEval: Evaluating LLMs on Hardware Formal Verification

Installation

conda create -n fveval python=3.10
conda activate fveval
pip install -r requirements.txt
pip install -e .
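
To confirm the editable install succeeded, a quick sanity check is to import the package (fv_eval is the package directory listed under Repository Structure below):

# verify the fv_eval package is importable
python -c "import fv_eval"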

Requirements

Running the full evaluation scripts and the functionality implemented here requires access to a commercial formal verification tool (Cadence Jasper). The code assumes the PATH environment variable is updated such that the Jasper binary is accessible with the command 'jg'.

# check Cadence Jasper is accessible globally
jg -no_gui
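
If the command above is not found, one common fix is to append the Jasper installation's bin directory to PATH. The path below is a placeholder; substitute your site's actual install location:

# example only -- replace the path with your local Jasper installation
export PATH="$PATH:/opt/cadence/jasper/bin"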

Running Benchmark Data Generation and Preprocessing

# Run the following commands in the conda environment created above
# (1) run full flow for NL2SVA-Human
bash scripts/run_nl2sva_human.sh {k: number of ICL examples to use}

# (2) run full flow for NL2SVA-Machine
bash scripts/run_nl2sva_machine.sh {k: number of ICL examples to use}

# (3) run full flow for Design2SVA
bash scripts/run_design2sva.sh {n: number of outputs to sample; pass@k is evaluated for k<=n}
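
For example, to run all three flows end to end with illustrative argument values (the numbers below are examples, not recommendations):

bash scripts/run_nl2sva_human.sh 3     # 3 ICL examples
bash scripts/run_nl2sva_machine.sh 3   # 3 ICL examples
bash scripts/run_design2sva.sh 5       # sample 5 outputs; evaluates pass@k for k<=5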

Running Full Evaluation Suite for Each Sub-Benchmark

# Run the following commands in the conda environment created above
# Supply the list of models to test with the --models flag, with model names separated by semicolons
# Run with --debug to print all inputs and outputs to and from the LM
# Change the LLM decoding temperature with the --temperature flag
# You can also see the flag options available for each run script by passing the '-h' flag

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) benchmark:
python run_nl2sva.py --mode machine --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) benchmark:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the Design2SVA (SV testbench generation) benchmark:
python run_design2sva.py --models "mixtral-8x22b" 
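
The optional flags described above can be combined freely with any run script. For example (the model name and flag values here are illustrative):

# debug run at greedy decoding temperature
python run_nl2sva.py --mode machine --models "gpt-4" --num_icl 3 --temperature 0.0 --debug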

Running LLM Generation Only on Each Sub-Benchmark

# Run the following commands in the conda environment created above
# The same flags described in the previous section (--models, --debug, --temperature, '-h') apply here as well

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) task:
python run_nl2sva.py --mode machine --models "gpt-4" --num_icl 1

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) task:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl 3

# Running LM inference on the Design2SVA (SV testbench generation) task:
python run_design2sva.py --models "mixtral-8x22b" 
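
To see the full set of flags each run script accepts, pass '-h' as noted above:

# list all available command-line options
python run_nl2sva.py -h
python run_design2sva.py -h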

Repository Structure

Overview of the repository:

fv_eval/
├── fv_eval/
│   ├── benchmark_launcher.py (methods for consuming input benchmark data and running LM inference)
│   ├── evaluation.py (methods for LM response evaluation)
│   ├── fv_tool_execution.py (methods for launching FV tools, i.e., Cadence Jasper)
│   ├── data.py (definitions for input/output data)
│   ├── prompts_*.py (default prompts for each subtask)
│   └── utils.py (misc. utility functions)
├── data_design2sva/
│   ├── data/
│   ├── generate_pipelines_design2sva.py
│   └── generate_fsm_design2sva.py
├── data_nl2sva/
│   ├── annotated_instructions_with_signals/
│   ├── annotated_tb/
│   ├── data/
│   ├── machine_tb/
│   ├── generate_nl2sva_human.py
│   └── generate_nl2sva_machine.py
├── tool_scripts/
│   ├── pec/ (property equivalence check script)
│   │   └── pec.tcl
│   ├── run_jg_design2sva.tcl
│   ├── run_jg_nl2sva_human.tcl
│   └── run_jg_nl2sva_machine.tcl
├── run_evaluation.py
├── run_design2sva.py
├── run_nl2sva.py
├── setup.py
└── README.md
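
The Tcl scripts under tool_scripts/ are normally driven by fv_eval/fv_tool_execution.py, but they can also be launched by hand. The invocation below is a sketch that assumes jg accepts a Tcl batch script on the command line; consult fv_tool_execution.py or your Jasper documentation for the exact invocation the harness uses:

# run one of the provided Jasper scripts in batch (no GUI) mode -- verify the exact flags locally
jg -no_gui tool_scripts/run_jg_nl2sva_human.tcl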


License

Copyright © 2024, NVIDIA Corporation. All rights reserved. This work is made available under the Apache 2.0 License.
