FVEval: Evaluating LLMs on Hardware Formal Verification

Installation

conda create -n fveval python=3.10
conda activate fveval
pip install -r requirements.txt
pip install -e .
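
To confirm the editable install succeeded, a quick sanity check is to import the package (fv_eval is the package directory listed under Repository Structure below):

# verify the fv_eval package is importable
python -c "import fv_eval"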

Requirements

Running the full evaluation scripts and the functionality implemented here requires access to a commercial formal verification tool (Cadence Jasper). The code assumes the PATH environment variable is updated such that the Jasper binary is accessible with the command 'jg'.

# check Cadence Jasper is accessible globally
jg -no_gui
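
If the command above is not found, one common fix is to append the Jasper installation's bin directory to PATH. The path below is a placeholder; substitute your site's actual install location:

# example only -- replace the path with your local Jasper installation
export PATH="$PATH:/opt/cadence/jasper/bin"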

Running Benchmark Data Generation and Preprocessing

# Run the following commands in the conda environment created above
# (1) run full flow for NL2SVA-Human
bash scripts/run_nl2sva_human.sh {k: number of ICL examples to use}

# (2) run full flow for NL2SVA-Machine
bash scripts/run_nl2sva_machine.sh {k: number of ICL examples to use}

# (3) run full flow for Design2SVA
bash scripts/run_design2sva.sh {n: number of outputs to sample; pass@k is evaluated for k<=n}
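
For example, to run all three flows end to end with illustrative argument values (the numbers below are examples, not recommendations):

bash scripts/run_nl2sva_human.sh 3     # 3 ICL examples
bash scripts/run_nl2sva_machine.sh 3   # 3 ICL examples
bash scripts/run_design2sva.sh 5       # sample 5 outputs; evaluates pass@k for k<=5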

Running Full Evaluation Suite for Each Sub-Benchmark

# Run the following commands in the conda environment created above
# Supply the list of models to test with the --models flag, with model names separated by semicolons
# Run with --debug to print all inputs and outputs to and from the LM
# Change the LLM decoding temperature with the --temperature flag
# You can also see the flag options available for each run script by passing the '-h' flag

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) benchmark:
python run_nl2sva.py --mode machine --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) benchmark:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the Design2SVA (SV testbench generation) benchmark:
python run_design2sva.py --models "mixtral-8x22b" 
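
The optional flags described above can be combined freely with any run script. For example (the model name and flag values here are illustrative):

# debug run at greedy decoding temperature
python run_nl2sva.py --mode machine --models "gpt-4" --num_icl 3 --temperature 0.0 --debug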

Running LLM Generation Only on Each Sub-Benchmark

# Run the following commands in the conda environment created above
# The same flags described in the previous section (--models, --debug, --temperature, '-h') apply here as well

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) task:
python run_nl2sva.py --mode machine --models "gpt-4" --num_icl 1

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) task:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl 3

# Running LM inference on the Design2SVA (SV testbench generation) task:
python run_design2sva.py --models "mixtral-8x22b" 
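
To see the full set of flags each run script accepts, pass '-h' as noted above:

# list all available command-line options
python run_nl2sva.py -h
python run_design2sva.py -h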

Repository Structure

Overview of the repository:

fv_eval/
├── fv_eval/
│   ├── benchmark_launcher.py (methods for consuming input benchmark data and running LM inference)
│   ├── evaluation.py (methods for LM response evaluation)
│   ├── fv_tool_execution.py (methods for launching FV tools, i.e., Cadence Jasper)
│   ├── data.py (definitions for input/output data)
│   ├── prompts_*.py (default prompts for each subtask)
│   └── utils.py (misc. utility functions)
├── data_design2sva/
│   ├── data/
│   ├── generate_pipelines_design2sva.py
│   └── generate_fsm_design2sva.py
├── data_nl2sva/
│   ├── annotated_instructions_with_signals/
│   ├── annotated_tb/
│   ├── data/
│   ├── machine_tb/
│   ├── generate_nl2sva_human.py
│   └── generate_nl2sva_machine.py
├── tool_scripts/
│   ├── pec/ (property equivalence check script)
│   │   └── pec.tcl
│   ├── run_jg_design2sva.tcl
│   ├── run_jg_nl2sva_human.tcl
│   └── run_jg_nl2sva_machine.tcl
├── run_evaluation.py
├── run_design2sva.py
├── run_nl2sva.py
├── setup.py
└── README.md
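
The Tcl scripts under tool_scripts/ are normally driven by fv_eval/fv_tool_execution.py, but they can also be launched by hand. The invocation below is a sketch that assumes jg accepts a Tcl batch script on the command line; consult fv_tool_execution.py or your Jasper documentation for the exact invocation the harness uses:

# run one of the provided Jasper scripts in batch (no GUI) mode -- verify the exact flags locally
jg -no_gui tool_scripts/run_jg_nl2sva_human.tcl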


License

Copyright © 2024, NVIDIA Corporation. All rights reserved. This work is made available under the Apache 2.0 License.
