In this paper, we introduce OPENIA, a novel white-box (open-box) framework that leverages these internal representations to assess the correctness of LLM-generated code. By systematically analyzing the intermediate states of representative open-source code LLMs, including DeepSeek-Coder, Code Llama, and Magicoder, across diverse code generation benchmarks, we find that these internal representations encode latent information that strongly correlates with the correctness of the generated code.
Our results show that OPENIA consistently outperforms baseline models, achieving higher accuracy, precision, recall, and F1-scores, with up to a 2X improvement in standalone code generation and a 3X improvement in repository-specific scenarios. By unlocking the potential of in-process signals, OPENIA paves the way for more proactive and efficient quality assurance mechanisms in LLM-assisted code generation.
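For intuition, the sketch below illustrates the core idea: extract an LLM's internal (hidden-state) representation of a generated program and fit a lightweight probe that predicts correctness. This is a minimal, hypothetical example, not the OPENIA implementation; the checkpoint name, last-layer choice, mean-pooling strategy, and logistic-regression probe are all illustrative assumptions.

```python
# Minimal sketch (NOT the OPENIA implementation): probe an LLM's internal
# representations for code-correctness signals. Checkpoint, layer choice,
# pooling, and classifier below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "deepseek-ai/deepseek-coder-1.3b-base"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def internal_representation(code: str, layer: int = -1) -> torch.Tensor:
    """Mean-pool one transformer layer's hidden states over all tokens."""
    inputs = tokenizer(code, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states is a tuple of (1, seq_len, hidden_dim) tensors,
    # one entry per layer; pool the chosen layer into a single vector.
    return outputs.hidden_states[layer].mean(dim=1).squeeze(0)

# Toy labeled data: correctness labels would come from e.g. unit tests.
codes = ["def add(a, b): return a + b", "def add(a, b): return a - b"]
labels = [1, 0]  # 1 = correct, 0 = incorrect
features = torch.stack([internal_representation(c) for c in codes]).numpy()
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.predict(features))
```

In practice such a probe would be trained on many labeled generations and evaluated on held-out code; the point is only that correctness-relevant signal can be read off the model's intermediate states without executing the code.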
First, set up a Python environment. This codebase has been tested under Python 3.8.
$ conda create -n openia python=3.8
$ conda activate openia
$ pip install -r requirements.txt
If you're using OPENIA in your research or applications, please consider citing our paper:
@article{bui2025correctness,
  title={Correctness Assessment of Code Generated by Large Language Models Using Internal Representations},
  author={Bui, Tuan-Dung and Vu, Thanh Trong and Nguyen, Thu-Trang and Nguyen, Son and Vo, Hieu Dinh},
  journal={arXiv preprint arXiv:2501.12934},
  year={2025}
}
If you have any questions, comments, or suggestions, please do not hesitate to contact us.
- Email: [email protected]