A library for software engineering task evaluation
Task | Input | Output | Task Definition |
---|---|---|---|
Code Generation | A natural language description or comment specifying the desired behavior. | Code | Generate code for a given specification written in natural language. |
Code Search | A natural language description of code. | The code that matches the description. | Given a natural language description, search for source code that matches it. |
Task | Metric | Reference | Integrated? |
---|---|---|---|
Code Generation | EM (Exact Match) | CodeXGLUE - Text2Code Generation | ✔️ |
Code Generation | BLEU | CodeXGLUE - Text2Code Generation | ✔️ |
Code Search | MRR (Mean Reciprocal Rank) | CodeXGLUE - Code Search (AdvTest) | ✔️ |
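As a rough illustration of how these two metric families are computed, here are simplified plain-Python sketches of exact match and MRR (illustrative only, not SEVAL's actual implementations):

```python
def exact_match(predictions, references):
    """Fraction of predictions that are identical to their reference."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def mean_reciprocal_rank(rankings, gold_ids):
    """MRR over queries: for each query, `rankings` holds candidate code
    IDs ordered best-first and `gold_ids` holds the single correct ID."""
    total = 0.0
    for ranking, gold in zip(rankings, gold_ids):
        if gold in ranking:
            total += 1.0 / (ranking.index(gold) + 1)  # ranks are 1-based
    return total / len(gold_ids)
```

For example, if the gold snippet is ranked first for one query and second for another, MRR is (1 + 0.5) / 2 = 0.75.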
Task | Input | Output | Task Definition |
---|---|---|---|
Code Translation | A function written in C# or Java. | The function translated from Java to C# or vice versa. | Translate code from one programming language to another. |
Code Repair | A Java function with bugs. | The refined function with no bugs. | Automatically refine code by fixing bugs. |
Code Completion | A chunk of Java or Python context code. | The predicted next token. | Predict subsequent tokens given the context of code. |
Task | Metric | Reference | Integrated? |
---|---|---|---|
Code Translation | EM (Exact Match) | CodeXGLUE - Code Translator | ✔️ |
Code Translation | BLEU | CodeXGLUE - Code Translator | ✔️ |
Code Repair | EM (Exact Match) | CodeXGLUE - Code Refinement | ✔️ |
Code Repair | BLEU | CodeXGLUE - Code Refinement | ✔️ |
Code Completion | EM (Exact Match) | CodeXGLUE - Code Completion (token level) | ✔️ |
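Token-level exact match for code completion reduces to a per-position accuracy. A minimal sketch (illustrative, not SEVAL's implementation):

```python
def token_accuracy(predicted_tokens, reference_tokens):
    """Token-level exact match for code completion: the share of
    positions where the predicted next token equals the reference."""
    pairs = list(zip(predicted_tokens, reference_tokens))
    return sum(p == r for p, r in pairs) / len(pairs)
```

So predicting `["self", ".", "foo"]` against the reference `["self", ".", "bar"]` scores 2/3.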
Task | Input | Output | Task Definition |
---|---|---|---|
Code Summarization | Code | A natural language description of the code. | Generate natural language comments for code. |
Task | Metric | Reference | Integrated? |
---|---|---|---|
Code Summarization | EM (Exact Match) | CodeXGLUE - Code-Text | ✔️ |
Task | Input | Output | Task Definition |
---|---|---|---|
Clone Detection | Two code snippets. | A binary classification: similar or not. | Measure the semantic similarity between two pieces of code. |
Bug/Defect Prediction - Binary | Code | A binary classification: defective or not. | Classify whether code contains defects that may be used to attack software systems. |
Bug/Vulnerability Type Prediction - Multi-class | Code | The type of a variable, parameter, or function. | Predict the correct type for a particular variable, parameter, or function. |
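Binary classification tasks like clone detection and defect prediction are typically scored with accuracy, precision, recall, and F1. A plain-Python sketch (illustrative only, not SEVAL's implementation; 1 denotes the positive class, e.g. clone/defective):

```python
def binary_classification_metrics(predictions, labels):
    """Accuracy, precision, recall, and F1 for binary classification."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

F1 is the harmonic mean of precision and recall, so it penalizes classifiers that trade one off heavily against the other.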
Task | Reference |
---|---|
Fault/Bug Localization | Paper with Replication Package |
Thank you for your interest in contributing! This document outlines the process for contributing to our project. Your contributions can make a real difference, and we appreciate every effort you make to help improve this project.
- Identify your target software engineering task (Unfamiliar with SE tasks? Find them here!)
You can either integrate an existing evaluation technique or add a new one.
Note: some evaluation tasks may already be in progress. Check the pull requests tab to see whether a task is already being worked on.
- Integrate the evaluation method
Ensure that you have a detailed readme that describes how to use the evaluation method.
An example of an evaluation method and appropriate readme can be found here.
- Add a test script for your evaluation
In order to ensure the validity of the evaluation method, we require that you provide a test script as well.
There is a separate test folder that you must add your tests to. We also ask that you provide a 'how-to-test' section in your readme, detailing how to test the evaluation method.
An example test script can be found here.
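To illustrate the kind of test script expected, here is a minimal pytest-style example for an exact-match metric (the metric function is inlined as a stand-in; the names are hypothetical, not SEVAL's actual module layout):

```python
# test_exact_match.py -- minimal pytest-style test for a metric.
# The metric is inlined here as a stand-in; in a real contribution you
# would import it from your evaluation module instead.
def exact_match(predictions, references):
    """Fraction of predictions that match their reference exactly."""
    return sum(p == r for p, r in zip(predictions, references)) / len(references)


def test_exact_match_all_correct():
    assert exact_match(["return x"], ["return x"]) == 1.0


def test_exact_match_partial():
    assert exact_match(["a", "b"], ["a", "c"]) == 0.5
```

Running `pytest` from the test folder would discover and execute both test functions.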
If you have any questions about SEVAL, please contact Mitchell Huggins at [email protected].
- Python 3.6 or 3.7
- numpy