JHARS is a benchmark dataset for evaluating hallucinations (generated content that is not grounded in the given information sources) produced by Japanese Large Language Models (LLMs) in Retrieval-Augmented Generation (RAG) settings.
Hallucination is a critical challenge for the practical application of LLMs. JHARS was developed to quantitatively evaluate and characterize hallucinations produced by Japanese LLMs.
Key features:
- Sentence-level annotations on 450 Japanese LLM responses in RAG settings
- Evaluation of multiple state-of-the-art models (including GPT-4o)
- Performance assessment of hallucination detection methods
This dataset includes:
- 450 annotated LLM responses
- Scripts for hallucination evaluation
Key findings:
- Relatively low hallucination rate in LLM responses overall
- Evidence of critical hallucinations that warrant fact-checking
- Difficulty in achieving both high precision and high recall with automatic detection
- High recall is achievable for critical hallucinations
{
  "id": number,                        // Unique identifier for each QA pair
  "question": string,                  // Question in Japanese
  "reference_text": string,            // Reference text used to generate the answer
  "[model_name]": {                    // Model response object (gpt-4o, gpt-4o-mini, Llama-3.1-Swallow-8B-Instruct-v0.1)
    "response": string,                // Model's answer in Japanese
    "annotations": {                   // Annotation data
      "aggregated": {                  // Consensus from multiple annotators
        "is_valid_answer": boolean,    // Whether the response is a valid answer
        "sentence_annotations": [      // Array of sentence-level annotations
          {
            "sentence": string,                             // Target sentence
            "annotation_status": string,                    // e.g., "completed"
            "hallucination_type": string,                   // e.g., "No_hallucination", "Contradictory", "Unverifiable" (see the note on hallucination types below)
            "hallucination_text": string[],                 // Identified hallucination spans
            "hallucination_text_start_offset": number[],    // Start offsets of the hallucination spans
            "hallucination_text_end_offset": number[],      // End offsets of the hallucination spans
            "verification_uncertainty_reason": string[][],  // Reasons for verification uncertainty
            "contradiction_uncertainty_reason": string[][], // Reasons for contradiction uncertainty
            "agreement_status": string                      // "unanimous", "majority", "disputed"
          }
        ]
      }
    }
  }
}
Note on hallucination types:
- "No_hallucination": No hallucination detected
- "Contradictory": Intrinsic hallucination - content that contradicts the reference text
- "Contradictory_uncertain": Annotator uncertain about contradiction with reference
- "Unverifiable": Extrinsic hallucination - content that cannot be verified using the reference text
- "Unverifiable_uncertain": Annotator uncertain about verifiability from reference
The dataset includes two evaluation settings for hallucination types (a minimal mapping sketch follows this list):
- Relaxed setting: Treats uncertain cases as their base types
- "Contradictory_uncertain" → "Contradictory"
- "Unverifiable_uncertain" → "Unverifiable"
- Strict setting: Maintains distinction between all five hallucination types listed above
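As a minimal sketch of the relaxed setting, a strict-setting label can be collapsed to its base type with a simple lookup table (the `to_relaxed` helper below is illustrative and not part of the released scripts):

```python
# Map strict-setting labels to their relaxed-setting base types.
RELAXED_LABEL = {
    "No_hallucination": "No_hallucination",
    "Contradictory": "Contradictory",
    "Contradictory_uncertain": "Contradictory",
    "Unverifiable": "Unverifiable",
    "Unverifiable_uncertain": "Unverifiable",
}

def to_relaxed(label: str) -> str:
    """Collapse a strict-setting hallucination label to its relaxed base type."""
    return RELAXED_LABEL[label]

assert to_relaxed("Contradictory_uncertain") == "Contradictory"
```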
# Clone the repository
git clone https://github.com/cl-tohoku/JHARS.git
cd JHARS
import pandas as pd
# Load and analyze JHARS dataset
df = pd.read_json('data/sentence_annotation/annotated_data_relaxed.jsonl', lines=True)
print(df.head())
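Building on the DataFrame loaded above, one possible follow-up (a sketch that assumes the nested fields follow the schema documented earlier) is to estimate the fraction of sentences labeled as hallucinations for a given model:

```python
# Fraction of annotated sentences labeled as hallucinations for one model.
model = "gpt-4o"  # or "gpt-4o-mini", "Llama-3.1-Swallow-8B-Instruct-v0.1"
sentences = [
    sent
    for response in df[model]
    for sent in response["annotations"]["aggregated"]["sentence_annotations"]
]
hallucinated = sum(s["hallucination_type"] != "No_hallucination" for s in sentences)
print(f"{model}: {hallucinated / len(sentences):.1%} of sentences contain a hallucination")
```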
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
This work was supported through a research collaboration between AI Shift Inc. and Tohoku University.
- Research inquiries:
- ryohei.kamei.s4 at dc.tohoku.ac.jp
- sakata.masaki.s5 at dc.tohoku.ac.jp
- Bug reports & feature requests: GitHub Issues