
[FT] Support for retriever-augmented and latent-memory models. #1109

@akshathmangudi

Description


Issue encountered

LightEval currently evaluates models as input -> output text generators. However, a growing class of retrieval-augmented models performs retrieval and reasoning within a latent space, or via tightly coupled retriever-generator systems, which standard evaluation supports only partially.

What kind of models can we support?

  1. Compression-native / latent RAG (e.g. Apple's CLaRa): documents are compressed into learned latent representations; retrieval and reasoning happen in continuous latent space.
  2. Joint retriever-generator models (e.g. RETRO, ATLAS): retrieval behavior materially affects generation but is not visible in standard evaluations.
  3. Latent memory systems (e.g. DSI): documents are stored and accessed implicitly via model parameters rather than text chunks.

In these cases, comparison against classic RAG baselines becomes difficult.

LightEval can already evaluate outputs from such systems, but it lacks a way to model them as RAG systems rather than as plain language models.

Solution/Feature

I don't have a fixed solution yet; rather, I'm opening a discussion on the feasibility of this feature, which I will take full initiative on if it's viable.

Upon searching the codebase, I found a few files like pipeline.py and some driver code that support classes like LightevalModel. Maybe we can create some form of model adapter that wraps a RAG system; that way we could run the existing benchmarks available within LightEval against these new systems. This is of course theoretical thinking, but I would love your thoughts on this. A rough sketch of what I mean is below.
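
To make the adapter idea concrete, here is a minimal, purely illustrative sketch. The import path, the `greedy_until` signature, and the `retriever`/`generator` objects are all assumptions on my part and would need to be checked against the real LightevalModel interface:

```python
# Purely illustrative: adapt a retriever-generator system to LightEval's
# model interface. The import path and method names below are assumptions
# based on a quick read of the codebase, not the confirmed API.
from lighteval.models.abstract_model import LightevalModel


class RAGModelAdapter(LightevalModel):
    """Wraps a coupled retriever + generator behind the LightevalModel API.

    `retriever` and `generator` are hypothetical stand-ins for whatever
    latent-RAG system is under evaluation (e.g. a CLaRa- or RETRO-style
    model). Only generation is sketched; the other abstract methods
    (loglikelihood, etc.) would also need implementations.
    """

    def __init__(self, retriever, generator, top_k: int = 5):
        self.retriever = retriever
        self.generator = generator
        self.top_k = top_k

    def greedy_until(self, requests):
        # Run retrieval first (possibly in latent space), then condition
        # generation on the retrieved representations instead of raw text.
        results = []
        for request in requests:
            retrieved = self.retriever.retrieve(request.context, k=self.top_k)
            results.append(self.generator.generate(request.context, retrieved))
        return results
```

The appeal is that any benchmark LightEval already runs against a plain language model could be pointed at the adapter unchanged, with retrieval staying an internal detail of the wrapped system.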
