
Audio Retrieval Dataset: JLCorpus #2927


Open
AdnanElAssadi56 wants to merge 4 commits into maeb

Conversation

AdnanElAssadi56 commented Jul 21, 2025

Results on laion/clap-htsat-fused:

JLCorpusT2ARetrieval.json
JLCorpusA2TRetrieval.json

P.S. This task could be problematic if evaluation is done on the uniqueness of query ids rather than on the actual query text.
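A quick way to see whether that bites here is to compare the number of docs against the number of distinct texts in the corpus. A minimal sketch; the config, split, and column names are guesses about the processed dataset's layout, not confirmed:

```python
from datasets import load_dataset

# Config/split/column names are assumptions; adjust to the actual layout of
# mteb/JL-Corpus_a2t (it may expose separate "corpus"/"queries" subsets).
corpus = load_dataset("mteb/JL-Corpus_a2t", "corpus", split="test")
texts = [t.strip().lower() for t in corpus["text"]]
print(f"{len(texts)} docs, {len(set(texts))} unique texts")
# If unique texts << docs, id-based scoring will punish retrieving a
# different emotional rendition of the same sentence.
```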

```python
class JLCorpusA2TRetrieval(AbsTaskAny2AnyRetrieval):
    metadata = TaskMetadata(
        name="JLCorpusA2TRetrieval",
        description=(
```
Contributor

So from a speech segment you retrieve the emotion (e.g., the text label "angry")?

Author

This dataset contains transcriptions that have been spoken with different emotions. Supposedly, we should retrieve the transcription invariant to emotion; maybe that can be the evaluation. The only problem would be if evaluation is done with ids rather than on unique "text" values (see the sketch below).

Processed Dataset: https://huggingface.co/datasets/mteb/JL-Corpus_a2t
Original Dataset: https://huggingface.co/datasets/CLAPv2/JL-Corpus
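If scoring has to stay id-based, one workaround would be to expand the qrels so that every id sharing a text counts as relevant. A rough sketch, assuming `corpus` is `{doc_id: text}` and `qrels` is `{query_id: {doc_id: relevance}}` (plain dicts; names hypothetical, not the actual task internals):

```python
from collections import defaultdict

def expand_qrels_by_text(corpus, qrels):
    """Treat every doc that shares a normalized text with a relevant doc as
    relevant too, so scoring depends on the text rather than on which id is hit."""
    # Group doc ids by their normalized text.
    text_to_ids = defaultdict(set)
    for doc_id, text in corpus.items():
        text_to_ids[text.strip().lower()].add(doc_id)

    expanded = {}
    for query_id, rels in qrels.items():
        new_rels = dict(rels)
        for doc_id, score in rels.items():
            # Copy the relevance score to every id with the same text.
            for twin_id in text_to_ids[corpus[doc_id].strip().lower()]:
                new_rels.setdefault(twin_id, score)
        expanded[query_id] = new_rels
    return expanded
```

With the expanded qrels, a hit on any emotional rendition of the gold sentence counts as correct.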

Contributor

Right, so from the n versions of a text, it should retrieve the one that best matches the emotion. However, doesn't the task retrieve over all possible pairs?

Author

In the way it is currently done (the text column has no emotion info), I figured we should just check whether audio segments can consistently retrieve the correct text entry even when the audio carries varying emotions; a per-emotion breakdown like the sketch below would show that directly.
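If the model is emotion-invariant, recall@1 should be roughly flat across emotions. A minimal sketch, assuming each query carries an `emotion` field and `run` maps query ids to ranked doc ids (all names hypothetical):

```python
from collections import defaultdict

def recall_at_1_by_emotion(queries, qrels, run):
    """queries: iterable of {"id": ..., "emotion": ...}
    qrels:   {query_id: {doc_id: relevance}}
    run:     {query_id: [doc_id, ...]} ranked best-first.
    Returns recall@1 per emotion; flat numbers suggest emotion invariance."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for q in queries:
        qid, emotion = q["id"], q["emotion"]
        totals[emotion] += 1
        ranked = run.get(qid) or []
        # Count a hit only if the top-ranked doc is marked relevant.
        if ranked and qrels.get(qid, {}).get(ranked[0], 0) > 0:
            hits[emotion] += 1
    return {e: hits[e] / totals[e] for e in totals}
```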

Contributor

Sounds like something that would be easier to test prior to merging. From the results, it seems they can't.
