
Conversation

@martinakaduc martinakaduc commented Jul 12, 2025

In this PR, I added three new scenarios to the LMKT benchmark.

  • Cultural knowledge remembering: This scenario assesses the LLM's ability to recall culturally specific knowledge, i.e., knowledge that exists in one language but not in others.
  • Cultural safety application: This scenario assesses the safety of LLM responses across different languages.
  • Cultural evolution understanding: This scenario assesses understanding of which cultural norms were appropriate at a specific point in time.

Additionally, I fixed some naming conventions in the previous scenarios.

martinakaduc and others added 29 commits May 7, 2025 21:45

return [Stat(MetricName("accuracy")).add(scores["correct"])]

def evaluate(
Collaborator


This has a lot of copy-and-paste logic from the base evaluate() method. Instead of doing this, you should override evaluate_instances() from EvaluateInstancesMetric and put your logic there.

You can look at ClassificationMetric for an example of how to do this.
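The suggested pattern can be sketched as follows. Note that the Stat, MetricName, and EvaluateInstancesMetric classes below are simplified stand-ins modeled loosely on HELM's metric interfaces, not the project's actual implementations, and CulturalKnowledgeMetric is a hypothetical subclass name invented for illustration:

```python
from dataclasses import dataclass
from typing import List

# Simplified stand-in for HELM's MetricName (assumption, not the real class).
@dataclass(frozen=True)
class MetricName:
    name: str

# Simplified stand-in for HELM's Stat: accumulates values for one metric.
class Stat:
    def __init__(self, name: MetricName):
        self.name = name
        self.sum = 0.0
        self.count = 0

    def add(self, value: float) -> "Stat":
        self.sum += value
        self.count += 1
        return self

    @property
    def mean(self) -> float:
        return self.sum / self.count if self.count else 0.0

# Base class: subclasses implement only evaluate_instances(), so the
# shared evaluate() machinery is never copy-pasted.
class EvaluateInstancesMetric:
    def evaluate_instances(self, request_states) -> List[Stat]:
        raise NotImplementedError

# Hypothetical subclass: scores each instance and aggregates into one Stat.
class CulturalKnowledgeMetric(EvaluateInstancesMetric):
    def evaluate_instances(self, request_states) -> List[Stat]:
        stat = Stat(MetricName("accuracy"))
        for state in request_states:
            stat.add(1.0 if state["prediction"] == state["reference"] else 0.0)
        return [stat]

metric = CulturalKnowledgeMetric()
stats = metric.evaluate_instances([
    {"prediction": "a", "reference": "a"},
    {"prediction": "b", "reference": "c"},
])
```

The point of the base-class design is that per-instance scoring is the only scenario-specific part; aggregation and orchestration stay in the framework.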

@yifanmai
Collaborator

Hi, it's been a month since the last update; are you still working on this?

@martinakaduc
Contributor Author

Hi, it's been a month since the last update; are you still working on this?

Yes, I am still working on this project. I had many other things to do last month, so I haven't had time to revise it. I will resume work this week.

