
Conversation

@martinakaduc martinakaduc commented Jul 12, 2025

In this PR, I added three new scenarios to the LMKT benchmark.

  • Cultural knowledge remembering: This scenario assesses the LLM's ability to recall culturally specific knowledge, i.e., knowledge that exists in one language but not in others.
  • Cultural safety application: This scenario assesses the safety of LLM responses across different languages.
  • Cultural evolution understanding: This scenario assesses understanding of which cultural norms were appropriate at a specific point in time.

Additionally, I fixed some naming conventions in the previous scenarios.

martinakaduc and others added 29 commits May 7, 2025 21:45

return [Stat(MetricName("accuracy")).add(scores["correct"])]

def evaluate(
Collaborator


This has a lot of copy-and-paste logic from the base evaluate() method. Instead of doing this, you should override evaluate_instances() from EvaluateInstancesMetric and put your logic there.

You can look at ClassificationMetric for an example of how to do this.
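The suggested pattern can be sketched as follows. Note that the Stat, MetricName, and EvaluateInstancesMetric classes below are simplified stand-ins modeled loosely on HELM's metric interfaces, not the project's actual implementations, and CulturalKnowledgeMetric is a hypothetical subclass name invented for illustration:

```python
from dataclasses import dataclass
from typing import List

# Simplified stand-in for HELM's MetricName (assumption, not the real class).
@dataclass(frozen=True)
class MetricName:
    name: str

# Simplified stand-in for HELM's Stat: accumulates values for one metric.
class Stat:
    def __init__(self, name: MetricName):
        self.name = name
        self.sum = 0.0
        self.count = 0

    def add(self, value: float) -> "Stat":
        self.sum += value
        self.count += 1
        return self

    @property
    def mean(self) -> float:
        return self.sum / self.count if self.count else 0.0

# Base class: subclasses implement only evaluate_instances(), so the
# shared evaluate() machinery is never copy-pasted.
class EvaluateInstancesMetric:
    def evaluate_instances(self, request_states) -> List[Stat]:
        raise NotImplementedError

# Hypothetical subclass: scores each instance and aggregates into one Stat.
class CulturalKnowledgeMetric(EvaluateInstancesMetric):
    def evaluate_instances(self, request_states) -> List[Stat]:
        stat = Stat(MetricName("accuracy"))
        for state in request_states:
            stat.add(1.0 if state["prediction"] == state["reference"] else 0.0)
        return [stat]

metric = CulturalKnowledgeMetric()
stats = metric.evaluate_instances([
    {"prediction": "a", "reference": "a"},
    {"prediction": "b", "reference": "c"},
])
```

The point of the base-class design is that per-instance scoring is the only scenario-specific part; aggregation and orchestration stay in the framework.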

@yifanmai
Collaborator

Hi, it's been a month since the last update; are you still working on this?

@martinakaduc
Contributor Author

Hi, it's been a month since the last update; are you still working on this?

Yes, I am still working on this project. I had many other things to do last month, so I haven't had time to revise it. I will resume work this week.

