thu-coai

106 repositories

MIR-SafetyBench
Public
MIR-SafetyBench: Evaluating Multi-image Reasoning Safety of Multimodal Large Language Models
0•0•1•0•Updated Jan 19, 2026
JPS
Public
[MM'25] JPS: Jailbreak Multimodal Large Language Models with Collaborative Visual Perturbation and Textual Steering
Python
•3•15•2•0•Updated Dec 23, 2025
IF-CRITIC
Public
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation
Shell
•0•6•1•0•Updated Nov 14, 2025
Glyph
Public
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
Python
•50•553•6•0•Updated Nov 4, 2025
CROPI
Public
Official Repository for the paper "Data-Efficient RLVR via Off-Policy Influence Guidance"
0•14•0•0•Updated Nov 1, 2025
SocialEval
Public
[ACL'25] SocialEval: Evaluating Social Intelligence of Large Language Models
Python
•
MIT License
•0•9•0•0•Updated Oct 20, 2025
CogFlow
Public
Think Socially via Cognitive Reasoning
Python
•1•6•0•0•Updated Oct 2, 2025
CharacterGLM-6B
Public
[EMNLP'24] CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
Python
•
Apache License 2.0
•37•490•4•0•Updated Oct 2, 2025
Crisp
Public
[EMNLP'25] Crisp: Cognitive Restructuring of Negative Thoughts through Multi-turn Supportive Dialogues
Python
•0•11•1•0•Updated Sep 2, 2025
AISafetyLab
Public
AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.
Python
•
MIT License
•14•226•0•0•Updated Aug 29, 2025
LRM-Safety-Study
Public
Python
•
MIT License
•0•4•0•0•Updated Aug 15, 2025
Agent-SafetyBench
Public
Python
•
MIT License
•4•89•1•0•Updated Aug 11, 2025
CharacterBench
Public
[AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models
Python
•1•19•0•0•Updated Aug 1, 2025
ShieldVLM
Public
Python
•0•6•0•0•Updated Jul 31, 2025
SafetyBench
Public
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
Python
•
MIT License
•12•268•0•0•Updated Jul 28, 2025
VPO
Public
Python
•
Apache License 2.0
•1•22•2•0•Updated Jul 20, 2025
LongSafety
Public
[ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models
Python
•
MIT License
•0•15•0•0•Updated Jun 18, 2025
SPaR
Public
Python
•
Apache License 2.0
•3•46•0•0•Updated Jun 11, 2025
TransferAttack
Public
[ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints
Python
•1•17•0•0•Updated May 23, 2025
HPSS
Public
HPSS: Heuristic Prompting Strategy Search for LLM Evaluators (ACL 2025 Findings)
Python
•1•3•0•0•Updated May 23, 2025
Backdoor-Data-Extraction
Public
Python
•
MIT License
•6•29•1•0•Updated May 22, 2025
BARREL
Public
Python
•
MIT License
•1•16•0•0•Updated May 21, 2025
MAPS
Public
Official Implementation of ICLR25 paper "MAPS: Advancing Multi-modal Reasoning in Expert-level Physical Science"
Python
•2•8•0•0•Updated Mar 12, 2025
ComplexBench
Public
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
Python
•
MIT License
•12•100•5•0•Updated Feb 20, 2025
MiniPLM
Public
[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models
Python
•
MIT License
•9•71•4•0•Updated Nov 23, 2024
MoralStory
Public
Python
•1•17•1•0•Updated Nov 7, 2024
OpenMEVA
Public
Benchmark for evaluating open-ended generation
benchmark evaluation-metrics language-generation
Python
•7•50•3•1•Updated Nov 6, 2024
CodePlan
Public
2•16•1•0•Updated Oct 16, 2024
ShieldLM
Public
ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings]
Python
•
MIT License
•10•221•1•0•Updated Sep 29, 2024
PICL
Public
Code for ACL2023 paper: Pre-Training to Learn in Context
Python
•
MIT License
•4•106•1•1•Updated Jul 26, 2024