English Version
Benchmark Datasets
- Named Entity Recognition: Added 12 NER datasets
- Speech Recognition: Added the TORGO dataset for dysarthric speech recognition, with SemScore evaluation
- Multimodal Evaluation: Added the RefCOCO referring expression comprehension benchmark
- Code Evaluation: Added Terminal-bench for assessing terminal command capability
Feature Enhancements
- Performance Testing: Added SLA auto-tuning for performance tests
- Service Mode: Added asynchronous service support and a Gradio UI
- Data Loading: Improved loading of local JSONL datasets
Bug Fixes
- Fixed HallusionBench data loading issues
- Fixed SSE chunk parsing in streaming responses (multiple events per chunk, \r\n normalization)
- Fixed SemScore computation errors
- Fixed eval_config loading issues
What's Changed
- [Fix] hallusion_bench load data by @Yunnglin in #1092
- [Feature] Add perf SLA auto tune by @Yunnglin in #1095
- [Feature] add service async and gradio ui by @Yunnglin in #1103
- fix(streaming): Robust parsing of SSE chunks with multiple events and \r\n normalization by @amumu96 in #1102
- Add 12 NER Datasets by @penguinwang96825 in #1106
- [Benchmark] Add TORGO Dataset for Dysarthria Speech Recognition with SemScore Evaluation by @penguinwang96825 in #1107
- [Benchmark] Add RefCOCO by @mushenL in #1109
- [Fix] computation error in SemScore by @penguinwang96825 in #1110
- [Feature] Update load local jsonl by @Yunnglin in #1111
- [Fix] eval_config load by @Yunnglin in #1116
- [Benchmark] Add terminal-bench by @Yunnglin in #1114
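
For readers curious about what the streaming fix in #1102 addresses, here is a minimal Python sketch of the general technique: buffering a byte stream, normalizing \r\n to \n, and splitting out every complete SSE event even when one network chunk carries several events or ends mid-event. It is an illustration only, not the code from that PR; the function name `iter_sse_data` is hypothetical, and bare \r line endings are not handled.

```python
import codecs
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event in a chunked byte stream.

    Hypothetical helper for illustration; not part of any library API.
    """
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Normalize on the whole buffer so a "\r\n" split across two chunks
        # is still collapsed once its second half arrives.
        buffer = buffer.replace("\r\n", "\n")
        # A blank line terminates an event; one chunk may contain several.
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data_lines = []
            for line in raw_event.split("\n"):
                if line.startswith("data:"):
                    value = line[len("data:"):]
                    # A single leading space after the colon is not part of the value.
                    if value.startswith(" "):
                        value = value[1:]
                    data_lines.append(value)
            if data_lines:
                yield "\n".join(data_lines)


if __name__ == "__main__":
    chunks = [
        b'data: {"a": 1}\r\n\r\ndata: {"b"',  # one full event plus a partial one
        b': 2}\r\n\r',                         # ends in the middle of a "\r\n"
        b"\ndata: [DONE]\n\n",
    ]
    print(list(iter_sse_data(chunks)))
    # prints ['{"a": 1}', '{"b": 2}', '[DONE]']
```

Incomplete trailing data stays in the buffer and is never yielded, so a consumer only ever sees whole events.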
New Contributors
Full Changelog: v1.4.0...v1.4.1