v1.4.1

@Yunnglin released this 05 Jan 07:28

Benchmark Datasets

  • Named Entity Recognition: Added 12 NER datasets
  • Speech Recognition: Added the TORGO dataset for dysarthric speech recognition, with SemScore evaluation
  • Multimodal Evaluation: Added the RefCOCO referring expression comprehension benchmark
  • Code Evaluation: Added Terminal-bench for assessing terminal command capability

Feature Enhancements

  • Performance Testing: Added SLA auto-tuning to streamline performance testing
  • Service Mode: Added asynchronous service support and a Gradio UI
  • Data Loading: Improved loading of local JSONL datasets

Bug Fixes

  • Fixed HallusionBench data loading issues
  • Fixed SSE chunk handling in streaming response parsing
  • Fixed SemScore computation errors
  • Fixed issues related to loading eval_config

Full Changelog: v1.4.0...v1.4.1