Releases · kakao/FunctionChat-Bench · GitHub

23 Oct 07:36

gannim

v1.1.1 Latest

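Fixed a bug where the Exact Match check was skipped when evaluating Tool Call type items
Added handling for cases where 'pass' results were left out of the count when computing the final score
Added citation information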

24 Sep 01:43

gannim

v1.1.0

Fixed acceptable_arguments to use a valid format

Added acceptable_arguments entries (e.g. "홈베이킹 도구" (home-baking tools) + "홈베이킹" (home baking); "기영이 결혼식" (Giyeong's wedding) + "기영이, 결혼식" (Giyeong, wedding); 170 + 170.0; etc.)

Fixed typos

Fixed an exact match logic error (cases that were not exact matches were treated as an exact match pass and therefore skipped rubric evaluation)

Improved the rubric for the LLM judge model, gpt-4-0125-preview
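
The exact match and acceptable_arguments behavior described in the notes above can be illustrated with a minimal sketch. This is not the repository's code: the function names, the `keyword` argument, and the assumption that acceptable_arguments maps each argument name to a list of alternative values are all hypothetical, chosen only to show the idea that a call passes on exact match alone when every argument equals the expected value or a listed alternative, and otherwise goes on to rubric-based judging.

```python
from typing import Any, Dict, List, Optional


def is_exact_match(predicted: Dict[str, Any],
                   expected: Dict[str, Any],
                   acceptable: Optional[Dict[str, List[Any]]] = None) -> bool:
    """Hypothetical check: True only if the argument names are identical and
    each value equals the expected value or one of its acceptable alternatives."""
    acceptable = acceptable or {}
    if predicted.keys() != expected.keys():
        return False
    for name, want in expected.items():
        allowed = [want] + list(acceptable.get(name, []))
        if predicted[name] not in allowed:
            return False
    return True


def evaluate(predicted: Dict[str, Any],
             expected: Dict[str, Any],
             acceptable: Optional[Dict[str, List[Any]]] = None) -> str:
    """Return 'pass' on exact match; otherwise defer to rubric-based judging.
    The 'needs_rubric' label is an assumption made for this sketch."""
    if is_exact_match(predicted, expected, acceptable):
        return "pass"
    return "needs_rubric"


# Hypothetical ground truth with one acceptable alternative for `keyword`.
expected_args = {"keyword": "홈베이킹 도구"}
acceptable_args = {"keyword": ["홈베이킹"]}

result_alt = evaluate({"keyword": "홈베이킹"}, expected_args, acceptable_args)
result_miss = evaluate({"keyword": "베이킹"}, expected_args, acceptable_args)
```

Here `result_alt` passes because the predicted value is a listed alternative, while `result_miss` is not an exact match and so must not be counted as a pass without rubric evaluation.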

Full Changelog: v1.0.0...v1.1.0
