Homepage of StructEval, a benchmark dataset designed to evaluate the ability of large language models (LLMs) to generate and convert structured outputs across 18 formats and 44 task types. It covers both renderable formats (e.g., HTML, LaTeX, SVG) and non-renderable formats (e.g., JSON, XML, TOML), and supports tasks such as generating a format from a natural-language prompt and converting one format to another.
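To give a concrete sense of what a format-to-format conversion task looks like, here is a minimal illustrative sketch in Python: converting a flat JSON object into a simple XML document. This example is hypothetical and not drawn from the StructEval dataset itself; the function name and structure are assumptions for illustration.

```python
import json
import xml.etree.ElementTree as ET

def json_to_xml(json_text: str, root_tag: str = "root") -> str:
    """Convert a flat JSON object into a simple XML string.

    Illustrative only: a real conversion task may involve nested
    structures, attributes, and format-specific conventions.
    """
    data = json.loads(json_text)
    root = ET.Element(root_tag)
    for key, value in data.items():
        # Each top-level key becomes a child element with its value as text.
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

src = '{"name": "StructEval", "formats": 18, "tasks": 44}'
print(json_to_xml(src))
# → <root><name>StructEval</name><formats>18</formats><tasks>44</tasks></root>
```

A benchmark entry of this kind would supply the source document and target format, then score the model's output on structural validity and content fidelity.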
This website is adapted from the Nerfies and MathVista websites.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.