
Conversation

@endolith

Add support for structured outputs (JSON schema, regex, choice, grammar,
and structural_tag constraints) using vLLM 0.11.0+'s
StructuredOutputsParams API.

Changes:

  • Import StructuredOutputsParams from vllm.sampling_params
  • Extract structured_outputs from extra_body in sampling_params
  • Create StructuredOutputsParams instances for json, regex, choice,
    grammar, and structural_tag constraint types
  • Reject deprecated guided_json parameter (from vLLM <0.11.0) with
    clear error message pointing to migration guide

This enables users to enforce JSON schemas and other structured output
formats on model responses, as documented in vLLM 0.11.0.

https://docs.vllm.ai/en/v0.11.0/features/structured_outputs.html

Fixes #235
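The extraction step described in the bullet list can be sketched as follows. This is an illustrative sketch, not the worker's actual code: `StructuredOutputsParams` here is a stand-in dataclass mirroring the real class in `vllm.sampling_params` (vLLM 0.11.0+), and `extract_structured_outputs` is a hypothetical helper name.

```python
# Sketch of the extraction logic (assumptions: StructuredOutputsParams is a
# stand-in for vllm.sampling_params.StructuredOutputsParams, and
# extract_structured_outputs is an illustrative helper, not the worker's code).
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StructuredOutputsParams:
    json: Optional[Any] = None
    regex: Optional[str] = None
    choice: Optional[list] = None
    grammar: Optional[str] = None
    structural_tag: Optional[str] = None

def extract_structured_outputs(sampling_params: dict) -> Optional[StructuredOutputsParams]:
    """Pull structured_outputs out of extra_body; reject the pre-0.11.0 guided_json."""
    if "guided_json" in sampling_params:
        raise ValueError(
            "guided_json was removed in vLLM 0.11.0; "
            "use extra_body.structured_outputs instead (see the migration guide)"
        )
    extra_body = sampling_params.pop("extra_body", {})
    config = extra_body.get("structured_outputs")
    return StructuredOutputsParams(**config) if config else None

params = extract_structured_outputs(
    {"max_tokens": 200, "extra_body": {"structured_outputs": {"choice": ["yes", "no"]}}}
)
```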

@endolith
Author

endolith commented Nov 15, 2025

This was written by AI but seems to work, with the possible exception of the structural_tag. I need to check more carefully. It also doesn't enforce field order in json, which might be a vLLM issue.

@endolith
Author

endolith commented Nov 17, 2025

> with the possible exception of the structural_tag.

I think this was just the AI misunderstanding the outputs it was verifying: structural_tag only constrains text between the start and end tags, and allows arbitrary text outside of them.

I tested it with a bunch of contradictory prompts to prove that the output was actually being constrained to the schema.

There are some discrepancies, like:

  • In the "JSON schema with optional fields" example, the LLM doesn't seem to ever use the optional fields, even when specifically told to.
  • Also, the "JSON object" mode seems to allow extraneous text outside the JSON object, which I'm not sure is intended.

But those are probably all vLLM or Outlines/LMFE issues, not worker-vllm issues?
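For reference, the constraint checks I ran were in this spirit (an illustrative sketch, not the actual test script; the required fields match the example schema used in the request further down):

```python
import json

# Illustrative check (not the actual test script): parse the model output and
# verify the required fields and their types from the example schema.
REQUIRED = {"description": str, "year_built": int}

def matches_schema(raw: str) -> bool:
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and all(
        isinstance(obj.get(k), t) for k, t in REQUIRED.items()
    )

ok = matches_schema('{"description": "iron lattice tower", "year_built": 1889}')
```

Note that any extraneous text outside the JSON object makes `json.loads` fail, so a check like this also catches the "JSON object mode" discrepancy above.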

There may also be an issue with newlines not being escaped correctly in json output? I'm not sure if that's the fault of my script, or worker-vllm, or vLLM itself, or Outlines/LMFE.

If I go on the Runpod Requests tab and request

```json
{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "IMPORTANT: When outputting JSON, use actual newlines within string values, not escaped \\n characters. For example, use:\n{\n  \"text\": \"This is line one\nThis is line two\nThis is line three\"\n}\nNever use \\n escapes in your JSON string values."
      },
      {
        "role": "user",
        "content": "Output a JSON object with a multi-line text field describing the Eiffel Tower"
      }
    ],
    "sampling_params": {
      "max_tokens": 200,
      "temperature": 0.1,
      "extra_body": {
        "structured_outputs": {
          "json": {
            "type": "object",
            "properties": {
              "description": {
                "type": "string",
                "description": "Multi-line description"
              },
              "year_built": {
                "type": "integer"
              }
            },
            "required": [
              "description",
              "year_built"
            ]
          }
        }
      }
    }
  }
}
```

it ignores the prompt and still escapes newlines:

```json
{
  "delayTime": 61,
  "executionTime": 4479,
  "id": "7d5bacca-3d9f-4bc8-84bf-f1d44d078bd8-u2",
  "output": [
    {
      "choices": [
        {
          "tokens": [
            "{\n  \"description\": \"The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower.\\n\\nStanding at 330 meters tall, it was the tallest man-made structure in the world until the completion of the Chrysler Building in New York in 1930. The Eiffel Tower is now a global cultural icon of France and one of the most recognizable structures in the world.\\n\\nThe tower has three levels for visitors, with restaurants on the first and second levels. The top level's upper platform is 276 meters above the ground, accessible by staircase or elevator. The tower's design includes a series of open-lattice platforms, which allow wind to pass through and reduce the swaying of the structure.\\n\\"
          ]
        }
      ],
      "usage": {
        "input": 90,
        "output": 200
      }
    }
  ],
  "status": "COMPLETED",
  "workerId": "1ut1hgflhf242q"
}
```

so I'm not sure why I'm seeing unescaped newlines occasionally. I made tests for them and they all passed.
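One likely explanation for the escaping, sketched below with the stdlib: a raw newline is not a legal character inside a JSON string literal (RFC 8259), so a decoder strictly constrained to the JSON grammar has to emit `\n` escapes, and the system prompt above is effectively asking for invalid JSON.

```python
import json

text = "This is line one\nThis is line two"
encoded = json.dumps({"text": text})  # the serializer must escape the newline

# A raw (unescaped) newline inside a JSON string is invalid per RFC 8259:
try:
    json.loads('{"text": "line one\nline two"}')
    raw_newline_parses = True
except json.JSONDecodeError:
    raw_newline_parses = False

decoded = json.loads(encoded)["text"]  # round-trip restores the real newline
```

So the `\\n` sequences in the `tokens` field are the spec-compliant behavior; genuinely unescaped newlines in constrained JSON output would be the anomaly.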

@endolith endolith marked this pull request as draft November 19, 2025 03:14
@endolith
Author

endolith commented Nov 19, 2025

Actually, it seems Structured Outputs are already supported and working (somewhat...) when using the OpenAI API, and I confused that with the RunPod Job API.

I thought this was fixing something broken, but it's actually introducing something new, and tacking on OpenAI API concepts to a different API.

AI says:

What I proposed:
I added support for structured outputs to the RunPod Job API by accepting extra_body in sampling_params and converting it to StructuredOutputsParams. This is new functionality, not a bug fix.

Alternative approach (Option 2):
If structured outputs should be added to the RunPod Job API, a more appropriate format would match vLLM's native offline inference format:

```json
"sampling_params": {
    "max_tokens": 500,
    "structured_outputs": {
        "json": {...}
    }
}
```

This would convert `{"json": {...}}` → `StructuredOutputsParams(json={...})` directly, without borrowing the OpenAI-specific extra_body concept.

[Actually vLLM's implementation seems to be broken and doesn't always constrain outputs perfectly, but that's another issue.]

Previously, structured outputs support was incorrectly implemented by
extracting OpenAI-specific `extra_body.structured_outputs` from the
RunPod API's `sampling_params` dictionary. This mixed two different
API patterns:

1. The OpenAI API uses `extra_body` as a catch-all for vLLM-specific
   parameters that aren't part of the OpenAI spec
2. The RunPod API uses `sampling_params` as a direct pass-through to
   vLLM's `SamplingParams`, where all parameters should be at the same
   level

The RunPod API should be a direct 1:1 mapping to vLLM's offline inference
API, not an abstraction layer.

Changes:
- Remove OpenAI `extra_body` extraction from RunPod API path
  - The OpenAI route already handles structured outputs correctly through
    vLLM's OpenAI serving code
  - RunPod API should only use `sampling_params` like all other parameters
- Simplify to direct extraction from `sampling_params.structured_outputs`
  - Convert dict to `StructuredOutputsParams(**config)` directly
  - Set on `SamplingParams` exactly as vLLM expects
- Update README to emphasize direct mapping to vLLM's API

The new implementation allows users to pass structured outputs exactly as
they would in vLLM's Python API:

```json
{
  "sampling_params": {
    "max_tokens": 128,
    "structured_outputs": {
      "json": {"type": "object", "properties": {...}},
      "regex": "[A-Z]+",
      "choice": ["Positive", "Negative"]
    }
  }
}
```

This directly maps to `SamplingParams(structured_outputs=StructuredOutputsParams(...))`
and maintains consistency with vLLM's documented offline inference patterns.
Users familiar with vLLM can use our API without learning new concepts.

When vLLM adds new structured output features, they automatically work
without code changes since we're just passing through the same parameters.
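A minimal sketch of that pass-through, under stated assumptions: `SamplingParams` is not imported here, `StructuredOutputsParams` is a stand-in dataclass for the real `vllm.sampling_params.StructuredOutputsParams`, and `build_sampling_kwargs` is an illustrative name rather than the worker's actual function.

```python
# Sketch of the direct pass-through (stand-in class; the real one is
# vllm.sampling_params.StructuredOutputsParams, and build_sampling_kwargs
# is an illustrative helper, not the worker's actual code).
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class StructuredOutputsParams:
    json: Optional[Any] = None
    regex: Optional[str] = None
    choice: Optional[list] = None
    grammar: Optional[str] = None
    structural_tag: Optional[str] = None

def build_sampling_kwargs(sampling_params: dict) -> dict:
    """Turn the request's sampling_params dict into SamplingParams kwargs,
    converting the nested structured_outputs dict into StructuredOutputsParams.
    Every other key passes through untouched."""
    kwargs = dict(sampling_params)
    config = kwargs.pop("structured_outputs", None)
    if config is not None:
        kwargs["structured_outputs"] = StructuredOutputsParams(**config)
    return kwargs

kwargs = build_sampling_kwargs(
    {"max_tokens": 128, "structured_outputs": {"regex": "[A-Z]+"}}
)
```

Because unknown keys pass through unchanged, new vLLM structured-output fields would only require the stand-in's field list to keep up with the real class.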
@endolith endolith changed the title add structured outputs support Add structured outputs support to Runpod API Nov 25, 2025
@endolith endolith marked this pull request as ready for review November 25, 2025 15:05
@endolith
Author

endolith commented Nov 25, 2025

Changed it to map the offline inference interface directly instead of mixing the OpenAI API into the Runpod API

(and rebased onto #244)



Development

Successfully merging this pull request may close these issues.

Guided decoding backends not working - both old and new APIs fail
