
Parameter passthrough for community sampler settings #7613

@blightbow

Description

Community-authored samplers like MinP and XTC don't have officially documented request parameters in the Chat Completion spec. Other API implementations for local inference allow the configuration settings associated with these samplers to be passed through as unofficial parameters.

Is your feature request related to a problem? Please describe.
Testing samplers is a finicky business, and they generally need to be tuned to the task. It makes more sense to let these settings be tuned by the client rather than hardcoded in the configuration file.

Describe the solution you'd like
I recommend we align to the parameter names used by Text Generation WebUI, as this maximizes interoperability with software that is aware of these sampler settings.

https://github.com/oobabooga/text-generation-webui/blob/main/extensions/openai/typing.py#L8

We can use this as our upstream reference and periodically update the frontend to pass these parameters through to our backend wrappers. Changes should be infrequent, and backend wrappers don't have to do anything special to benefit from this. They just hook the parameters they need if the inference backend actually supports them.
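A minimal sketch of what this could look like, assuming hypothetical function and class names (nothing here is LocalAI's actual API); the parameter names `min_p`, `xtc_threshold`, and `xtc_probability` follow text-generation-webui's OpenAI extension:

```python
# Sketch: the frontend forwards TGWUI-style sampler parameters, and a
# backend wrapper hooks only the ones its inference backend supports.
# All function/class names are hypothetical illustrations.

# Subset of TGWUI-aligned parameters the frontend passes through.
PASSTHROUGH_PARAMS = {"min_p", "xtc_threshold", "xtc_probability"}

def extract_passthrough(request: dict) -> dict:
    """Collect recognized community sampler settings from the request body."""
    return {k: v for k, v in request.items() if k in PASSTHROUGH_PARAMS}

class ExampleBackend:
    """Hypothetical backend wrapper whose engine supports MinP but not XTC."""
    SUPPORTED = {"min_p"}

    def apply(self, params: dict) -> dict:
        # Hook only the parameters the inference backend actually supports;
        # silently ignore the rest.
        return {k: v for k, v in params.items() if k in self.SUPPORTED}

request = {
    "model": "example",
    "messages": [{"role": "user", "content": "hi"}],
    "min_p": 0.05,          # MinP sampler cutoff
    "xtc_threshold": 0.1,   # XTC sampler settings
    "xtc_probability": 0.5,
}
forwarded = extract_passthrough(request)
applied = ExampleBackend().apply(forwarded)
```

The backend does nothing special to opt in: unsupported parameters arrive but are simply never read.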

If the upstream reference dies or otherwise becomes outdated in this regard, we'll still be better positioned to track this ourselves in the future.

Describe alternatives you've considered

  • Kitchen sink approach: LocalAI backends get access to all request parameters seen by the API call. Backends can grab any request parameter that they are interested in without the frontend acting as a filter. Frontend focuses on officially supported parameters, backend wrappers extract any custom parameters that are supported by the integration's samplers. I would still recommend that all backends align to the parameter names used by TGWUI to maximize interoperability.

  • A more conservative alternative would be to let the configuration file whitelist custom parameters to be shared with the backend (avoiding unintended disclosures). While this does follow the principle of least privilege, I'm skeptical of the benefit; if we don't trust our own backends, we're doing something wrong. This alternative also aligns the least with how other API servers are configured, pushing the cognitive burden down to the user and breaching the principle of least astonishment.
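The kitchen-sink approach could be sketched roughly as follows (hypothetical names throughout): the backend sees the full request body and pulls out whatever its samplers understand, with no frontend filter.

```python
# Sketch of the kitchen-sink approach (hypothetical names): the backend
# receives the entire request body and extracts any parameters its
# samplers are interested in, applying defaults for omitted keys.
# Parameter names stay TGWUI-aligned for interoperability.

def backend_options(request: dict) -> dict:
    wanted = {
        "min_p": 0.0,           # defaults used when the client omits a key
        "xtc_threshold": 0.1,
        "xtc_probability": 0.0,
    }
    return {k: request.get(k, default) for k, default in wanted.items()}

# Unrecognized keys are simply ignored by this backend.
opts = backend_options({"min_p": 0.02, "unrelated": True})
```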
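For comparison, the whitelist alternative might look like this sketch (the config shape and function name are hypothetical): the model's configuration enumerates which custom parameters may reach the backend, and the frontend drops everything else.

```python
# Sketch of the whitelist alternative (hypothetical config shape): the
# configuration file lists which custom parameters may be shared with
# the backend; the frontend filters the request accordingly.

config = {"passthrough_whitelist": ["min_p"]}  # e.g. loaded from YAML

def filter_request(request: dict, config: dict) -> dict:
    allowed = set(config.get("passthrough_whitelist", []))
    return {k: v for k, v in request.items() if k in allowed}

# xtc_threshold is dropped because it is not whitelisted.
shared = filter_request({"min_p": 0.05, "xtc_threshold": 0.1}, config)
```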

Labels: enhancement (New feature or request)
