Prevent max_tokens from also being written when making a call with OpenAI o-models #48218
base: main
Conversation
Thank you for your contribution @dariusFTOS! We will review the pull request and get back to you soon.
API change check: API changes are not detected in this pull request.
Thank you, @dariusFTOS! This uncovers a nasty out-of-the-box bug that we'll fix ASAP -- but in the interim I can share a workaround and background.

First, the background: this has been a sticky problem since Azure OpenAI originally shipped with a model-specific distinction of which "max tokens" property to use: older models accept only the `max_tokens` request property, while newer o-series models require `max_completion_tokens` instead. The good news is that this is planned for full parity alignment such that `max_completion_tokens` will eventually be honored across models. That points out the trouble here: if we just never use "the other" name (newer or older, depending on model), it'll break support for its complement. As part of the 2.1.0 release for `Azure.AI.OpenAI`, the `SetNewMaxCompletionTokensPropertyEnabled()` extension method was added as the intended "opt in" to serializing the newer property name.

Now, for the interim workaround (applicable to 2.1.0 stable and 2.2.0-beta.1), what I'd recommend to get this working today, if reflection is permissible, is to initialize the internal property -- this will allow the intended method for "opt in" to immediately work as intended:

```csharp
ChatCompletionOptions options = new()
{
    // This may need to be max_tokens or max_completion_tokens,
    // depending on the model deployment being used
    MaxOutputTokenCount = 16,
};

// Bug workaround pending release of a fix: initialize an internal property to allow
// the intended extension method to work with no prior method use of 'options'
options
    .GetType()
    .GetProperty("SerializedAdditionalRawData", BindingFlags.NonPublic | BindingFlags.Instance)!
    .SetValue(options, new Dictionary<string, BinaryData>());

// The intended method call that opts into use of the newer max_completion_tokens
if (deploymentIsOSeries)
{
    options.SetNewMaxCompletionTokensPropertyEnabled();
}
```

If reflection isn't an option, trying/catching a "fake warm-up call" to a nonexistent deployment will also initialize the property:

```csharp
// Alternate workaround: make a preliminary mock method call using 'options' to initialize the property
if (deploymentIsOSeries)
{
    try
    {
        ChatClient workaroundWarmupClient = aoaiClient.GetChatClient(deploymentName: "not_a_real_deployment");
        await workaroundWarmupClient.CompleteChatAsync(["Working around azure-sdk-for-net#46545"], options);
    }
    catch
    { }
    options.SetNewMaxCompletionTokensPropertyEnabled();
}
```

To summarize the state and next steps: the out-of-the-box opt-in path is broken in 2.1.0 stable and 2.2.0-beta.1, either of the workarounds above will unblock o-series calls today, and a proper fix will ship ASAP.
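For context on the wire-level difference described above, the two request shapes look roughly like this (a sketch of the REST payloads; the model names and message content are illustrative, while the two property names come from the OpenAI API):

```jsonc
// Older chat models: the token cap is sent as "max_tokens"
{
  "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "Hello" }],
  "max_tokens": 16
}

// o-series models reject "max_tokens" and expect "max_completion_tokens" instead
{
  "model": "o1-mini",
  "messages": [{ "role": "user", "content": "Hello" }],
  "max_completion_tokens": 16
}
```

This is why the client can't simply always emit one name or the other: whichever single name it picks breaks the complementary set of models.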
Many apologies for the hassle here.
Prevent max_tokens from being written and leave only max_completion_tokens when making a call with OpenAI o-models
Fix for #46545
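As a minimal sketch of the serialization behavior this change targets (hypothetical illustration only, not the PR's actual diff; the `WriteMaxTokenCount` helper and its parameters are invented names): exactly one of the two property names should be written to the request body, never both.

```csharp
using System.Text.Json;

// Hypothetical illustration: write exactly one "max tokens" property,
// choosing the wire name based on whether the newer serialization was opted into.
static void WriteMaxTokenCount(Utf8JsonWriter writer, int count, bool useNewPropertyName)
{
    writer.WriteNumber(useNewPropertyName ? "max_completion_tokens" : "max_tokens", count);
}
```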