
Prevent max_tokens from also being written when making a call with OpenAI o-models #48218

Open

wants to merge 1 commit into base: main

Conversation

@dariusFTOS dariusFTOS commented Feb 11, 2025

Prevent max_tokens from being written and leave only max_completion_tokens when making a call with OpenAI o-models

Fix for #46545

Prevent max_tokens from also being written when making a call with OpenAI o-models
@github-actions github-actions bot added the Community Contribution, customer-reported, and OpenAI labels on Feb 11, 2025

Thank you for your contribution @dariusFTOS! We will review the pull request and get back to you soon.

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@trrwilson
Member

Thank you, @dariusFTOS! This uncovers a nasty out-of-the-box bug that we'll fix ASAP -- but in the interim I can share a workaround and background:

First, the background:

This has been a sticky problem since Azure OpenAI originally shipped with a model-specific distinction of which "max tokens" property to use: max_tokens was the only property that pre-o1 models allowed (max_completion_tokens would be rejected) and, conversely, max_completion_tokens was the only one accepted by o1 and later models (max_tokens would be rejected).

The good news is that full parity alignment is planned, such that max_completion_tokens can fully replace max_tokens, following the OpenAI API in deprecating the older property. The bad news is that the convergence is still in progress: while some older models like gpt-4o (2024-08-06) now accept max_completion_tokens on AOAI, others like gpt-35-turbo (1106/0125) still reject it and require the older max_tokens. So while the OpenAI library can (and does) always use the current max_completion_tokens property for all models, Azure.AI.OpenAI needs to allow for deployment-specific customization.

That points out the trouble here: if we just never use "the other" name (newer or older, depending on model), it'll break support for its complement.
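To make the mutual exclusivity concrete, the two request shapes look roughly like this (abbreviated chat completions payloads for illustration; model names and values are placeholders):

```jsonc
// Older deployments (e.g. gpt-35-turbo 1106/0125) accept only the legacy property:
{ "messages": [ { "role": "user", "content": "Hello" } ], "max_tokens": 16 }

// o1 and later deployments accept only the newer property:
{ "messages": [ { "role": "user", "content": "Hello" } ], "max_completion_tokens": 16 }
```

Sending the rejected property to either kind of deployment fails the request, which is why a single hard-coded choice in the client library can't work for both.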

As part of the 2.1.0 release for 2024-10-21 support, we included a SetNewMaxCompletionTokensPropertyEnabled() extension method that lets users of o1 and later models "opt in" to the newer property. Unfortunately, while investigating this issue, I found our test coverage missed a case: calling this extension method before the options have been used in a method call results in an exception -- this is the bug we'll fix ASAP.

Now, for the interim workaround (applicable to 2.1.0 stable and 2.2.0-beta.1), what I'd recommend to get this working today, if reflection is permissible, is to initialize the internal property -- this will allow the intended method for "opt in" to immediately work as intended:

using System.Reflection;
using System.Collections.Generic;

ChatCompletionOptions options = new()
{
    // This may need to be max_tokens or max_completion_tokens,
    // depending on the model deployment being used
    MaxOutputTokenCount = 16,
};

// Bug workaround pending release of a fix: initialize an internal property to allow
// the intended extension method to work with no prior method use of 'options'
options
    .GetType()
    .GetProperty("SerializedAdditionalRawData", BindingFlags.NonPublic | BindingFlags.Instance)!
    .SetValue(options, new Dictionary<string, BinaryData>());

// The intended method call that opts into use of the newer max_completion_tokens
if (deploymentIsOSeries)
{
    options.SetNewMaxCompletionTokensPropertyEnabled();
}

If reflection isn't an option, wrapping a "fake warm-up call" to CompleteChat*() in a try/catch will also initialize this property, and it doesn't need to be a call to a valid endpoint/deployment:

// Alternate workaround: make a preliminary mock method call using 'options' to initialize the property
if (deploymentIsOSeries)
{
    try
    {
        ChatClient workaroundWarmupClient = aoaiClient.GetChatClient(deploymentName: "not_a_real_deployment");
        await workaroundWarmupClient.CompleteChatAsync(["Working around azure-sdk-for-net#46545"], options);
    }
    catch
    { }
    options.SetNewMaxCompletionTokensPropertyEnabled();
}

To summarize the state and next steps:

  • Today: The SetNewMaxCompletionTokensPropertyEnabled() extension method is needed to "opt into" the use of max_completion_tokens for AOAI o1/o3 models; further, one of the above workarounds (or equivalent) is needed to allow the extension method to function as intended
  • Soon: Once the extension method bug is fixed, the extra code can go away and only the call to SetNewMaxCompletionTokensPropertyEnabled() will remain for opting into the new property name
  • Later: When AOAI supports max_completion_tokens everywhere, the extension method call will no longer be necessary at all and setting MaxOutputTokenCount will "just work" everywhere, regardless of deployment (serialized as the newer max_completion_tokens)
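Once the extension method bug fix ships, the opt-in path should reduce to something like this (a sketch of the intended post-fix usage, not yet-released behavior; deploymentIsOSeries is a placeholder flag, as in the examples above):

```csharp
// No reflection or warm-up call needed once the fix is released.
ChatCompletionOptions options = new()
{
    MaxOutputTokenCount = 16,
};

if (deploymentIsOSeries)
{
    // Opts into serializing max_completion_tokens instead of max_tokens
    options.SetNewMaxCompletionTokensPropertyEnabled();
}
```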

Many apologies for the hassle here.
