
Prevent max_tokens from also being written when making a call with OpenAI o-models #48218

Open

wants to merge 1 commit into base: main

Conversation

@dariusFTOS dariusFTOS commented Feb 11, 2025

Prevent max_tokens from being written and leave only max_completion_tokens when making a call with OpenAI o-models

Fix for #46545

Prevent max_tokens from also being written when making a call with OpenAI o-models
@github-actions github-actions bot added the Community Contribution, customer-reported, and OpenAI labels on Feb 11, 2025

Thank you for your contribution @dariusFTOS! We will review the pull request and get back to you soon.

@azure-sdk
Collaborator

API change check

API changes are not detected in this pull request.

@trrwilson
Member

Thank you, @dariusFTOS! This uncovers a nasty out-of-the-box bug that we'll fix ASAP -- but in the interim I can share a workaround and background:

First, the background:

This has been a sticky problem since Azure OpenAI originally shipped with a model-specific distinction of which "max tokens" property to use: max_tokens was the only property that pre-o1 models allowed (max_completion_tokens would be rejected) and, conversely, max_completion_tokens was the only one accepted by o1 and later models (max_tokens would be rejected).

The good news is that full parity alignment is planned, such that max_completion_tokens can fully replace max_tokens, following the OpenAI API in deprecating the older property. The bad news is that the convergence is still in progress: while some older models like gpt-4o (2024-08-06) now accept max_completion_tokens on AOAI, others like gpt-35-turbo (1106/0125) still reject it and require the older max_tokens. So while the OpenAI library can (and does) always use the current max_completion_tokens property for all models, Azure.AI.OpenAI needs to allow for deployment-specific customization.

That points out the trouble here: if we just never use "the other" name (newer or older, depending on model), it'll break support for its complement.
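To make the mutual exclusivity concrete, the two request shapes look roughly like this (abbreviated chat completions payloads for illustration; model names and values are placeholders):

```jsonc
// Older deployments (e.g. gpt-35-turbo 1106/0125) accept only the legacy property:
{ "messages": [ { "role": "user", "content": "Hello" } ], "max_tokens": 16 }

// o1 and later deployments accept only the newer property:
{ "messages": [ { "role": "user", "content": "Hello" } ], "max_completion_tokens": 16 }
```

Sending the rejected property to either kind of deployment fails the request, which is why a single hard-coded choice in the client library can't work for both.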

As part of the 2.1.0 release for 2024-10-21 support, we included a SetNewMaxCompletionTokensPropertyEnabled() extension method that lets users of o1 and later models "opt in" to the newer property. Unfortunately, while investigating this issue, I found our test coverage missed a case: calling this extension method before the options have been used in a method call results in an exception -- this is the bug we'll fix ASAP.

Now, for the interim workaround (applicable to 2.1.0 stable and 2.2.0-beta.1), what I'd recommend to get this working today, if reflection is permissible, is to initialize the internal property -- this will allow the intended method for "opt in" to immediately work as intended:

using System.Reflection;
using System.Collections.Generic;

ChatCompletionOptions options = new()
{
    // This may need to be max_tokens or max_completion_tokens,
    // depending on the model deployment being used
    MaxOutputTokenCount = 16,
};

// Bug workaround pending release of a fix: initialize an internal property to allow
// the intended extension method to work with no prior method use of 'options'
options
    .GetType()
    .GetProperty("SerializedAdditionalRawData", BindingFlags.NonPublic | BindingFlags.Instance)!
    .SetValue(options, new Dictionary<string, BinaryData>());

// The intended method call that opts into use of the newer max_completion_tokens
if (deploymentIsOSeries)
{
    options.SetNewMaxCompletionTokensPropertyEnabled();
}

If reflection isn't an option, wrapping a "fake warm-up call" to CompleteChat*() in a try/catch will also initialize this property, and it doesn't need to be a call to a valid endpoint/deployment:

// Alternate workaround: make a preliminary mock method call using 'options' to initialize the property
if (deploymentIsOSeries)
{
    try
    {
        ChatClient workaroundWarmupClient = aoaiClient.GetChatClient(deploymentName: "not_a_real_deployment");
        await workaroundWarmupClient.CompleteChatAsync(["Working around azure-sdk-for-net#46545"], options);
    }
    catch
    { }
    options.SetNewMaxCompletionTokensPropertyEnabled();
}

To summarize the state and next steps:

  • Today: The SetNewMaxCompletionTokensPropertyEnabled() extension method is needed to "opt into" the use of max_completion_tokens for AOAI o1/o3 models; further, one of the above workarounds (or equivalent) is needed to allow the extension method to function as intended
  • Soon: Once the extension method bug is fixed, the extra code can go away and only the call to SetNewMaxCompletionTokensPropertyEnabled() will remain for opting into the new property name
  • Later: When AOAI supports max_completion_tokens everywhere, the extension method call will no longer be necessary at all and setting MaxOutputTokenCount will "just work" everywhere, regardless of deployment (serialized as the newer max_completion_tokens)
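Once the extension method bug fix ships, the opt-in path should reduce to something like this (a sketch of the intended post-fix usage, not yet-released behavior; deploymentIsOSeries is a placeholder flag, as in the examples above):

```csharp
// No reflection or warm-up call needed once the fix is released.
ChatCompletionOptions options = new()
{
    MaxOutputTokenCount = 16,
};

if (deploymentIsOSeries)
{
    // Opts into serializing max_completion_tokens instead of max_tokens
    options.SetNewMaxCompletionTokensPropertyEnabled();
}
```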

Many apologies for the hassle here.
