Structure completion request to maximize Prompt Caching #42805

brandonh-msft · 2024-11-05T18:34:31Z

Currently, a request to an OpenAI service is encoded by simple JSON serialization of the options model into BinaryData, which is then sent through the pipeline.

This does not make the most of Prompt Caching, which works best when the completion request contains tools, then history, then new content, in that order.
Additionally, the tools and history must appear in the same order on every request (suggest alphabetical order by tool name).
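A minimal Java sketch of the suggested alphabetical tool ordering follows. ToolDef is a hypothetical stand-in for the SDK's actual tool definition types. Sorting a copy by name makes the serialized tool order independent of the order in which tools were registered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a tool definition; the SDK's real types differ.
record ToolDef(String name, String description) {}

class ToolOrdering {
    // Returns a new list sorted alphabetically by tool name, so the same set
    // of tools always comes out in the same order regardless of input order.
    static List<ToolDef> sortedByName(List<ToolDef> tools) {
        List<ToolDef> copy = new ArrayList<>(tools);
        copy.sort(Comparator.comparing(ToolDef::name));
        return copy;
    }

    public static void main(String[] args) {
        List<ToolDef> registered = List.of(
                new ToolDef("get_weather", "Look up the weather"),
                new ToolDef("book_flight", "Book a flight"),
                new ToolDef("cancel_order", "Cancel an order"));
        // Prints book_flight, cancel_order, get_weather (one per line).
        for (ToolDef t : sortedByName(registered)) {
            System.out.println(t.name());
        }
    }
}
```

String's natural ordering is case-sensitive (uppercase sorts before lowercase); that is fine for caching purposes as long as the same comparator is used on every request.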

Sources:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
https://openai.com/index/api-prompt-caching/
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching#what-is-cached

The client obtains the request body as BinaryData directly from the options object:

azure-sdk-for-java/sdk/openai/azure-ai-openai/src/main/java/com/azure/ai/openai/OpenAIClient.java

Line 726 in cc459ee

    return getChatCompletionsWithResponse(deploymentOrModelName, BinaryData.fromObject(chatCompletionsOptions),

BinaryData.fromObject simply uses a default JSON serializer to turn the ChatCompletionsOptions into BinaryData:

azure-sdk-for-java/sdk/core/azure-core/src/main/java/com/azure/core/util/BinaryData.java

Lines 614 to 615 in cc459ee

    public static BinaryData fromObject(Object data) {
        return fromObject(data, SERIALIZER);

azure-sdk-for-java/sdk/core/azure-core/src/main/java/com/azure/core/util/BinaryData.java

Line 181 in cc459ee

    static final JsonSerializer SERIALIZER = JsonSerializerProviders.createInstance(true);
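One way to control property order, instead of depending on whatever order a general-purpose serializer picks, is to write the request body explicitly. The sketch below is hypothetical and assumes, as this issue does, that serialized property order matters for caching. Tool and Message are simplified stand-ins for the SDK's types (tool parameters and all other options are omitted), and the JSON escaping covers only what this example needs. The resulting string could then presumably be wrapped with BinaryData.fromString in place of BinaryData.fromObject.

```java
import java.util.List;

// Hypothetical, simplified request types; the real ChatCompletionsOptions
// carries many more properties.
record Tool(String name, String description) {}
record Message(String role, String content) {}

class CacheFriendlyJson {
    // Minimal JSON string escaping for this sketch: quotes, backslashes,
    // and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Always writes "tools" first, then "messages" (history followed by the
    // newest message), so the stable part of the request leads the body.
    static String serialize(List<Tool> tools, List<Message> messages) {
        StringBuilder sb = new StringBuilder("{\"tools\":[");
        for (int i = 0; i < tools.size(); i++) {
            if (i > 0) sb.append(',');
            Tool t = tools.get(i);
            sb.append("{\"type\":\"function\",\"function\":{\"name\":")
              .append(quote(t.name()))
              .append(",\"description\":")
              .append(quote(t.description()))
              .append("}}");
        }
        sb.append("],\"messages\":[");
        for (int i = 0; i < messages.size(); i++) {
            if (i > 0) sb.append(',');
            Message m = messages.get(i);
            sb.append("{\"role\":").append(quote(m.role()))
              .append(",\"content\":").append(quote(m.content()))
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize(
                List.of(new Tool("get_weather", "Look up weather")),
                List.of(new Message("user", "Hi"))));
    }
}
```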

Additional context

microsoft/semantic-kernel#9444
openai/openai-dotnet#281


mssfang · 2024-11-06T21:39:21Z

Hi, @brandonh-msft
Currently, the Java SDK is working on support for service API version 2024-10-01-preview. We will keep you posted when it is released.

Are you suggesting that ChatCompletionsOptions should always place tools ahead of messages and other properties?

brandonh-msft · 2024-11-07T15:46:33Z

Well, I'm not; the feature does 😉

tools
conversation history
new content

should be the structure in order to maximize prompt caching, per the docs for the feature from AOAI and OAI.
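To illustrate why this order matters, here is an abstract Java sketch of prefix matching across two consecutive turns. Assumptions: each prompt is modeled as a list of placeholder segments (not real tokens), and a segment can only be reused from cache if every segment before it also matches. The service's actual caching operates on its own internal prompt representation, as described in the linked docs; layout and sharedPrefix are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixDemo {
    // Builds the ordered segments of one request. toolsFirst = true follows
    // the layout above: tools, then conversation history, then new content.
    static List<String> layout(List<String> tools, List<String> history,
                               String newContent, boolean toolsFirst) {
        List<String> segments = new ArrayList<>();
        if (toolsFirst) segments.addAll(tools);
        segments.addAll(history);
        segments.add(newContent);
        if (!toolsFirst) segments.addAll(tools);
        return segments;
    }

    // Counts the leading segments two requests have in common; under the
    // prefix-matching assumption, only this shared prefix is cacheable.
    static int sharedPrefix(List<String> a, List<String> b) {
        int n = 0;
        while (n < a.size() && n < b.size() && a.get(n).equals(b.get(n))) n++;
        return n;
    }

    public static void main(String[] args) {
        List<String> tools = List.of("tool:book_flight", "tool:get_weather");
        for (boolean toolsFirst : new boolean[] {true, false}) {
            // Turn 2 repeats turn 1 as history and adds a new user message.
            List<String> turn1 = layout(tools, List.of("system"), "user:1", toolsFirst);
            List<String> turn2 = layout(tools,
                    List.of("system", "user:1", "assistant:1"), "user:2", toolsFirst);
            System.out.println("toolsFirst=" + toolsFirst
                    + " shared=" + sharedPrefix(turn1, turn2));
        }
    }
}
```

Run as is, it reports a shared prefix of 4 segments when tools come first and 2 when they come last. If the tool order changes between turns, the tools-first case drops to 0, which is why a stable tool order matters as much as the placement.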

github-actions bot added the Client, needs-team-triage, and OpenAI labels Nov 5, 2024

mssfang self-assigned this Nov 5, 2024

mssfang removed the needs-team-triage label Nov 6, 2024
