Structure completion request to maximize Prompt Caching #42805

Open
brandonh-msft opened this issue Nov 5, 2024 · 2 comments
Labels
Client This issue points to a problem in the data-plane of the library. OpenAI

Comments

@brandonh-msft
Member

brandonh-msft commented Nov 5, 2024

Today, a request to an OpenAI service is built by JSON-serializing the options model into BinaryData with the default serializer and sending it through the pipeline.

This does not maximize prompt caching. Caching matches on an identical prompt prefix, so the completion request should contain tools, then history, then new content, in that order.
Additionally, the tools and history must appear in the same order every time (suggestion: sort tools alphabetically by name).
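
To make the proposed ordering concrete, here is a minimal sketch using only the Java standard library. Everything in it is hypothetical: `CacheFriendlyRequest`, `Tool`, `Message`, and `toJson` are illustrative stand-ins, not SDK types, and the JSON shape is a simplified assumption rather than the full chat completions schema. It sorts tools by name and always emits tools first, then history, then the new content.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the Azure SDK.
final class CacheFriendlyRequest {
    /** Simplified stand-in for a tool definition. */
    static final class Tool {
        final String name;
        final String description;
        Tool(String name, String description) {
            this.name = name;
            this.description = description;
        }
    }

    /** Simplified stand-in for a chat message. */
    static final class Message {
        final String role;
        final String content;
        Message(String role, String content) {
            this.role = role;
            this.content = content;
        }
    }

    /** Wraps a string in quotes, escaping quotes, backslashes, and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Emits tools (sorted by name), then history in its given order, then the new content. */
    static String toJson(List<Tool> tools, List<Message> history, Message newContent) {
        // Copy before sorting so the caller's list is left untouched.
        List<Tool> sorted = new ArrayList<>(tools);
        sorted.sort(Comparator.comparing((Tool t) -> t.name));

        String toolsJson = sorted.stream()
            .map(t -> "{\"type\":\"function\",\"function\":{\"name\":" + quote(t.name)
                + ",\"description\":" + quote(t.description) + "}}")
            .collect(Collectors.joining(","));

        // History keeps its chronological order; the new message always goes last.
        List<Message> messages = new ArrayList<>(history);
        messages.add(newContent);
        String messagesJson = messages.stream()
            .map(m -> "{\"role\":" + quote(m.role) + ",\"content\":" + quote(m.content) + "}")
            .collect(Collectors.joining(","));

        return "{\"tools\":[" + toolsJson + "],\"messages\":[" + messagesJson + "]}";
    }
}
```

With this shape, two calls that register the same tools in a different order produce the same leading bytes, and only the tail of the payload changes from turn to turn.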

Sources:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
https://openai.com/index/api-prompt-caching/
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching#what-is-cached

The client asks for BinaryData from the options:

return getChatCompletionsWithResponse(deploymentOrModelName, BinaryData.fromObject(chatCompletionsOptions),

This simply uses the default serializer to turn the ChatCompletionsOptions into BinaryData:

public static BinaryData fromObject(Object data) {
    return fromObject(data, SERIALIZER);
}

static final JsonSerializer SERIALIZER = JsonSerializerProviders.createInstance(true);

Additional context

microsoft/semantic-kernel#9444
openai/openai-dotnet#281

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. needs-team-triage Workflow: This issue needs the team to triage. OpenAI labels Nov 5, 2024
@mssfang mssfang self-assigned this Nov 5, 2024
@mssfang
Member

mssfang commented Nov 6, 2024

Hi, @brandonh-msft
Currently, the Java SDK is working on service API version 2024-10-01-preview. I will keep you posted when it is released.

Are you suggesting that ChatCompletionsOptions should always place tools ahead of messages and other properties?

@mssfang mssfang removed the needs-team-triage Workflow: This issue needs the team to triage. label Nov 6, 2024
@brandonh-msft
Member Author

brandonh-msft commented Nov 7, 2024

well, I'm not, the feature does 😉

  • tools
  • conversation history
  • new content

should be the structure in order to maximize prompt caching, per the docs for the feature from AOAI and OAI.
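
As a rough illustration of why this structure helps, the sketch below compares how much of the previous request is repeated verbatim at the start of the next one. It is only a sketch under stated assumptions: `PrefixOverlap` and its methods are hypothetical names, and it counts characters as a crude proxy for the tokens the service would actually match on.

```java
// Hypothetical helper for illustration only; characters stand in for tokens.
final class PrefixOverlap {
    /** Returns how many leading characters the two strings share. */
    static int commonPrefixLength(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /** Fraction of the previous request that reappears as a prefix of the next one. */
    static double sharedFraction(String previous, String next) {
        if (previous.isEmpty()) {
            return 0.0; // nothing to reuse
        }
        return (double) commonPrefixLength(previous, next) / previous.length();
    }
}
```

When stable content (tools, then history) leads and the new message trails, the whole previous request is a prefix of the next one and the fraction is 1.0. If the new content is placed first, the two requests diverge almost immediately and the fraction drops toward 0.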
