Option to get logprobs #20

Open
blixt opened this issue Jul 11, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

blixt commented Jul 11, 2024

The current API is great for producing a text response, but if we could provide an option that gave us the logprobs for each streamed token, we'd be able to implement a lot more functionality on top of the model: basic guidance, confidence estimates, more efficient collection of multiple output branches, custom token heuristics instead of the built-in temperature/topK (I saw another proposal to add a seed option; this would let you build that yourself), and more.

Basically, it could be modeled on something like the top_logprobs parameter in the OpenAI API, which returns something like this for top_logprobs=2:

{
  "logprobs": {
    "content": [
      {
        "token": "Hello",
        "logprob": -0.31725305,
        "top_logprobs": [
          {
            "token": "Hello",
            "logprob": -0.31725305
          },
          {
            "token": "Hi",
            "logprob": -1.3190403
          }
        ]
      },
      {
        "token": "!",
        "logprob": -0.02380986,
        "top_logprobs": [
          {
            "token": "!",
            "logprob": -0.02380986
          },
          {
            "token": " there",
            "logprob": -3.787621
          }
        ]
      },
      {
        "token": " How",
        "logprob": -0.000054669687,
        "top_logprobs": [
          {
            "token": " How",
            "logprob": -0.000054669687
          },
          {
            "token": "<|end|>",
            "logprob": -10.953937
          }
        ]
      },
      // etc.
    ]
  }
}
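For concreteness, here is a hypothetical sketch of how this could surface in the Prompt API. All names here (topLogprobs, chunk.token, chunk.topLogprobs) are made up for illustration; nothing like this exists in the API today:

// Hypothetical sketch only: neither `topLogprobs` nor per-chunk logprob
// fields exist in the Prompt API; the names below are illustrative.
const session = await ai.languageModel.create({ topLogprobs: 2 });

for await (const chunk of session.promptStreaming("Say hello")) {
  // Each streamed chunk would carry the sampled token, its logprob,
  // and the top-k alternatives, mirroring OpenAI's top_logprobs.
  console.log(chunk.token, chunk.logprob, chunk.topLogprobs);
}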
@ryanseddon

I second this. If we get an API like the above, we can look at creating equivalent tools like guidanceai.

@domenic added the enhancement label Jul 29, 2024
@KenjiBaheux

Can someone explain what "basic guidance" refers to?

I understand the ideas behind the other examples mentioned (confidence levels, collecting multiple branches of output more efficiently, custom token heuristics instead of the built-in temperature/topK) but not the basic guidance one.

I also wonder if/how exposing logprobs might further complicate the interoperability aspect.

blixt commented Aug 20, 2024

Basic guidance would be an inefficient way to force valid JSON output etc., similar to how https://github.com/guidance-ai/guidance does it for closed APIs like OpenAI's. It's closely related to custom token control. (Inefficient because it requires round trips, unlike a native guidance solution.)
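Roughly, the round-trip approach looks like this. A minimal sketch: nextTokenLogprobs and isValidJsonPrefix are hypothetical helpers standing in for a logprobs-returning API call and a JSON prefix validator:

// Minimal sketch of round-trip guidance. `nextTokenLogprobs` and
// `isValidJsonPrefix` are hypothetical helpers, not real APIs.
async function guidedGenerate(prompt, maxTokens = 256) {
  let output = "";
  for (let i = 0; i < maxTokens; i++) {
    // One API round trip per token; this is the inefficiency.
    // Assume candidates come back sorted by logprob, highest first.
    const candidates = await nextTokenLogprobs(prompt + output);
    const valid = candidates.filter((c) => isValidJsonPrefix(output + c.token));
    if (valid.length === 0) break; // no candidate keeps the output valid
    output += valid[0].token; // greedily take the most likely valid token
  }
  return output;
}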

@ACMCMC

ACMCMC commented Nov 5, 2024

I was looking for this exact feature and couldn't find anything — an absolute must for me!

(For the sake of clarity, I'm not interested in this for guidance purposes as others have mentioned; I only need access to the logprobs.)

In my case, it would also be helpful to get the logprobs of any given text, not just of its completion tokens.

E.g.: "The cat sat" -> [-2.4, -2.1, -0.3]
Instead of: "The cat sat" -> "on the..." ([-0.4, -0.2, ...])
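In other words, a scoring call rather than a completion call. A minimal sketch, assuming a hypothetical scoreTokens method that returns per-token logprobs for the input without generating anything:

// Hypothetical scoring call; `scoreTokens` is not part of any current API.
// It would return one logprob per token of the input text, no generation.
const logprobs = await session.scoreTokens("The cat sat");
// e.g. [-2.4, -2.1, -0.3]; summing gives a sequence-level log-likelihood.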
