Option to get logprobs #20

Open
blixt opened this issue Jul 11, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

blixt commented Jul 11, 2024

The current API is great for producing a text response, but if we could provide an option that gave us the logprobs for each streamed token, we'd be able to implement a lot more functionality on top of the model: basic guidance, confidence estimates, more efficient collection of multiple output branches, custom token heuristics instead of the built-in temperature/topK (I saw another proposal to add a seed option; this would let you build that yourself), and more.

Basically, it could be modeled on something like the top_logprobs parameter in the OpenAI API, which returns something like this for top_logprobs=2:

{
  "logprobs": {
    "content": [
      {
        "token": "Hello",
        "logprob": -0.31725305,
        "top_logprobs": [
          {
            "token": "Hello",
            "logprob": -0.31725305
          },
          {
            "token": "Hi",
            "logprob": -1.3190403
          }
        ]
      },
      {
        "token": "!",
        "logprob": -0.02380986,
        "top_logprobs": [
          {
            "token": "!",
            "logprob": -0.02380986
          },
          {
            "token": " there",
            "logprob": -3.787621
          }
        ]
      },
      {
        "token": " How",
        "logprob": -0.000054669687,
        "top_logprobs": [
          {
            "token": " How",
            "logprob": -0.000054669687
          },
          {
            "token": "<|end|>",
            "logprob": -10.953937
          }
        ]
      },
      // etc.
    ]
  }
}
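For concreteness, here is a hypothetical sketch of how this could surface in the Prompt API. All names here (topLogprobs, chunk.token, chunk.topLogprobs) are made up for illustration; nothing like this exists in the API today:

// Hypothetical sketch only: neither `topLogprobs` nor per-chunk logprob
// fields exist in the Prompt API; the names below are illustrative.
const session = await ai.languageModel.create({ topLogprobs: 2 });

for await (const chunk of session.promptStreaming("Say hello")) {
  // Each streamed chunk would carry the sampled token, its logprob,
  // and the top-k alternatives, mirroring OpenAI's top_logprobs.
  console.log(chunk.token, chunk.logprob, chunk.topLogprobs);
}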
@ryanseddon

I second this. If we get an API like the above, we can look at creating equivalent tools like guidanceai.

@domenic added the enhancement label Jul 29, 2024
@KenjiBaheux

Can someone explain what "basic guidance" refers to?

I understand the ideas behind the other examples mentioned (confidence levels, collecting multiple branches of output more efficiently, custom token heuristics instead of the built-in temperature/topK) but not the basic guidance one.

I also wonder if/how exposing logprobs might further complicate the interoperability aspect.

blixt commented Aug 20, 2024

Basic guidance would be an inefficient way to force valid JSON output etc., similar to how https://github.com/guidance-ai/guidance does it for closed APIs like OpenAI's. It's closely related to custom token control. (Inefficient because it requires round trips, unlike a native guidance solution.)
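Roughly, the round-trip approach looks like this. A minimal sketch: nextTokenLogprobs and isValidJsonPrefix are hypothetical helpers standing in for a logprobs-returning API call and a JSON prefix validator:

// Minimal sketch of round-trip guidance. `nextTokenLogprobs` and
// `isValidJsonPrefix` are hypothetical helpers, not real APIs.
async function guidedGenerate(prompt, maxTokens = 256) {
  let output = "";
  for (let i = 0; i < maxTokens; i++) {
    // One API round trip per token; this is the inefficiency.
    // Assume candidates come back sorted by logprob, highest first.
    const candidates = await nextTokenLogprobs(prompt + output);
    const valid = candidates.filter((c) => isValidJsonPrefix(output + c.token));
    if (valid.length === 0) break; // no candidate keeps the output valid
    output += valid[0].token; // greedily take the most likely valid token
  }
  return output;
}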

@ACMCMC

ACMCMC commented Nov 5, 2024

I was looking for this exact feature and couldn't find anything — an absolute must for me!

(For the sake of clarity, I'm not interested in this for guidance purposes as others have mentioned; I only need access to the logprobs.)

In my case, it would also be helpful to get the logprobs of any given text, not just of its completion tokens.

E.g.: "The cat sat" -> [-2.4, -2.1, -0.3]
Instead of: "The cat sat" -> "on the..." ([-0.4, -0.2, ...])
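In other words, a scoring call rather than a completion call. A minimal sketch, assuming a hypothetical scoreTokens method that returns per-token logprobs for the input without generating anything:

// Hypothetical scoring call; `scoreTokens` is not part of any current API.
// It would return one logprob per token of the input text, no generation.
const logprobs = await session.scoreTokens("The cat sat");
// e.g. [-2.4, -2.1, -0.3]; summing gives a sequence-level log-likelihood.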
