Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds ranges for PPTSS sampling #112

Merged
merged 6 commits into from
Feb 17, 2025
Merged

Adds ranges for PPTSS sampling #112

merged 6 commits into from
Feb 17, 2025

Conversation

jzonthemtn
Copy link
Collaborator

Adds ranges for PPTSS sampling for #103.

Copy link
Collaborator

@wrigleyDan wrigleyDan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good, the code works, all my testing was successful.

As I went through the changes and noticed the renaming of the classes I got confused (mainly because of the QueriesQueryQuery part in the class AllQueriesQueryQuerySamplerParameters - is that intentional or did one Query too many slip into the name?). That lead to the following thought: I think three sampling methods is what we ultimately want: a random sampler, a frequency weighted sampler (what PPTSS is), a "Top N" sampler picking just the most frequent N queries. This is also described in Stavros' document about the eyeballing tool.

How could that look like:

  • The AllQueriesQuerySampler becomes the RandomQuerySampler that picks N queries randomly out of all queries.
  • A new TopNQuerySampler is introduced that picks the N most frequent queries out of all queries.

When doing so I think renaming some of the classes might add clarity. For example:

  • AllQueriesQueryQuerySamplerParameters --> RandomQuerySamplerParameters
  • ProbabilityProportionalToSizeParametersQuery --> ProbabilityProportionalToSizeSamplerParameters

These changes should happen in a separate branch. I wanted to make this suggestion here to get the discussion started. What do you think?

@jzonthemtn
Copy link
Collaborator Author

AllQueriesQueryQuerySamplerParameters

I fixed the name of this class to take out one of the Query names and pushed it up.

I agree about the samplers and wrote:

I want to leave the AllQueriesQuerySampler sampler in primarily just for debugging. I know it's probably not very useful for search relevance engineering, but I have found it useful for testing.

So if it's ok with you, we can merge this PR and I'll add the other samplers in other pull requests.

Copy link
Collaborator

@wrigleyDan wrigleyDan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@jzonthemtn jzonthemtn merged commit 0a35fcc into main Feb 17, 2025
6 checks passed
@jzonthemtn jzonthemtn deleted the 103-pptss branch February 17, 2025 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants