Add three-tier difficulty prompts for user intent evaluation #473

@irmadong

Description

Required Pre-requisites

Motivation

Create a robust evaluation framework that generates synthetic data at three distinct difficulty levels, so that the user intent search pipeline can be tested comprehensively across a range of real-world scenarios.

Proposed Solution

Implement a three-tier synthetic data generation system with progressive difficulty levels:

Core Implementation:

  1. Three Prompt Templates (prompt_easy, prompt_medium, prompt_hard)

    • Easy: Direct app mentions + clear goals
    • Medium: App mentions + business context, no explicit function terms
    • Hard: Implicit/contextual language, optional app mentions
  2. Graduated Complexity Testing

    • Baseline performance (easy) → Real-world scenarios (medium) → Edge cases (hard)
    • Systematic evaluation across user communication styles
  3. Enhanced Evaluation Pipeline

    • Custom dataset naming by difficulty level
    • Comparative performance analysis
    • Comprehensive metrics tracking per tier
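As a rough illustration of the structure above, here is a minimal sketch of how the three templates and per-difficulty dataset names might be wired together. Only the names `prompt_easy`, `prompt_medium`, and `prompt_hard` and the tier descriptions come from this proposal; the template wording, the `build_prompt` and `dataset_name` helpers, and the `$app` / `$goal` placeholders are assumptions made for illustration.

```python
from string import Template

# Hypothetical wording: only the three template names and the tier
# descriptions come from the proposal.
prompt_easy = Template(
    "Write a user request that names the app '$app' and states a clear goal: $goal."
)
prompt_medium = Template(
    "Write a user request that names the app '$app' and describes the business "
    "context behind this need: $goal. Do not use explicit function terms."
)
prompt_hard = Template(
    "Write a user request that only implies this need through context: $goal. "
    "Mentioning an app is optional."
)

PROMPTS = {"easy": prompt_easy, "medium": prompt_medium, "hard": prompt_hard}


def build_prompt(difficulty: str, app: str, goal: str) -> str:
    """Fill the template for one difficulty tier (hypothetical helper)."""
    if difficulty not in PROMPTS:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
    # substitute() ignores unused keyword arguments, so the hard template
    # can omit $app without raising.
    return PROMPTS[difficulty].substitute(app=app, goal=goal)


def dataset_name(base: str, difficulty: str) -> str:
    """Derive a dataset name that encodes the tier (assumed naming scheme)."""
    if difficulty not in PROMPTS:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
    return f"{base}_{difficulty}"
```

Keeping the templates in a single mapping keyed by difficulty means the generation and evaluation steps can loop over the tiers without branching on each one.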

Outcome:

A systematic evaluation framework that provides data-driven insights into search pipeline performance across realistic user complexity scenarios, enabling targeted improvements and benchmarking.
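The comparative analysis could be as simple as computing one metric per tier and reading the values side by side. The sketch below assumes a top-k hit rate as the metric and a hypothetical result record with `difficulty`, `expected`, and `retrieved` fields; neither the metric nor the record shape is specified in this proposal.

```python
from collections import defaultdict


def hit_rate_by_tier(results, k=5):
    """Return the fraction of queries per tier whose expected item
    appears in the top-k retrieved items (assumed metric)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in results:
        tier = record["difficulty"]
        totals[tier] += 1
        if record["expected"] in record["retrieved"][:k]:
            hits[tier] += 1
    return {tier: hits[tier] / totals[tier] for tier in totals}


# Illustrative records only, not real evaluation results.
sample = [
    {"difficulty": "easy", "expected": "send_message", "retrieved": ["send_message"]},
    {"difficulty": "easy", "expected": "create_event", "retrieved": ["create_event", "list_events"]},
    {"difficulty": "hard", "expected": "send_message", "retrieved": ["list_channels"]},
    {"difficulty": "hard", "expected": "create_event", "retrieved": ["create_event"]},
]
rates = hit_rate_by_tier(sample, k=1)
```

A drop in the rate from easy to hard would point to the tiers where the pipeline needs targeted improvement.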
