Update prompt for generating/rewriting follow up questions #809

Open · wants to merge 2 commits into main

Conversation

@colelandolt commented Mar 13, 2025

Description

As described in #767, when the user asks a follow-up question, Vanna AI sometimes combines it with the previous question even when the new question is fully independent and can be answered on its own. This PR improves the prompt for follow-up questions in the following ways:

  • Refers to the questions as "previous/new question" instead of "first/second question"
  • Emphasizes the decision-making process for determining whether a rewrite/combination is necessary, to mitigate the current behavior of combining questions too eagerly
  • Adds an instruction to focus on the intent of the new question
  • Adds an instruction to pay attention to specific parameters mentioned in both questions
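The changes above can be sketched as a prompt-construction helper (a minimal sketch, not the PR's exact code; `build_rewrite_prompt` and the plain-dict message format are illustrative assumptions, and the instruction wording paraphrases the PR description):

```python
def build_rewrite_prompt(last_question: str, new_question: str):
    # Sketch of the revised prompt: "previous/new" terminology,
    # focus on the new question's intent, and attention to specific
    # parameters (wording is illustrative, not the exact PR text).
    system = (
        "You are given a previous question and a new question. "
        "Focus on the intent of the new question. Only rewrite or "
        "combine the questions when the new question cannot be "
        "answered on its own. Pay attention to specific parameters "
        "mentioned in both questions."
    )
    user = "Previous question: " + last_question + "\nNew question: " + new_question
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```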

Risks

My linter ran on save; feel free to re-lint to your own standards.

@llamapreview bot left a comment

Auto Pull Request Review from LlamaPReview

1. Overview

1.1 Core Changes

  • Primary purpose and scope: Improve question handling logic to prevent over-eager combination of independent follow-up questions
  • Key components modified: generate_rewritten_question method prompt structure
  • Cross-component impacts: Affects core question processing pipeline and LLM interaction patterns
  • Business value alignment: Enhances conversational SQL reliability per issue #767 (Allowing Follow-up Questions Without Unnecessary Merging)

1.2 Technical Architecture

  • System design modifications: Enhanced prompt engineering for context analysis
  • Component interaction changes: Modified LLM instruction set while maintaining output channels
  • Integration points impact: Localized changes to question rewriting component
  • Dependency changes: No new dependencies introduced

2. Critical Findings

2.1 Must Fix (P0🔴)

Issue: Terminology mismatch in user message construction

  • Impact: Causes LLM confusion between "previous/new" vs "first/second" terminology, undermining core PR purpose
  • Resolution: Update user message labels to match system prompt terminology

Issue: Missing response validation

  • Impact: Risk of returning unprocessed LLM output containing explanations
  • Resolution: Add response sanitization logic

2.2 Should Fix (P1🟡)

Issue: Lack of test coverage

  • Impact: Unable to verify behavior for key use cases
  • Suggested Solution: Add unit tests for independent/related question scenarios

Issue: No error handling for empty responses

  • Impact: Potential system failures on malformed LLM outputs
  • Suggested Solution: Implement fallback to original question
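The suggested fallback could look roughly like this (a hypothetical wrapper, not LlamaPReview's or Vanna's actual code; `safe_rewrite` and its `submit_prompt` callable parameter are illustrative assumptions):

```python
def safe_rewrite(submit_prompt, prompt, new_question: str) -> str:
    # Fall back to the user's new question when the LLM call fails
    # or returns an empty response, and keep only the first line so
    # stray explanations after the rewritten question are dropped.
    try:
        response = submit_prompt(prompt)
    except Exception:
        return new_question
    if not response or not response.strip():
        return new_question
    return response.strip().splitlines()[0]
```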

2.3 Consider (P2🟢)

Area: Prompt maintenance

  • Improvement Opportunity: Externalize prompts to template files for better maintainability

Area: Parameter handling

  • Improvement Opportunity: Explicit instruction for parameter preservation in combined questions

2.4 Summary of Action Items

  1. Immediate (P0): Fix terminology mismatch and add response validation
  2. High Priority (P1): Implement test suite and error handling
  3. Future (P2): Externalize prompts and enhance monitoring

3. Technical Analysis

3.1 Code Logic Analysis

📁 src/vanna/base/base.py - generate_rewritten_question

  • Submitted PR Code:
    self.user_message(
        "First question: " + last_question + "\nSecond question: " + new_question
    ),
  • Analysis:
    • Current logic uses inconsistent terminology vs system prompt
    • Creates cognitive dissonance for LLM between "first/second" and "previous/new"
    • Directly impacts core functionality of question independence detection
  • LlamaPReview Suggested Improvements:
    self.user_message(
        "Previous question: " + last_question + "\nNew question: " + new_question
    ),
  • Improvement rationale:
    • Maintains terminology consistency with system prompt
    • Critical for proper LLM instruction following
    • High-risk impact on feature effectiveness

📁 src/vanna/base/base.py - generate_rewritten_question

  • Submitted PR Code:
    return self.submit_prompt(prompt=prompt, **kwargs)
  • Analysis:
    • Returns raw LLM response without validation
    • Risk of including explanations or multiple responses
    • No handling of empty/malformed responses
  • LlamaPReview Suggested Improvements:
    response = self.submit_prompt(prompt=prompt, **kwargs)
    return response.strip().split("\n")[0]  # Take first line only
  • Improvement rationale:
    • Ensures clean single-question output
    • Mitigates risk of unexpected response formats
    • Maintains API contract stability

3.2 Key Quality Aspects

  • Testing strategy: Critical need for test cases validating:
    • Independent question preservation
    • Contextual question merging
    • Parameter handling scenarios
  • Documentation needs: Prompt engineering decisions should be documented for future maintainers
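A starting point for that test coverage might look like this (a sketch using a stub in place of the real LLM call; `StubRewriter` and the test names are illustrative assumptions, not existing Vanna code):

```python
class StubRewriter:
    # Minimal stand-in for the rewriting component: a canned LLM
    # response replaces a real model call.
    def __init__(self, canned_response: str):
        self.canned_response = canned_response

    def generate_rewritten_question(self, last_question, new_question):
        # Mirrors the suggested sanitization: strip and keep the
        # first line of the (stubbed) LLM response.
        return self.canned_response.strip().splitlines()[0]


def test_independent_question_preserved():
    # An independent new question should come back unchanged.
    stub = StubRewriter("Which product sold the most in 2024?")
    result = stub.generate_rewritten_question(
        "What was total revenue last year?",
        "Which product sold the most in 2024?",
    )
    assert result == "Which product sold the most in 2024?"


def test_contextual_question_merged():
    # A context-dependent follow-up should come back merged.
    stub = StubRewriter("What was total revenue last year in Germany?")
    result = stub.generate_rewritten_question(
        "What was total revenue last year?",
        "What about in Germany?",
    )
    assert "Germany" in result and "revenue" in result
```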

4. Overall Evaluation

  • Technical assessment: Valuable improvements needing critical terminology fixes
  • Business impact: High-value fix for user-reported issue when P0 resolved
  • Risk evaluation: Medium risk without response validation
  • Notable positive aspects:
    • Clear prompt structure improvements
    • Enhanced context analysis guidelines
  • Implementation quality: Generally well-scoped with needed documentation
  • Final recommendation: Request Changes (due to P0 terminology mismatch)

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

Excerpt from the PR diff:

        + last_question
        + "\nNew question: "
        + new_question
    ),

I also hit this issue, but this prompt does not work for me.

This is my case:


# Provide training data
# 1. Table schema information
vn.train(ddl="""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    country TEXT,
    segment TEXT
);

CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category TEXT,
    price REAL,
    supplier TEXT
);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date TEXT,
    product_id INTEGER,
    amount REAL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
""")

# 2. Business terminology documentation
vn.train(documentation="""
Our customer segments include 'Enterprise' and 'Consumer'.
The sales amount is in RMB and refers to the monetary value of product sales, not the quantity sold.
Product IDs map to products as follows: 101=Laptop, 102=Printer, 103=Desk, 104=Office Chair.
""")

print("Training complete, all data is ready!")

# Start the Flask app (vn is a configured Vanna instance; setup is elided in the original comment)
if __name__ == "__main__":
    print("Starting the Vanna web UI...")
    print("Try asking these questions:")
    print("1. Which country has the highest sales?")
    print("2. How do sales break down by customer segment?")
    print("3. What are total laptop sales?")
    app = VannaFlaskApp(vn)
    app.run()


This prompt works for me:

        prompt = [
            self.system_message("Your goal is to process follow-up questions from users. Analyze the relationship between the first and second questions, and follow these guidelines:\n"
                              "1. If the second question clearly contains pronouns (e.g., 'it', 'they', 'these') that refer to entities in the first question, replace these pronouns with the specific entities\n"
                              "2. If the second question omits key information but is clearly continuing the context of the first question, supplement that information\n"
                              "3. If the second question appears to be a completely new topic with no clear dependency on the first question, keep it unchanged and return it exactly as provided\n"
                              "Be conservative in your rewriting approach - prefer minimal changes over excessive merging. Only combine questions when the second question has explicit references to the first.\n"
                              "The final question should be answerable with a single SQL statement and should be understandable without any additional context."),
            self.user_message("First question: " + last_question + "\nSecond question: " + new_question),
        ]
