
Clarification on Training Data and Prompt Optimization in AdaCLIP #35


Open
arifur-rahman-ar opened this issue Mar 12, 2025 · 0 comments

Comments

@arifur-rahman-ar

Thank you for sharing your work on AdaCLIP. I have gone through the paper and the repository, and I have a few points that need clarification:

  1. Training Data Details: The paper initially describes using auxiliary data for training, but later states that "the description of the utilized training set is not accurate." Could you clarify whether VisA & ClinicDB are used exclusively for evaluation, or whether they also play a role in model fine-tuning?

  2. Prompt Optimization: The concept of hybrid prompts (static and dynamic) is intriguing. However, are the dynamic prompts generated per image from predefined embeddings, or do they adapt iteratively during testing? Additionally, do the prompts remain consistent across similar anomalies in different domains? (A small sketch of what I mean by per-image dynamic prompts follows this list.)

  3. Performance Stability: Given that FP16 training can be unstable, have you observed significant variance in performance across multiple runs? Would incorporating additional stability mechanisms (e.g., gradient accumulation or mixed-precision loss scaling, as sketched below) improve robustness?
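To make the second question concrete, here is a minimal, hypothetical sketch of what I mean by combining static (shared, learnable) prompt tokens with dynamic tokens generated per image from its embedding. This is not AdaCLIP's actual code; the dimensions, the linear projection, and the concatenation are all my own assumptions:

```python
import torch
import torch.nn as nn

class HybridPrompt(nn.Module):
    """Illustrative only: static learnable prompt tokens plus
    dynamic tokens projected from the per-image embedding."""

    def __init__(self, embed_dim=768, prompt_len=4):
        super().__init__()
        # Static prompts: shared across all images, learned during training.
        self.static_prompts = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        # Dynamic prompts: generated per image from its image embedding.
        self.dynamic_proj = nn.Linear(embed_dim, prompt_len * embed_dim)
        self.prompt_len = prompt_len
        self.embed_dim = embed_dim

    def forward(self, image_embedding):  # image_embedding: (B, embed_dim)
        b = image_embedding.shape[0]
        dynamic = self.dynamic_proj(image_embedding).view(b, self.prompt_len, self.embed_dim)
        static = self.static_prompts.unsqueeze(0).expand(b, -1, -1)
        # My question is whether the dynamic part is computed once per test
        # image like this, or refined iteratively at test time.
        return torch.cat([static, dynamic], dim=1)  # (B, 2 * prompt_len, embed_dim)
```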
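For the third question, the kind of stability mechanism I had in mind is the standard PyTorch AMP pattern with gradient accumulation, sketched generically below. It is not tied to this repository; `model` is assumed to return the loss directly, and `accum_steps` is a hypothetical setting:

```python
import torch

def train_fp16(model, loader, optimizer, accum_steps=4, device="cuda"):
    """Generic mixed-precision loop with gradient accumulation (not AdaCLIP-specific)."""
    scaler = torch.cuda.amp.GradScaler()        # rescales the loss to avoid FP16 underflow
    model.train()
    optimizer.zero_grad(set_to_none=True)
    for step, (images, targets) in enumerate(loader):
        images, targets = images.to(device), targets.to(device)
        with torch.cuda.amp.autocast():         # forward pass in mixed precision
            loss = model(images, targets) / accum_steps
        scaler.scale(loss).backward()           # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)              # unscales grads; skips the step on inf/nan
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
```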

I appreciate your time and look forward to your insights. Thanks again for this excellent work!
