CHI R&R
This is a submission release for CHI R&R.
Note to Reviewers:
We thank the reviewers for their constructive and insightful feedback, which has significantly strengthened our paper. Below, we summarize the key changes:
# Issue 1: Statistical methods for NASA-TLX and Time
All reviewers noted flaws in the statistical methods. R3 requested comparisons for short and long two-phase interfaces.
> Changes: Replaced Mann-Whitney U tests with Bayesian models to improve transparency and include interaction effects. Bayesian models also provide meaningful results beyond binary thresholds for small samples. Pairwise time comparisons for short and long two-phase interfaces were added. Sections 5 and 7 reflect these changes.
# Issue 2: Claims regarding satisficing and interface benefit
All reviewers noted limited evidence for satisficing/benefits of the interface. R3 noted contradictions about interface benefits and cognitive load.
> Changes:
1. Section 6 adds three edit-distance analyses, each including Bayesian models and relevant qualitative support, to show how the two-phase interface aids participants.
2. Satisficing is reframed as a potential behavior: Section 5.4 now consolidates the qualitative evidence, and Section 8.1.2 was rewritten for a tighter discussion.
> Also addresses: R3’s concerns on how organizing options helps participants construct preliminary preferences.
> Rationale: These changes strengthen the evidence for the benefits of the two-phase interface, as shown by reduced edit distance, and show how the organization phase influences behavior, beyond time and qualitative quotes.
# Issue 3: Addressing study limitation on voting time
R1 and R2 noted that voting time didn’t account for embedded decision time in text interfaces.
> Changes: We removed this analysis and focused on total time in Section 7. Discussions of bimodal voting were reduced to one sentence (Page 23, Ln.1148), and “efficient voting” references were removed for clarity.
# Issue 4: Qualitative Data Reporting
Reviewers suggested improving the reporting of qualitative data. R1 recommended comparisons between groups and integration with quantitative data. R2 noted scattered content, and R3 suggested omitting less relevant findings to reduce length.
> Changes:
1. We adopted R1’s recommendation and shortened the qualitative section by grouping findings and contrasting them across experiment conditions for clarity. (Sections 5.2 and 5.3; R1, R3)
2. We integrated qualitative findings with related quantitative results (e.g., temporal demand with time analysis), highlighting differences across experiment conditions for coherence. (R1, R2)
> Also addresses: Length concerns, as these changes shortened the original content.
# Issue 5: Methods Justification and Experiment Restructuring
R2 suggested restructuring the Experiment Design section, while all reviewers requested more justification and clarification of methods.
> Changes:
1. We did not alter the content of Section 4. Changes to Section 4 reflect only the structural adjustments recommended by R2.
2. Clarified the baseline interface choice (Section 3.2; R1, R3)
3. Added subgroup age information (Section 4.1 Ln.465, Appendix D)
4. Added experiment time (Section 4.3 Ln.547)
5. Clarified interview procedure (Section 4.3.3 Ln.564)
6. Clarified the rationale for selecting NASA-TLX over alternatives (Section 4.3.3 Ln.569, R2)
# Issue 6: Decision to use surveys of different lengths
R1 noted: “the authors failed to control for the differences introduced by the voting option content when including the long and short lengths in the comparison.”
> Response: We clarify in Section 4.2 (Ln.527) that every participant received a different survey, regardless of length, because options were always sampled from a common pool. This randomization was designed to control for the potential differences that concerned R1.
# Issue 7: Length
R1, R3, R4 found the paper too lengthy.
> Changes: We addressed this by removing the bimodal results subsection and reducing the voting time analysis. Section 2.2 was reduced, and the qualitative analysis section was tightened. The discussion of future work was shortened, and various paragraphs across the paper were fine-tuned for brevity. Despite adding descriptions for three distance-based Bayesian models, the word count was reduced from 10,006 to 9,819.
# Issue 8: Additional literature
> Suggestion: R3 recommended expanding theories motivating the two-phase interface.
> Response: Considering the paper's length, we added limited content. However, we clarified how theories informed the pretests (Section 3, Ln.292, Ln.310) and strengthened connections in Section 8.2 (Ln.1261). We clarify that major design decisions were driven by theory and pretests.
# Miscellaneous Changes
> Adjusted image and adopted percentage for better comparison (R1, R2).
> Strengthened connections between QS challenges and interface design in the introduction and Section 2.3.
We believe these revisions address reviewers’ concerns and enhance the paper’s quality. Thank you for the opportunity to revise and improve our work.