Hi, thanks for sharing the great work!
In the paper, you mention that you reran the K-disk clustering with a larger number of trajectories and built your own agent token vocabulary, which significantly improved performance on the WOSAC leaderboard.
I have a question regarding the ablation:
Did you run experiments to isolate the contribution of the improved token vocabulary alone (without CAT-K)? In Table 1, the reported results appear to already include both the improved vocabulary and CAT-K. Could you clarify how much of the gain comes purely from the new token vocabulary?
