How to calculate the distance between composed feature and target image feature? #4

Open
SunTongtongtong opened this issue Aug 4, 2023 · 1 comment

Comments

@SunTongtongtong

Hello there,

Thanks for publishing this excellent work! I have the following questions:

  1. Based on the README's usage 2, we can obtain the fused feature (reference image + modifying text). How do you calculate the distance between the fused feature and the target image feature? Is it based on cosine similarity or Euclidean distance?
  2. Do you use the CLIP embedding for the target image?

Best

@SanghyukChun
Collaborator

SanghyukChun commented Oct 26, 2023

  1. It is cosine similarity, i.e., Euclidean distance after L2 normalization. You can check the details in the code.
  2. Yes. We directly compute the similarity between the edited feature (query image & query text) and the CLIP visual feature (target image).
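The equivalence in answer 1 can be sketched quickly. This is a minimal numpy illustration, not the repository's actual code: the feature dimension, vector names, and random features are placeholders. For L2-normalized vectors, squared Euclidean distance is a monotone function of cosine similarity, since ||a - b||² = 2 - 2·cos(a, b), so ranking targets by either measure gives the same retrieval order.

```python
import numpy as np

rng = np.random.default_rng(0)
fused = rng.normal(size=512)         # hypothetical composed (image + text) feature
targets = rng.normal(size=(5, 512))  # hypothetical CLIP features of 5 target images

# L2-normalize both the query-side and target-side features
fused = fused / np.linalg.norm(fused)
targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)

cos_sim = targets @ fused                       # cosine similarity per target
sq_dist = ((targets - fused) ** 2).sum(axis=1)  # squared Euclidean distance per target

# On the unit sphere: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.allclose(sq_dist, 2.0 - 2.0 * cos_sim)

# Maximizing cosine similarity picks the same target as minimizing distance
assert int(np.argmax(cos_sim)) == int(np.argmin(sq_dist))
```

So whether the code reports cosine similarity or Euclidean distance over normalized features, the retrieved target image is the same.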
