How to calculate the distance between composed feature and target image feature? #4

Open
SunTongtongtong opened this issue Aug 4, 2023 · 1 comment

Comments

@SunTongtongtong

Hello there,

Thanks for publishing this excellent work! I have the following questions:

  1. Based on the README's usage 2, we can obtain the fused feature (reference image + modifying text). How do you calculate the distance between the fused feature and the target image feature? Is it based on cosine similarity or Euclidean distance?
  2. Do you use the CLIP embedding for the target image?

Best

@SanghyukChun
Collaborator

SanghyukChun commented Oct 26, 2023

  1. It is cosine similarity, i.e., Euclidean distance after L2 normalization. You can check the details in the code.
  2. Yes. We directly compute the similarity between the edited feature (query image & query text) and the CLIP visual feature (target image).
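The equivalence in answer 1 can be sketched quickly. This is a minimal numpy illustration, not the repository's actual code: the feature dimension, vector names, and random features are placeholders. For L2-normalized vectors, squared Euclidean distance is a monotone function of cosine similarity, since ||a - b||² = 2 - 2·cos(a, b), so ranking targets by either measure gives the same retrieval order.

```python
import numpy as np

rng = np.random.default_rng(0)
fused = rng.normal(size=512)         # hypothetical composed (image + text) feature
targets = rng.normal(size=(5, 512))  # hypothetical CLIP features of 5 target images

# L2-normalize both the query-side and target-side features
fused = fused / np.linalg.norm(fused)
targets = targets / np.linalg.norm(targets, axis=1, keepdims=True)

cos_sim = targets @ fused                       # cosine similarity per target
sq_dist = ((targets - fused) ** 2).sum(axis=1)  # squared Euclidean distance per target

# On the unit sphere: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.allclose(sq_dist, 2.0 - 2.0 * cos_sim)

# Maximizing cosine similarity picks the same target as minimizing distance
assert int(np.argmax(cos_sim)) == int(np.argmin(sq_dist))
```

So whether the code reports cosine similarity or Euclidean distance over normalized features, the retrieved target image is the same.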
