Is your goal to remove semantic duplicates or exact ones? Do you anticipate needing to do this many times or just once? And what is the scale of your data set?

If exact, you don't need Faiss; you can simply hash or exact-match your data to dedupe. Especially if you only need to do it once, this is the easiest option.
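A minimal sketch of the hashing approach, assuming the listings are plain strings (the `dedupe_exact` helper, the normalization choices, and the sample data are illustrative, not part of any library):

```python
import hashlib

def dedupe_exact(listings):
    """Keep the first occurrence of each listing, dropping exact duplicates."""
    seen = set()
    unique = []
    for text in listings:
        # Hash lightly normalized text; for small data the string itself
        # could serve as the set key just as well.
        key = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

listings = ["2BR apt downtown", "2br apt downtown ", "Studio near park"]
print(dedupe_exact(listings))  # → ['2BR apt downtown', 'Studio near park']
```

How aggressively you normalize before hashing (case folding, whitespace, punctuation) controls what counts as "exact".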

If semantic duplicates, then Faiss could be useful: you can turn your online listing data into embeddings and find near neighbors. However, there are also projects like https://github.com/facebookresearch/SemDeDup which may be useful for you.

Answer selected by angelotc