Hi, I have online listings from multiple source sites, and I want to dedupe them. Is FAISS a good fit for this? My listing metadata is in Supabase (PostgreSQL).
Is your goal to remove semantic duplicates or exact ones? Do you anticipate needing to do this many times or just once? And what is the scale of your data set? If exact, you don't need Faiss: you can just hash or exact-match your data to dedupe. This is easiest, especially if you only need to do it once. If semantic duplicates, then Faiss could be useful: you can turn your online listing data into embeddings and find near neighbors. There are also projects like https://github.com/facebookresearch/SemDeDup which may be useful for you.
Btw, do you recommend any feature-engineering tool @mnorris11? I ended up creating my own 9-feature vector based on domain knowledge. Just curious if there's something out there that creates these better.