Given a list of sentences and words, and assuming that I want to deduplicate them, what is the best way to automate the elimination of duplicate items (similar wordings of the same item)?
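import spacy_universal_sentence_encoder
from sklearn.cluster import AgglomerativeClustering

# Load the Universal Sentence Encoder model as a spaCy pipeline.
nlp = spacy_universal_sentence_encoder.load_model('en_use_lg')

# Read one item per line, dropping blank lines and trailing newlines.
with open('file.txt') as f:
    lines = [line.strip() for line in f if line.strip()]

# Embed each item as a fixed-length vector.
vectors = [nlp(line).vector for line in lines]

# Group the items into k clusters; k must not exceed the number of items.
# Ward linkage only supports Euclidean distance, so no metric argument is passed.
k = min(256, len(lines))
cluster = AgglomerativeClustering(n_clusters=k, linkage='ward')
labels = cluster.fit_predict(vectors)

# Collect the items that belong to each cluster.
groups = [[] for _ in range(k)]
for line, label in zip(lines, labels):
    groups[label].append(line)

# Print each cluster and write it to a file, with a blank line between clusters.
with open('myfile.txt', 'w') as out:
    for group in groups:
        print(*group, sep='\n')
        print()
        out.write('\n'.join(group) + '\n\n')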
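
The clustering above needs a fixed number of clusters, which is hard to guess when you do not know how many duplicates there are. One alternative, sketched here as an assumption and not as a method recommended by the library, is to keep an item only if its embedding is not too close (by cosine similarity) to any item already kept. The helper names `cosine_similarity` and `deduplicate` and the threshold value are hypothetical, and the toy 2-D vectors stand in for real embeddings such as `nlp(text).vector`.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def deduplicate(items, vectors, threshold=0.8):
    """Keep the first item of every group of near-duplicates.

    An item is dropped when its vector has cosine similarity >= threshold
    with the vector of any item that was already kept.
    The threshold is an assumption and needs tuning on real data.
    """
    kept_items = []
    kept_vectors = []
    for item, vec in zip(items, vectors):
        if all(cosine_similarity(vec, kv) < threshold for kv in kept_vectors):
            kept_items.append(item)
            kept_vectors.append(vec)
    return kept_items


# Toy example: hand-made 2-D vectors instead of real sentence embeddings.
items = ["red apple", "apple, red", "blue car"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
unique = deduplicate(items, vectors, threshold=0.9)
print(unique)
```

This compares every item against all kept items, so it is quadratic in the worst case and only suitable for lists of moderate size. Which wording of a duplicate survives depends on input order, since the first occurrence is the one kept.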