This repository has been archived by the owner on Mar 19, 2024. It is now read-only.
Hello,
The model for Neapolitan (`nap`) is unusable because of its poor quality:

    >>> ft.get_nearest_neighbors("cuccuziello")
    [(0.9683643579483032, 'Masiello'),
     (0.9683618545532227, 'soldatiello'),
     (0.9682843685150146, 'Mezzaniello'),
     (0.9651128053665161, 'perettiello'),
     (0.963299572467804, 'maretiello'),
     (0.9630503058433533, 'nnammoratiello'),
     (0.9629217386245728, 'Fermariello'),
     (0.9614925384521484, 'poveriello'),
     (0.9613924622535706, 'Manniello'),
     (0.9589092135429382, 'ciancianiello')]
The code above asks for the nearest neighbors of the word for "zucchini". It returns "Masiello" (a family name), "soldatiello" (diminutive of "soldier"), "Mezzaniello" (a type of pasta), "perettiello" (a type of wine container), "maretiello" (diminutive of "husband"), etc. None of them is related to zucchini.
Let’s try with `mare` (sea):

    >>> ft.get_nearest_neighbors("mare")
    [(0.6819297671318054, 'maree'),
     (0.6802213788032532, 'sommare'),
     (0.67812180519104, 'Altomare'),
     (0.6762729287147522, 'mmare'),
     (0.6754312515258789, 'sciummare'),
     (0.6556524038314819, 'Oltremare'),
     (0.6542813181877136, 'amare'),
     (0.6521005630493164, 'Croismare'),
     (0.6465907692909241, 'lungomare'),
     (0.6444516181945801, 'Zimmare')]
Here it’s marginally better: 40% of the words are related to the sea, probably because "mare" is the same in Italian and all those words come from Italian.
Let’s try with a famous word, `guaglione` (young man, adolescent):

    >>> ft.get_nearest_neighbors("guaglione")
    [(0.9444118738174438, 'gguaglione'),
     (0.9239395260810852, 'uaglione'),
     (0.922201931476593, 'Quaglione'),
     (0.9067193269729614, 'Guaglione'),
     (0.8721657991409302, 'Scaglione'),
     (0.8564983010292053, 'Baglione'),
     (0.8542811870574951, 'Faraglione'),
     (0.8541175127029419, 'muraglione'),
     (0.8494646549224854, 'Zampaglione'),
     (0.8474137783050537, 'Maglione')]
"gguaglione" (feminine plural), "uaglione" (variant) and "Guaglione" (with a capital letter) are various versions of "guaglione", but the other words have nothing to do with it.
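In all three queries the neighbors seem to match the query by spelling rather than by meaning. One rough way to put a number on that is to count how many neighbors share a trailing run of characters with the query. The sketch below is only an illustration: `shared_suffix_len`, `suffix_share` and the `min_len=4` threshold are hypothetical helpers and choices of mine, not part of fastText, and the neighbor list is pasted from the "cuccuziello" output above.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Length of the longest common suffix of a and b, ignoring case."""
    a, b = a.lower(), b.lower()
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def suffix_share(query: str, neighbors, min_len: int = 4) -> float:
    """Fraction of neighbors that share at least min_len trailing characters with query.

    neighbors is a list of (score, word) pairs, the shape returned by
    ft.get_nearest_neighbors in the outputs above.
    """
    hits = [word for _, word in neighbors
            if shared_suffix_len(query, word) >= min_len]
    return len(hits) / len(neighbors)


# Neighbors copied from the "cuccuziello" query above.
neighbors = [
    (0.9683643579483032, "Masiello"),
    (0.9683618545532227, "soldatiello"),
    (0.9682843685150146, "Mezzaniello"),
    (0.9651128053665161, "perettiello"),
    (0.963299572467804, "maretiello"),
    (0.9630503058433533, "nnammoratiello"),
    (0.9629217386245728, "Fermariello"),
    (0.9614925384521484, "poveriello"),
    (0.9613924622535706, "Manniello"),
    (0.9589092135429382, "ciancianiello"),
]

# Every neighbor ends in "iello", so the share is 1.0 for this query.
print(suffix_share("cuccuziello", neighbors))
```

A share close to 1.0 combined with unrelated meanings would suggest that the neighbors are ranked mostly on shared character sequences. It is a crude check and does not say anything about why the model behaves that way.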
Is there anything one can do to improve the accuracy of the model, or is it inherent to the small size of the corpus?