Description
Hi Nickolay, I tried to find a description of the criteria by which the dictionary is transcribed (from other sources). One feature that strikes me as particularly odd is the entries marked with the 1 for primary stress on multiple vowels. Many of these are compounds, and, without understanding even the basic principles, I am not trying to get into the topic of compound treatment. But there are non-initialisms, non-compounds that show multiple "primary stress" (if I should take the digit markers at face value). For example, the stressed final -ee more or less consistently yields IY1 (see the entries for inductee, markee, pawnee), and many of these have another primary stress elsewhere in addition to the final IY1. According to AmHer, inductee has a secondary stress on /in/-, and the other two, bisyllabic examples do not carry a secondary stress at all. So the example of inductee is off the mark, with its double primary stress:
CMUDict IH2 N D AH1 K T IY1
AmHer IH2 N D AH0 K T IY1 (assuming ə -> AH0)
On the other hand, the phonemic transcription of a sample of a few words in -ee carrying the stress 1 elsewhere (e.g. manatee M AE1 N AH0 T IY2) does match that of AmHer. Looks like the common theme here is the final stressed IY1.
Is this just an error, or is there more to it? If it should be fixed, I have a list, not split into categories of initialisms, compounds and simple words, but I can manually select the last category; it is not that large. Most of the rejects are compounds even in the weakest sense of the word (e.g. remake, where re is a morpheme that would not stand on its own).
The cmusphinx-devel list on SF has had almost no traffic for the last 2 years, so I do not think it makes more sense to raise the question there than here. Or does it? Or has the list moved elsewhere?
I am using the dictionary in a research project and simply discarding data with multiple primary stress (this is how science is supposed to work, isn't it? If the data does not fit the theory, too bad for the data :D ). But this is still confusing.
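For anyone who wants to reproduce the list, here is a minimal sketch of the kind of filter described above. It assumes the plain-text layout where each line is a word followed by space-separated phones, with the stress digit attached to the vowel, and where comment lines start with `;;;`. The function names are hypothetical and not part of any CMUdict tooling.

```python
def parse_entry(line):
    """Split a dictionary line into (word, phones).

    Returns None for blank lines and for comment lines, which are
    assumed here to start with ';;;'.
    """
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    return word, phones


def primary_stress_count(phones):
    """Count phones carrying the primary-stress digit 1 (e.g. IY1)."""
    return sum(1 for p in phones if p.endswith("1"))


def multiple_primary(lines):
    """Return the words whose transcription has more than one primary stress."""
    flagged = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and primary_stress_count(entry[1]) > 1:
            flagged.append(entry[0])
    return flagged


# The two transcriptions quoted in this issue.
sample = [
    "inductee IH2 N D AH1 K T IY1",
    "manatee M AE1 N AH0 T IY2",
]
print(multiple_primary(sample))  # ['inductee']
```

This only flags candidates; separating initialisms and compounds from simple words would still have to be done by hand.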