03_List All Content Words

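A content word is a word from one of the lexical categories that carry most of a sentence's semantic information: verbs, nouns, adjectives, and adverbs. In Chinese NLP tasks, the step of extracting content words can generally be seen as the counterpart of stop-word removal in Western-language text processing.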
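To make the analogy concrete, here is a minimal, self-contained sketch that does not call Articut. It assumes the sentence has already been segmented and tagged. The English tokens and the tag names (`PRON`, `ADV`, `VERB`, `DET`, `ADJ`, `NOUN`) are hypothetical placeholders, not Articut's POS labels. Stop-word removal drops words found in a blacklist, while content-word extraction keeps only words whose part of speech is in a whitelist; on this input both reach the same result.

```python
# Hypothetical (word, POS) pairs; the tag names are placeholders,
# not the labels Articut actually returns.
TAGGED = [
    ("She", "PRON"),
    ("quickly", "ADV"),
    ("read", "VERB"),
    ("the", "DET"),
    ("new", "ADJ"),
    ("book", "NOUN"),
]

# Content-word categories: verbs, nouns, adjectives, adverbs (whitelist).
CONTENT_POS = {"VERB", "NOUN", "ADJ", "ADV"}

# A small hypothetical stop-word list, as used in English pipelines (blacklist).
STOP_WORDS = {"she", "the"}


def content_words(tagged):
    """Keep only words whose POS is a content-word category."""
    return [word for word, pos in tagged if pos in CONTENT_POS]


def remove_stop_words(tagged):
    """Drop words that appear in the stop-word list (case-insensitive)."""
    return [word for word, _pos in tagged if word.lower() not in STOP_WORDS]


print(content_words(TAGGED))      # ['quickly', 'read', 'new', 'book']
print(remove_stop_words(TAGGED))  # ['quickly', 'read', 'new', 'book']
```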

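Enter the sentence string to analyze, "你計劃過地球人類補完計劃" (roughly, "Have you ever planned the Earth Human Instrumentality Project?"):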

from ArticutAPI import Articut
from pprint import pprint
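username = "" # Enter the email of the account you use at https://api.droidtown.co. With an empty string, the shared public quota of 2,000 characters per hour is used.
apikey   = "" # Enter the API key you obtain after logging in at https://api.droidtown.co. With an empty string, the shared public quota of 2,000 characters per hour is used.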
articut = Articut(username, apikey)
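Rather than hard-coding credentials in the script, one option is to read them from environment variables. This is a minimal sketch: the variable names `ARTICUT_USERNAME` and `ARTICUT_APIKEY` are hypothetical choices of ours, not something ArticutAPI reads on its own. An unset variable falls back to an empty string, which per the comments above selects the shared public quota.

```python
import os


def load_credentials():
    """Return (username, apikey) from hypothetical environment variables.

    ARTICUT_USERNAME / ARTICUT_APIKEY are illustrative names only.
    A missing variable becomes "", which selects the public quota.
    """
    username = os.environ.get("ARTICUT_USERNAME", "")
    apikey = os.environ.get("ARTICUT_APIKEY", "")
    return username, apikey


username, apikey = load_credentials()
```

The returned pair can then be passed to `Articut(username, apikey)` exactly as in the snippet above, keeping the account email and key out of version control.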

inputSTR = "你計劃過地球人類補完計劃"
resultDICT = articut.parse(inputSTR)
pprint(resultDICT["result_pos"])
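The `result_pos` field printed above holds the POS-tagged sentence. Assuming it wraps each word in XML-style tags of the form `<TAG>word</TAG>`, the sketch below shows one way to recover `(start, end, word, tag)` tuples with a regular expression. The sample string and its tag names are hypothetical placeholders rather than real Articut output, and the pattern only handles flat, non-nested tags. The offsets it reports index into the tagged string, not into the raw sentence, which is why they grow much faster than the word positions in the input.

```python
import re

# Hypothetical tagged sentence in the assumed <TAG>word</TAG> style;
# the tag names are placeholders, not verified Articut output.
TAGGED_STR = "<PRON>你</PRON><VERB>計劃</VERB><NOUN>人類</NOUN>"

# Group 1 is the tag name, group 2 the word; \1 forces the closing tag
# to repeat the opening tag's name.
TAG_RE = re.compile(r"<([^<>/]+)>([^<>]*)</\1>")


def tagged_spans(tagged_str):
    """Return (start, end, word, tag) tuples with offsets into tagged_str."""
    return [
        (m.start(2), m.end(2), m.group(2), m.group(1))
        for m in TAG_RE.finditer(tagged_str)
    ]


for span in tagged_spans(TAGGED_STR):
    print(span)
# (6, 7, '你', 'PRON')
# (20, 22, '計劃', 'VERB')
# (35, 37, '人類', 'NOUN')
```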

List all content words:

contentWordLIST = articut.getContentWordLIST(resultDICT)
pprint(contentWordLIST)

The output is as follows:

[[(41, 43, '計劃'), (88, 90, '人類'), (111, 112, '補'), (138, 140, '計劃')]]

The output is structured to mirror the sentences of the input:

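[ ##### The outermost list corresponds to the whole input  
    [ ##### Each second-level list corresponds to one sentence; e.g. the first list is the first sentence of the input  
        (41, 43, '計劃'), ##### Each third-level tuple is (start index, end index, word); these indices are not positions in the 12-character input string  
        (88, 90, '人類'),  
        (111, 112, '補'),  
        (138, 140, '計劃')  
    ]  
]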
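Because the result is nested by sentence, it is usually traversed with two loops. The sketch below works directly on the output shown above (copied in as a literal, so it runs without an API call) and shows two common follow-ups: listing the words of each sentence, and counting how often each content word occurs across the whole input. The helper names `words_per_sentence` and `word_counts` are illustrative, not part of ArticutAPI.

```python
from collections import Counter

# The getContentWordLIST output shown above, copied as a literal.
content_word_list = [
    [(41, 43, "計劃"), (88, 90, "人類"), (111, 112, "補"), (138, 140, "計劃")]
]


def words_per_sentence(nested):
    """Return one list of words per input sentence, dropping the offsets."""
    return [[word for _start, _end, word in sentence] for sentence in nested]


def word_counts(nested):
    """Count each content word across all sentences."""
    return Counter(word for sentence in nested for _start, _end, word in sentence)


for i, words in enumerate(words_per_sentence(content_word_list), start=1):
    print(f"sentence {i}: {words}")  # sentence 1: ['計劃', '人類', '補', '計劃']

print(word_counts(content_word_list).most_common(1))  # [('計劃', 2)]
```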

Droidtown Linguistic Tech.