"Jieba" (Chinese for "to stutter") Chinese text segmentation: built to be the best Python Chinese word segmentation module.
Code example: segmentation
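Here is a rough illustration of the dictionary-based approach jieba describes: build a directed acyclic graph (DAG) of every dictionary word that can start at each position, then use dynamic programming to pick the path with the highest total log probability. This is a toy sketch, not jieba's code. The frequency table `FREQ` and the helpers `build_dag` and `cut` are hypothetical, and the counts are made up. The real library also uses an HMM for words missing from the dictionary, which this sketch omits. For actual segmentation, call `jieba.cut`.

```python
import math

# Hypothetical toy frequency table (word -> count); the counts are made up.
FREQ = {
    "我": 50, "来到": 30, "北京": 40, "清华": 20, "清华大学": 25,
    "华大": 5, "大学": 35, "来": 20, "到": 20, "北": 5, "京": 5,
    "清": 5, "华": 5, "大": 10, "学": 10,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of dictionary words starting there."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # fall back to a single character
    return dag


def cut(sentence):
    """Return the segmentation whose words have the highest total log probability."""
    n = len(sentence)
    dag = build_dag(sentence)
    log_total = math.log(TOTAL)
    # route[i] = (best log probability of sentence[i:], end index of its first word)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1]) or 1) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    # Walk the best path from the start of the sentence.
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(cut("我来到北京清华大学"))
```

Because every extra word subtracts another `log(TOTAL)`, the dynamic program naturally prefers fewer, longer, frequent words, so "清华大学" wins over "清华" + "大学" here.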
Example (keyword extraction)
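Keyword extraction with `jieba.analyse.extract_tags` is based on TF-IDF. The sketch below shows that scoring idea on an already segmented word list: each candidate's term frequency is multiplied by its IDF, and words missing from the IDF table fall back to the median IDF. The helper `tfidf_keywords`, its two-character minimum, and the sample IDF values are assumptions for illustration, not the library's code; `top_k` and `with_weight` loosely mirror `topK` and `withWeight`.

```python
from collections import Counter


def tfidf_keywords(words, idf, top_k=20, with_weight=False):
    """Rank segmented words by term frequency times inverse document frequency."""
    # Assumed filter: skip single characters, which rarely make useful keywords.
    candidates = [w for w in words if len(w.strip()) >= 2]
    if not candidates:
        return []
    counts = Counter(candidates)
    total = sum(counts.values())
    # Words absent from the IDF table get the median IDF as a neutral default.
    values = sorted(idf.values())
    default_idf = values[len(values) // 2] if values else 1.0
    scores = {w: (c / total) * idf.get(w, default_idf) for w, c in counts.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return ranked if with_weight else [w for w, _ in ranked]


# Made-up segmented text and IDF values.
words = ["机器", "学习", "机器", "学习", "的", "算法", "数据"]
idf = {"机器": 5.0, "学习": 4.0, "算法": 9.0, "数据": 3.0}
print(tfidf_keywords(words, idf, top_k=2))
```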
Developers can specify their own custom IDF corpus in jieba keyword extraction.
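In jieba this is done with `jieba.analyse.set_idf_path(file_name)`, where the file lists one word and its IDF value per line, separated by a space. The sketch below parses that format into a dictionary. `load_idf` is a hypothetical helper, and the sample entries and values are made up; an in-memory `io.StringIO` stands in for the file on disk.

```python
import io


def load_idf(lines):
    """Parse 'word idf' lines into a dict; blank or malformed lines are skipped."""
    idf = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) != 2:
            continue  # blank line or unexpected number of fields
        word, value = parts
        try:
            idf[word] = float(value)
        except ValueError:
            continue  # second field is not a number
    return idf


# For a real file: with open(path, encoding="utf-8") as f: idf = load_idf(f)
sample = io.StringIO("自然语言 11.2\n\n分词 9.5\n")
custom_idf = load_idf(sample)
print(custom_idf)
```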
Developers can specify their own custom stop words corpus in jieba keyword extraction.
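jieba exposes this as `jieba.analyse.set_stop_words(file_name)`. The sketch below assumes a stop-word file with one word per line, loads it, and filters a token list with the result. `load_stop_words` and `remove_stop_words` are hypothetical helpers, and the sample words are illustrative.

```python
import io


def load_stop_words(lines):
    """Read one stop word per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def remove_stop_words(words, stop_words):
    """Drop tokens that appear in the stop-word set, preserving order."""
    return [w for w in words if w not in stop_words]


# Stand-in for a custom stop-word file; the entries are illustrative.
stop_file = io.StringIO("的\n了\n\n我们\n")
stop_words = load_stop_words(stop_file)
tokens = ["我们", "喜欢", "的", "自然语言", "处理", "了"]
print(remove_stop_words(tokens, stop_words))
```

Using a set makes each membership check constant time on average, which matters when filtering long documents.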
Use: jieba.analyse.textrank(sentence, topK=20, withWeight=False, allowPOS=('ns', 'n', 'vn', 'v'))
Note that, by default, it filters words by POS.
jieba.analyse.TextRank() creates a new TextRank instance.
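TextRank ranks words by running a PageRank-style iteration over a graph whose edges link words that co-occur within a small window. The class below is a simplified sketch of that idea, not jieba's implementation. It assumes input that is already segmented and POS-tagged as `(word, flag)` pairs, a window of 5 tokens, a damping factor of 0.85, a fixed 10 iterations, and scores rescaled so the top word gets 1.0. `ToyTextRank` and its parameter names are hypothetical; `allow_pos` plays the role of `allowPOS`.

```python
from collections import defaultdict


class ToyTextRank:
    """Hypothetical TextRank keyword ranker over pre-tagged (word, flag) pairs."""

    def __init__(self, span=5, damping=0.85, iterations=10):
        self.span = span              # co-occurrence window size, in tokens
        self.damping = damping        # PageRank damping factor
        self.iterations = iterations  # fixed number of update rounds

    def textrank(self, tagged, top_k=20, with_weight=False,
                 allow_pos=("ns", "n", "vn", "v")):
        def keep(pair):
            # Only words with an allowed POS flag and at least two characters count.
            word, flag = pair
            return flag in allow_pos and len(word) >= 2

        # Undirected weighted graph: word -> {neighbor: co-occurrence count}.
        graph = defaultdict(lambda: defaultdict(float))
        for i, pair in enumerate(tagged):
            if not keep(pair):
                continue
            for j in range(i + 1, min(i + self.span, len(tagged))):
                other = tagged[j]
                if keep(other) and other[0] != pair[0]:
                    graph[pair[0]][other[0]] += 1.0
                    graph[other[0]][pair[0]] += 1.0
        if not graph:
            return []

        # Weighted PageRank: a node passes its score to neighbors in
        # proportion to edge weight over its total outgoing weight.
        out_weight = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
        score = {w: 1.0 / len(graph) for w in graph}
        for _ in range(self.iterations):
            score = {
                w: (1 - self.damping) + self.damping * sum(
                    weight / out_weight[nbr] * score[nbr]
                    for nbr, weight in graph[w].items()
                )
                for w in graph
            }

        # Rescale so the best-ranked word has weight 1.0.
        top = max(score.values())
        ranked = sorted(((w, s / top) for w, s in score.items()),
                        key=lambda kv: kv[1], reverse=True)[:top_k]
        return ranked if with_weight else [w for w, _ in ranked]


tagged = [("自然语言", "n"), ("处理", "vn"), ("是", "v"), ("计算机", "n"),
          ("科学", "n"), ("的", "uj"), ("重要", "a"), ("方向", "n")]
print(ToyTextRank().textrank(tagged, top_k=3))
```

Keeping the window, damping, and iteration settings on an instance, in the spirit of `jieba.analyse.TextRank()`, lets differently configured rankers coexist without shared global state.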
import jieba
jieba.initialize()  # (optional)
You can also specify the dictionary (not supported before version 0.28):
By default, an in-between dictionary called dict.txt, included in the distribution, is used.
Whichever dictionary you choose, download the file and then call jieba.set_dictionary('data/dict.txt.big'), or just replace the existing dict.txt.
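A dictionary file such as dict.txt holds one entry per line: a word, its frequency, and a POS tag, separated by spaces. A dictionary-based segmenter typically loads such a file into a prefix dictionary, registering every proper prefix of each word with frequency 0 so that a scan over the text can stop as soon as a prefix is absent. The sketch below builds one under those assumptions; `build_prefix_dict` is a hypothetical helper, and the sample entries and counts are made up.

```python
import io


def build_prefix_dict(lines):
    """Parse 'word freq tag' lines into a prefix dictionary and total count."""
    freq, total = {}, 0
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        word, count = parts[0], int(parts[1])
        freq[word] = count  # a real entry overrides an earlier 0 placeholder
        total += count
        # Register each proper prefix with frequency 0 unless it is already a word.
        for end in range(1, len(word)):
            freq.setdefault(word[:end], 0)
    return freq, total


# Stand-in for a dictionary file; entries and counts are made up.
sample = io.StringIO("清华大学 25 nt\n清华 20 nt\n大学 35 n\n")
prefix_freq, total = build_prefix_dict(sample)
print(total, prefix_freq["清华"], prefix_freq["清华大"])
```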