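Well, I finally got to use it. The naive Bayes classifier is reportedly the algorithm behind a lot of anti-pornography filtering software. Bayes' formula is also fairly simple, and it comes up often in university probability exercises. The core idea is to find the feature values that have the greatest probabilistic influence on the outcome.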
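The formula is as follows, where c is the class and w is the observed word vector:
P(c | w) = P(w | c) * P(c) / P(w)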
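The post does not spell out why the classifier is called "naive", so here is a brief sketch of the standard assumption: the words w_1, ..., w_n of a post are treated as conditionally independent given the class c. The symbols w_i and n are introduced here for illustration and are not in the original. Under that assumption the class score factorizes into per-word probabilities, which is why estimating P(w_i | c) for each word is enough:

```latex
P(c \mid w_1, \dots, w_n) \;\propto\; P(c) \prod_{i=1}^{n} P(w_i \mid c)
```

The denominator P(w) is the same for every class, so it can be dropped when only comparing classes.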
Training data. Each tokenized post is followed by its label (1 = abusive, 0 = normal):
['my', 'dog', 'has', 'flea', 'problems', 'help', 'please']  0
['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid']  1
['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him']  0
['stop', 'posting', 'stupid', 'worthless', 'garbage']  1
['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him']  0
['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']  1
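By computing, for every word in every sentence, the probability that it appears in abusive sentences versus normal ones, we can find out which words are abusive.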
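The counting step described above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the author's actual code: the function name `word_probabilities` and the variable names are hypothetical, and the estimate is a plain relative frequency with no smoothing, on the assumption that this matches the figures shown in this post (for example 1/19 for a word that occurs once among the 19 words of the abusive posts).

```python
# Training posts and labels copied from the data in this post
# (1 = abusive, 0 = normal).
posts = [
    ['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
    ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
    ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
    ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
    ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
    ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid'],
]
labels = [0, 1, 0, 1, 0, 1]


def word_probabilities(posts, labels, target):
    """Estimate P(word | target class) as
    (occurrences of word in that class) / (total words in that class).
    Hypothetical helper; no smoothing is applied."""
    counts = {}
    total = 0
    for post, label in zip(posts, labels):
        if label != target:
            continue
        for word in post:
            counts[word] = counts.get(word, 0) + 1
            total += 1
    return {word: n / total for word, n in counts.items()}


# Per-word probabilities within the abusive class.
abusive = word_probabilities(posts, labels, target=1)

# The word with the highest probability in abusive posts.
top_word = max(abusive, key=abusive.get)
print(top_word, abusive[top_word])
```

On this data the most probable word in the abusive class is `stupid`, which occurs 3 times among 19 words (about 0.158). Words that never occur in abusive posts are simply absent from the dictionary, which corresponds to a probability of 0.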
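P(word | abusive) for each vocabulary word:
[0.         0.         0.         0.05263158 0.05263158 0.         0.         0.
 0.05263158 0.05263158 0.         0.         0.         0.05263158 0.05263158 0.05263158
 0.05263158 0.05263158 0.         0.10526316 0.         0.05263158 0.05263158 0.
 0.10526316 0.         0.15789474 0.         0.05263158 0.         0.         0.        ]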
Highest probability:
0.157894736842
Corresponding word: stupid
Vocabulary list (in the same order as the probabilities above):
['cute', 'love', 'help', 'garbage', 'quit', 'I', 'problems', 'is', 'park', 'stop', 'flea', 'dalmation', 'licks', 'food', 'not', 'him', 'buying', 'posting', 'has', 'worthless', 'ate', 'to', 'maybe', 'please', 'dog', 'how', 'stupid', 'so', 'take', 'mr', 'steak', 'my']