Machine Learning in Action, 4.5 Classifying Text with Python: Code Corrections
Author: Internet
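In the original listing 4-2, the denominator of the conditional probability is wrong. It should count the documents in class ci rather than the words they contain. For example, P(cute=1|ci=0) should be 1/3 (before smoothing).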
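Written out, the corrected estimate treats each word as a binary (set-of-words) feature and measures the fraction of class-c documents that contain it. The code initialises the numerators with `ones(numWords)` and the denominators with 2.0, which appears to implement add-one (Laplace) smoothing. Under that assumption the estimate is as follows, where $n_{i,c}$ is the number of class-c training documents containing $w_i$ and $N_c$ is the number of class-c documents (both symbols introduced here for illustration):

$$
\hat P(w_i = 1 \mid c) = \frac{n_{i,c} + 1}{N_c + 2},
\qquad
\hat P(\text{cute} = 1 \mid c = 0) = \frac{1}{3}\ \text{(unsmoothed)},\quad \frac{1 + 1}{3 + 2} = \frac{2}{5}\ \text{(smoothed)}
$$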
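```python
from numpy import ones

def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    # Laplace smoothing: every count starts at 1 and every denominator at 2
    p0Num = ones(numWords)
    p1Num = ones(numWords)
    p0Denom = 2.0
    p1Denom = 2.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += 1  # corrected denominator: count documents, not words
        else:
            p0Num += trainMatrix[i]
            p0Denom += 1  # corrected denominator: count documents, not words
    p1Vect = p1Num / p1Denom  # the log is taken later, in classifyNB
    p0Vect = p0Num / p0Denom  # the log is taken later, in classifyNB
    return p0Vect, p1Vect, pAbusive
```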
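The sketch below checks the 1/3 figure. It assumes the six-post toy corpus from the book's loadDataSet(), reproduced here from memory, so treat the data as an assumption. It computes the unsmoothed estimate of P(cute=1|c=0) in two ways. The first divides by the number of class-0 documents (the corrected denominator). The second divides by the total number of set-of-words tokens in class 0. As far as I can tell, the second is what the book's listing accumulates with `sum(trainMatrix[i])`. The helper `p_word_given_class` is hypothetical and exists only for illustration.

```python
from fractions import Fraction

# Assumption: the book's six-post toy corpus; class 1 = abusive, class 0 = not abusive.
posts = [
    ['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
    ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
    ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
    ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
    ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
    ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid'],
]
labels = [0, 1, 0, 1, 0, 1]

def p_word_given_class(word, c, per_document):
    """Hypothetical helper: unsmoothed estimate of P(word=1 | class=c).

    per_document=True divides by the number of class-c documents (Bernoulli model).
    per_document=False divides by the number of distinct words summed over those documents.
    """
    docs = [set(p) for p, y in zip(posts, labels) if y == c]
    hits = sum(1 for d in docs if word in d)          # documents containing the word
    denom = len(docs) if per_document else sum(len(d) for d in docs)
    return Fraction(hits, denom)

print(p_word_given_class('cute', 0, per_document=True))   # 1/3
print(p_word_given_class('cute', 0, per_document=False))  # 1/24
```

"cute" occurs in one of the three non-abusive posts, which gives 1/3. Those three posts contain 24 distinct words in total, which gives 1/24. With the add-one smoothing used in trainNB0, the same entry becomes (1+1)/(3+2) = 2/5.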
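In the original listing 4-3, p1 and p0 are computed from the P(wi=1|ci) components only, and the P(wi=0|ci) components are ignored, where P(wi=0|ci) = 1 - P(wi=1|ci). As a result, words that do not occur in the document being classified contribute nothing to the score.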
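Spelled out, the score that classifyNB compares for each class is the Bernoulli naive Bayes log posterior, up to a constant shared by both classes. Here $x_i \in \{0, 1\}$ is the i-th entry of vec2Classify, and the $(1 - x_i)$ factor plays the role of vec2ClassifyInv:

$$
\operatorname{score}(c) = \log P(c) + \sum_i \Bigl[ x_i \log P(w_i = 1 \mid c) + (1 - x_i) \log\bigl(1 - P(w_i = 1 \mid c)\bigr) \Bigr]
$$

The document is assigned to class 1 if $\operatorname{score}(1) > \operatorname{score}(0)$, otherwise to class 0.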
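```python
from numpy import ones, log, array

def classifyNB(vec2Classify, p0Vect, p1Vect, pClass1):
    vec2Classify = array(vec2Classify)
    oneVect = ones(len(p0Vect))          # ones vector of the same length
    p1VectInv = oneVect - p1Vect         # P(w=0|c1) = 1 - P(w=1|c1)
    p0VectInv = oneVect - p0Vect         # P(w=0|c0) = 1 - P(w=1|c0)
    # take logs; the results must be assigned back, a bare log(p1Vect) statement is discarded
    p1Vect, p0Vect = log(p1Vect), log(p0Vect)
    p1VectInv, p0VectInv = log(p1VectInv), log(p0VectInv)
    vec2ClassifyInv = oneVect - vec2Classify  # 1 where a word is absent: selects the P(w=0|c) terms
    p1 = sum(vec2Classify * p1Vect) + sum(vec2ClassifyInv * p1VectInv) + log(pClass1)
    p0 = sum(vec2Classify * p0Vect) + sum(vec2ClassifyInv * p0VectInv) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0
```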
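The following is a minimal vectorised sketch that checks both fixes end to end. It again assumes the book's six-post toy corpus. The helper names (`create_vocab_list`, `set_of_words_vec`, `train_nb0`, `classify_nb`) are hypothetical snake_case stand-ins, not the book's own functions. `train_nb0` divides by the number of documents per class, with add-one smoothing. `classify_nb` includes the (1 - x_i) log(1 - p) terms for absent words and actually keeps the logarithms it computes.

```python
import numpy as np

# Assumption: the book's six-post toy corpus; class 1 = abusive, class 0 = not abusive.
posts = [
    ['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
    ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
    ['my', 'dalmation', 'is', 'so', 'cute', 'I', 'love', 'him'],
    ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
    ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
    ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid'],
]
labels = [0, 1, 0, 1, 0, 1]

def create_vocab_list(docs):
    """Every distinct word, sorted only so the vector layout is reproducible."""
    return sorted({w for d in docs for w in d})

def set_of_words_vec(vocab, doc):
    """0/1 vector: entry i is 1 if vocab[i] occurs in doc."""
    words = set(doc)
    return np.array([1 if w in words else 0 for w in vocab])

def train_nb0(train_matrix, train_category):
    """Bernoulli estimates P(w_i=1|c) = (docs containing w_i + 1) / (docs in c + 2)."""
    train_matrix = np.asarray(train_matrix)
    train_category = np.asarray(train_category)
    docs1 = train_matrix[train_category == 1]
    docs0 = train_matrix[train_category == 0]
    p1_vect = (docs1.sum(axis=0) + 1.0) / (len(docs1) + 2.0)
    p0_vect = (docs0.sum(axis=0) + 1.0) / (len(docs0) + 2.0)
    return p0_vect, p1_vect, train_category.mean()

def classify_nb(vec, p0_vect, p1_vect, p_class1):
    """Compare Bernoulli log posteriors, counting present and absent words."""
    vec = np.asarray(vec)
    p1 = np.sum(vec * np.log(p1_vect) + (1 - vec) * np.log(1 - p1_vect)) + np.log(p_class1)
    p0 = np.sum(vec * np.log(p0_vect) + (1 - vec) * np.log(1 - p0_vect)) + np.log(1.0 - p_class1)
    return 1 if p1 > p0 else 0

vocab = create_vocab_list(posts)
train_mat = [set_of_words_vec(vocab, p) for p in posts]
p0V, p1V, pAb = train_nb0(train_mat, labels)
print(classify_nb(set_of_words_vec(vocab, ['love', 'my', 'dalmation']), p0V, p1V, pAb))  # 0
print(classify_nb(set_of_words_vec(vocab, ['stupid', 'garbage']), p0V, p1V, pAb))        # 1
```

You can drop the `(1 - vec) * np.log(1 - ...)` terms from `classify_nb` to get roughly the scoring of the original listing 4-3. That makes it easy to compare the two rules on your own data.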
Source: https://www.cnblogs.com/dhfly/p/13062513.html