python - Cloud ML Engine and Scikit-Learn: 'LatentDirichletAllocation' object has no attribute 'predict'
I am implementing a simple Scikit-Learn pipeline to run LatentDirichletAllocation in Google Cloud ML Engine. The goal is to predict topics for new data. This is the code that builds the pipeline:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.datasets import fetch_20newsgroups
dataset = fetch_20newsgroups(shuffle=True, random_state=1,
                             remove=('headers', 'footers', 'quotes'))
train, test = train_test_split(dataset.data[:2000])
pipeline = Pipeline([
    ('CountVectorizer', CountVectorizer(
        max_df=0.95,
        min_df=2,
        stop_words='english')),
    ('LatentDirichletAllocation', LatentDirichletAllocation(
        n_components=10,
        learning_method='online'))
])
pipeline.fit(train)
Now, if I understand correctly, I can predict the topics of the test data by running:
pipeline.transform(test)
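However, after uploading the pipeline to Google Cloud Storage and trying to use it to generate local predictions with Google Cloud ML Engine, I get an error saying that LatentDirichletAllocation has no attribute predict.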
gcloud ml-engine local predict \
    --model-dir=$MODEL_DIR \
    --json-instances $INPUT_FILE \
    --framework SCIKIT_LEARN
...
"Exception during sklearn prediction: " + str(e)) cloud.ml.prediction.prediction_utils.PredictionError: Failed to run the provided model: Exception during sklearn prediction: 'LatentDirichletAllocation' object has no attribute 'predict' (Error code: 2)
The documentation also shows that there is no predict method, so I suppose this is not the right approach.
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
So the question is: what is the way to go? How can I use LatentDirichletAllocation (or something similar) in a Scikit-Learn pipeline with Google Cloud ML Engine?
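Solution:
Currently, the last estimator in the pipeline must implement a predict method. LatentDirichletAllocation only provides transform, which is why the prediction call fails.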
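One possible workaround, given as a minimal sketch and not as a confirmed Cloud ML Engine recipe, is to give the last step a predict method that delegates to transform. The names LdaWithPredict and build_pipeline below are hypothetical and not part of scikit-learn. The sketch also assumes that the serving environment can import the custom class when it loads the saved pipeline, which the original answer does not confirm.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


class LdaWithPredict(LatentDirichletAllocation):
    """Hypothetical subclass (not part of scikit-learn) that adds predict()."""

    def predict(self, X):
        # Delegate to transform(): one row per document, one column per topic.
        return self.transform(X)


def build_pipeline(n_components=10, max_df=0.95, min_df=2):
    # Same steps as the pipeline in the question, but the last step
    # now exposes predict(), so Pipeline.predict() becomes available.
    return Pipeline([
        ('CountVectorizer', CountVectorizer(
            max_df=max_df,
            min_df=min_df,
            stop_words='english')),
        ('LatentDirichletAllocation', LdaWithPredict(
            n_components=n_components,
            learning_method='online')),
    ])
```

With this in place, build_pipeline().fit(train) followed by pipeline.predict(test) should return the same document-topic matrix that pipeline.transform(test) returns for the original pipeline. Whether the hosted service accepts a pickled custom class depends on how the model is deployed, so treat that part as an assumption to verify.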
Tags: google-cloud-ml, text-classification, scikit-learn, machine-learning, python Source: https://codeday.me/bug/20191108/2009785.html