
Running MRJob from an IPython notebook

Author: (from the Internet)

I'm trying to run an mrjob example from an IPython notebook:

from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)

Then I run it with this code:

mr_job = MRWordFrequencyCount(args=["testfile.txt"])
with mr_job.make_runner() as runner:
    runner.run()
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
        print(key, value)

and get this error:

TypeError: <module '__main__' (built-in)> is a built-in class

Is there a way to run mrjob from an IPython notebook?
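The job itself is simple: for every input line the mapper emits a character count, a word count, and a line count, and the reducer sums the values for each key. As a rough sanity check that needs neither mrjob nor Hadoop, the plain-Python sketch below simulates the map, shuffle (group by key), and reduce phases in a single process. `run_locally` is a hypothetical helper, and this is not how mrjob actually executes a job.

```python
from collections import defaultdict


def mapper(_, line):
    # Same three (key, value) pairs as MRWordFrequencyCount.mapper.
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def reducer(key, values):
    # Same logic as MRWordFrequencyCount.reducer.
    yield key, sum(values)


def run_locally(lines):
    """Simulate map -> shuffle (group by key) -> reduce in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(None, line):
            groups[key].append(value)
    result = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            result[out_key] = out_value
    return result


print(run_locally(["hello world", "mrjob in a notebook"]))
# {'chars': 30, 'lines': 2, 'words': 6}
```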

Solution:

I suspect it is caused by this limitation, described on the MRJob website:

The file with the job class is sent to Hadoop to be run. Therefore,
the job file cannot attempt to start the Hadoop job, or you would be
recursively creating Hadoop jobs! The code that runs the job should
only run outside of the Hadoop context.
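Because the job file has to be shipped, mrjob first needs to find the source file that defines the job class; it appears to do this with the standard `inspect` module (an assumption about mrjob internals, which may differ between versions). A class typed into a notebook cell belongs to the `__main__` module, and in a notebook that module has no `__file__`, which matches the error in the question. The stdlib-only sketch below reproduces that lookup against a simulated notebook module; `job_source_file` and `notebook_main` are hypothetical names, not part of mrjob.

```python
import inspect
import sys
import types


def job_source_file(cls):
    """Return the file a class was defined in, or None if there is none.

    Hypothetical helper: it mimics the lookup a job runner needs before
    it can ship the job class to other processes.
    """
    try:
        return inspect.getfile(cls)
    except (TypeError, OSError):
        # TypeError: the class's module has no __file__ ("... is a built-in class").
        # OSError: raised instead by newer Pythons when that module is __main__.
        return None


# Simulate a notebook kernel: a module object with no __file__ attribute.
notebook_main = types.ModuleType("notebook_main")
sys.modules["notebook_main"] = notebook_main
exec("class MRWordFrequencyCount:\n    pass\n", notebook_main.__dict__)

print(job_source_file(notebook_main.MRWordFrequencyCount))  # None
print(job_source_file(inspect.Signature))  # the path of inspect.py
```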

Or it may be because you are missing the following (reference):

if __name__ == '__main__':
    MRWordCounter.run()  # where MRWordCounter is your job class
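One workaround consistent with the quoted limitation (an assumption, not something this answer states) is to keep the job class, together with the `__main__` guard, in its own `.py` file and only import and run it from the notebook. The stdlib sketch below writes such a file; the module name `mr_word_freq` and the helper `write_job_module` are hypothetical.

```python
from pathlib import Path

# Source of the job module: the job from the question plus the __main__ guard,
# so the file can also be run directly as a script.
JOB_SOURCE = '''\
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == "__main__":
    MRWordFrequencyCount.run()
'''


def write_job_module(directory=".", name="mr_word_freq"):
    """Write the job to <directory>/<name>.py and return its path (hypothetical helper)."""
    path = Path(directory) / f"{name}.py"
    path.write_text(JOB_SOURCE)
    return path
```

In the notebook you would then run `from mr_word_freq import MRWordFrequencyCount` and reuse the `make_runner()` code from the question unchanged; because the class now comes from a real file, its source can be located and shipped. This assumes the notebook's working directory is on `sys.path`, which is usually the case for an IPython kernel. Inside IPython, the `%%file mr_word_freq.py` cell magic can write the same file directly from a cell.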

Tags: python, ipython-notebook, mapreduce, mrjob
Source: https://codeday.me/bug/20190703/1363956.html