Scrapy Python Craigslist Scraper

Author: Internet

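I am trying to use Scrapy to scrape Craigslist classifieds and extract items for sale.

I am able to extract the date, post title, and post URL, but I cannot extract the price.

For some reason, the current code extracts all of the prices for every item, but when I remove the // before the price span, the price field comes back empty.

Could someone look at the code below and help me?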

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from craigslist_sample.items import CraigslistSampleItem

    class MySpider(BaseSpider):
        name = "craig"
        allowed_domains = ["craigslist.org"]
        start_urls = ["http://longisland.craigslist.org/search/sss?sort=date&query=raptor%20660&srchType=T"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            titles = hxs.select("//p")
            items = []
            for titles in titles:
                item = CraigslistSampleItem()
                item['date'] = titles.select('span[@class="itemdate"]/text()').extract()
                item["title"] = titles.select("a/text()").extract()
                item["link"] = titles.select("a/@href").extract()
                item['price'] = titles.select('//span[@class="itempp"]/text()').extract()
                items.append(item)
            return items

Solution:

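itempp appears to be nested inside another element, itempnr. A path that starts with // is evaluated from the document root even when it is called on a nested selector, so it returns every price on the page for each row. Without the //, the path only matches direct children of the p element, and itempp does not seem to be one, which is why the field comes back empty. If you change //span[@class="itempp"]/text() to span[@class="itempnr"]/span[@class="itempp"]/text(), it may work.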
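The difference between the three paths can be sketched without Scrapy. The example below is only an illustration: it uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors, and the sample markup is hypothetical, modelled on the class names mentioned above rather than copied from the real Craigslist page. ElementTree does not accept an absolute // path on an element, so the document-wide search is run from the root instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup (an assumption): two result rows, each with the price
# span "itempp" nested inside a wrapper span "itempnr".
SAMPLE = """
<div>
  <p class="row">
    <span class="itemdate">Jul 1</span>
    <a href="/mcy/1.html">Raptor 660 A</a>
    <span class="itempnr"><span class="itempp">$2500</span></span>
  </p>
  <p class="row">
    <span class="itemdate">Jul 2</span>
    <a href="/mcy/2.html">Raptor 660 B</a>
    <span class="itempnr"><span class="itempp">$3100</span></span>
  </p>
</div>
""".strip()


def prices(node, path):
    """Return the text of every element matched by `path`, relative to `node`."""
    return [el.text for el in node.findall(path)]


root = ET.fromstring(SAMPLE)
rows = root.findall("p")

# Searching the whole document: every row would receive the same full list,
# which mirrors what the leading // does in the question's code.
all_prices = prices(root, ".//span[@class='itempp']")

# Direct-child path: matches nothing, because itempp is not a child of <p>.
direct_child = [prices(p, "span[@class='itempp']") for p in rows]

# Path suggested in the answer: step through the itempnr wrapper.
nested = [prices(p, "span[@class='itempnr']/span[@class='itempp']") for p in rows]

# Descendant search anchored at the current row.
descendant = [prices(p, ".//span[@class='itempp']") for p in rows]

print(all_prices)
print(direct_child)
print(nested)
print(descendant)
```

In Scrapy itself, the row-relative descendant form would be written with a leading dot, .//span[@class="itempp"]/text(), which keeps the search inside the current p element even if the wrapper element changes.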

Tags: python, scrapy, scraper, craigslist
Source: https://codeday.me/bug/20190703/1370871.html