Scraping Image Links from Web Pages and Downloading the Images

Author: Internet

First, write the URLs of the pages to be scraped into a spreadsheet.
Read the spreadsheet

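import xlrd

# Note: xlrd 2.0 and later read only .xls files. Opening this .xlsx file
# requires xlrd 1.2, or a different reader such as openpyxl.
excel_path = '/Users/yt/Desktop/chaye.xlsx'

workbook: xlrd.book.Book = xlrd.open_workbook(excel_path)
sheet: xlrd.sheet.Sheet = workbook.sheet_by_index(0)

# This loop runs inside a method of the scraper class, hence `self`.
# It starts at row 1, presumably to skip a header row.
for row in range(1, sheet.nrows):
    # Column 0 holds the page URL, column 1 the ID used in the file name.
    url, id_str = sheet.row_values(row)[:2]
    self.deal_photo_item(url, id_str)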
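
One detail worth handling when reading the sheet: xlrd returns numeric cells as floats, so an ID typed as 1001 arrives as 1001.0 and would end up in the file name that way. The following is a minimal sketch of cleaning the rows before they are passed on. The helper names `cell_to_id` and `iter_jobs` and the sample rows are hypothetical and not part of the original code.

```python
from typing import Iterable, Iterator, List, Tuple


def cell_to_id(value) -> str:
    """Convert a spreadsheet cell value to a clean ID string."""
    # Numeric cells arrive as float, so 1001 is read as 1001.0.
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()


def iter_jobs(rows: Iterable[List]) -> Iterator[Tuple[str, str]]:
    """Yield (url, id_str) pairs, skipping rows that have no URL."""
    for row in rows:
        if len(row) < 2:
            continue  # not enough columns to hold a URL and an ID
        url = str(row[0]).strip()
        if not url:
            continue  # empty URL cell
        yield url, cell_to_id(row[1])


# Made-up rows shaped like the lists returned by sheet.row_values(row).
sample_rows = [
    ["https://example.com/item/1", 1001.0],
    ["", 1002.0],
    ["https://example.com/item/3", "A-3"],
]
jobs = list(iter_jobs(sample_rows))
```

With a real sheet, the rows could be supplied as `(sheet.row_values(r) for r in range(1, sheet.nrows))`.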

Scrape the image links and download the images

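import os
import urllib.request

# Assumed import: the original does not show where Selector comes from.
# scrapy.selector.Selector and parsel.Selector both offer this API.
from scrapy.selector import Selector


def deal_photo_item(self, url, id_str):
    # Load the page, then scroll down and back up, presumably so that
    # lazily loaded images are rendered. random_sleep() is a helper from
    # the author's earlier code and is not shown in this post.
    self.driver.get(url)
    self.driver.execute_script("window.scrollBy(0,1000)")
    random_sleep()
    self.driver.execute_script("window.scrollBy(0,-1000)")
    random_sleep()

    html = self.driver.page_source
    selector = Selector(text=html)

    # Make sure the output directory exists before saving into it.
    os.makedirs('./图片', exist_ok=True)

    banner_image_list = selector.css('#nc_small::attr(src)').extract()
    for i, detail_image in enumerate(banner_image_list):
        # Download the image: strip the '_60.jpg' suffix (presumably a
        # thumbnail marker) and append '.jpg' to get the full-size URL.
        url_str = detail_image.replace('_60.jpg', '') + '.jpg'
        filename = f'./图片/{id_str}+{i}.jpg'
        urllib.request.urlretrieve(url_str, filename=filename)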
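
The download step can also be split into small functions that are easier to check on their own. The sketch below is a variation on the approach above, not the author's code. It assumes that `src` values may be protocol-relative (starting with `//`), it rewrites the `_60.jpg` suffix only when it is actually present, and it skips failed downloads instead of aborting. The names `full_size_url`, `target_path`, and `download_images` are hypothetical.

```python
import os
import urllib.request
from typing import Iterable, List
from urllib.parse import urljoin


def full_size_url(src: str, page_url: str) -> str:
    """Turn a thumbnail src into an absolute URL for the full-size image."""
    # urljoin resolves protocol-relative ("//host/a.jpg") and relative paths
    # against the URL of the page the src was found on.
    absolute = urljoin(page_url, src)
    # Replace the "_60.jpg" suffix with ".jpg", but only if it is there.
    if absolute.endswith("_60.jpg"):
        absolute = absolute[: -len("_60.jpg")] + ".jpg"
    return absolute


def target_path(out_dir: str, id_str: str, index: int) -> str:
    """Build the local file name, using the same '<id>+<index>.jpg' pattern."""
    return os.path.join(out_dir, f"{id_str}+{index}.jpg")


def download_images(srcs: Iterable[str], page_url: str,
                    id_str: str, out_dir: str) -> List[str]:
    """Download every image and return the paths that were saved."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for index, src in enumerate(srcs):
        path = target_path(out_dir, id_str, index)
        try:
            urllib.request.urlretrieve(full_size_url(src, page_url), filename=path)
        except OSError as exc:
            # URLError and HTTPError are subclasses of OSError.
            print(f"skipped {src}: {exc}")
            continue
        saved.append(path)
    return saved
```

Inside `deal_photo_item`, the loop could then be replaced by a call such as `download_images(banner_image_list, url, id_str, './图片')`.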

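For the full scraping code, see my earlier post on scraping Taobao product information. This post only records how to save images from their links.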
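
Since `random_sleep()` is called in `deal_photo_item` but not defined in this post, here is one hypothetical way such a helper might look. The delay bounds and the return value are assumptions; the author's actual implementation may differ.

```python
import random
import time


def random_sleep(min_seconds: float = 1.0, max_seconds: float = 3.0) -> float:
    """Pause for a random duration and return the number of seconds slept."""
    # A random pause between browser actions makes the request timing
    # less regular than a fixed delay would.
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

Called without arguments, as in the code above, it would pause for between one and three seconds.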

Tags: sheet, xlrd, self, id, scraping, web page, image link, image, row
Source: https://blog.csdn.net/yt_xy/article/details/112982805