Crawling image links from a web page and downloading the images
Author: (from the Internet)
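First, write the links of the pages to crawl into a spreadsheet. The code reads the page URL from the first column and the item ID from the second, and skips the first row as a header.
Read the spreadsheet: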
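```python
import xlrd  # xlrd 2.0+ only opens .xls; reading this .xlsx file needs xlrd < 2.0 (pip install "xlrd<2")

excel_path = '/Users/yt/Desktop/chaye.xlsx'
workbook: xlrd.book.Book = xlrd.open_workbook(excel_path)
sheet: xlrd.sheet.Sheet = workbook.sheet_by_index(0)
# This loop runs inside a method of the crawler class, hence self.deal_photo_item.
# Row 0 (the header) is skipped: column 0 holds the page URL, column 1 the item ID.
for row in range(1, sheet.nrows):
    values = sheet.row_values(row)
    url = values[0]
    id_str = values[1]  # numeric cells come back as floats, e.g. 123.0
    self.deal_photo_item(url, id_str)
```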
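Because xlrd returns numeric cells as floats, an item ID typed as 123 arrives as 123.0 and would end up in the saved filenames as `123.0`. The sketch below is a hypothetical helper, not part of the original code: it takes rows as lists of cell values (the shape `sheet.row_values(row)` returns), skips the header row, ignores rows with an empty URL cell, and turns whole-number IDs into plain strings. The names `normalize_id` and `iter_links` are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def normalize_id(value: object) -> str:
    """Turn a cell value into an ID string; 123.0 (a numeric cell read by xlrd) becomes '123'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()


def iter_links(rows: Iterable[List[object]]) -> Iterator[Tuple[str, str]]:
    """Yield (url, id_str) pairs, skipping the header row and rows without a URL."""
    for index, values in enumerate(rows):
        if index == 0:
            continue  # header row, like range(1, sheet.nrows) in the original loop
        url = str(values[0]).strip()
        if not url:
            continue  # blank URL cell: nothing to crawl
        yield url, normalize_id(values[1])


# Hypothetical wiring with xlrd, inside the crawler class:
#   rows = (sheet.row_values(r) for r in range(sheet.nrows))
#   for url, id_str in iter_links(rows):
#       self.deal_photo_item(url, id_str)
```

Keeping the parsing separate from xlrd makes it easy to test, and the same function works if the rows come from another reader.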
Crawl each link and download its images:
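```python
import os
import urllib.request

from parsel import Selector  # assumption: the import isn't shown; scrapy's Selector has the same API


# Method of the crawler class: self.driver is a Selenium WebDriver, and random_sleep()
# is a helper from the author's earlier crawler that pauses for a random interval.
def deal_photo_item(self, url, id_str):
    self.driver.get(url)
    # Scroll down and back up, presumably so lazily loaded content gets rendered
    self.driver.execute_script("window.scrollBy(0,1000)")
    random_sleep()
    self.driver.execute_script("window.scrollBy(0,-1000)")
    random_sleep()
    html = self.driver.page_source
    selector = Selector(text=html)
    # src attributes of the small (thumbnail) images
    banner_image_list = selector.css('#nc_small::attr(src)').extract()
    os.makedirs('./图片', exist_ok=True)  # 图片 = "images"; urlretrieve fails if the folder is missing
    for i, detail_image in enumerate(banner_image_list):
        # Turn the thumbnail URL into the full-size image URL
        url_str = detail_image.replace('_60.jpg', '') + '.jpg'
        if url_str.startswith('//'):  # protocol-relative src needs a scheme for urllib
            url_str = 'https:' + url_str
        filename = f'./图片/{id_str}+{i}.jpg'
        # Download the image and save it locally
        urllib.request.urlretrieve(url_str, filename=filename)
```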
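To check the extraction and renaming rules without launching a browser, here is a stdlib-only sketch. It swaps parsel/Scrapy's `Selector` for `html.parser` and only handles the simple case of tags with `id="nc_small"`, not general CSS selectors. It keeps the original `_60.jpg` rewrite rule and returns the planned (URL, filename) pairs instead of downloading anything. `random_sleep` here is a hypothetical version of the author's helper, and its 1 to 3 second default range is an assumption.

```python
import random
import time
from html.parser import HTMLParser
from typing import List, Optional, Tuple


def random_sleep(low: float = 1.0, high: float = 3.0) -> None:
    """Hypothetical stand-in for the author's helper: pause for a random interval."""
    time.sleep(random.uniform(low, high))


class SmallImageParser(HTMLParser):
    """Collect the src of every tag whose id is 'nc_small' (what '#nc_small::attr(src)' selects)."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        attr_map = dict(attrs)
        src = attr_map.get('src')
        if attr_map.get('id') == 'nc_small' and src:
            self.srcs.append(src)


def full_size_url(thumb_src: str) -> str:
    """Apply the original rewrite rule, then add a scheme to protocol-relative URLs."""
    url = thumb_src.replace('_60.jpg', '') + '.jpg'
    if url.startswith('//'):
        url = 'https:' + url
    return url


def plan_downloads(html: str, id_str: str) -> List[Tuple[str, str]]:
    """Return (image_url, filename) pairs in page order, without downloading anything."""
    parser = SmallImageParser()
    parser.feed(html)
    return [
        (full_size_url(src), f'./图片/{id_str}+{i}.jpg')
        for i, src in enumerate(parser.srcs)
    ]
```

Each returned pair can then be passed to `urllib.request.urlretrieve(url, filename=filename)`, exactly as the crawler code does.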
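For the full crawling code, see my earlier post on scraping Taobao product information; this post only covers saving images from their links.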
Source: https://blog.csdn.net/yt_xy/article/details/112982805