
Batch-Downloading Pretty-Girl Pictures with a Python Crawler to Add Some Fun to Dull Work!

2019-09-05 11:55:05 Author: Internet

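Background:

Recently my department lead handed me a task: scrape Baidu keyword rankings. Once the basic functionality was written, I wanted to know whether it actually works. Hence this post: scraping some pictures of pretty women tests the crawler and, as a bonus, adds a little fun to the workday, haha. Enough talk. The pictures to scrape are on the page below, and after that we go straight to the code. Target URL: http://www.win4000.com/meitu.html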

Environment (please set it up yourself):

Python3

urllib3

BeautifulSoup

requests

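Use your browser's Inspect Element tool to confirm which elements to scrape. Copying this code verbatim may fail if the page structure has changed.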
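The same check can also be done from Python instead of the browser. The sketch below uses only the standard library's html.parser (a swap-in for illustration, not the BeautifulSoup call the script uses) to list which attributes each img tag carries. The class name ImgAttrCollector, the function list_img_attrs, and the HTML fragment are all made up for this example; the real page's markup may differ.

```python
from html.parser import HTMLParser


class ImgAttrCollector(HTMLParser):
    """Collect the attributes of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with lowercased names.
        if tag == 'img':
            self.images.append(dict(attrs))


def list_img_attrs(html):
    """Return one attribute dict per <img> tag found in html."""
    collector = ImgAttrCollector()
    collector.feed(html)
    return collector.images


# Made-up fragment, assumed to resemble a thumbnail list on the target page.
SAMPLE_HTML = '''
<ul>
  <li><img alt="sample one" data-original="http://example.com/a_250_350.jpg" src="http://example.com/blank.gif"></li>
  <li><img alt="sample two" src="http://example.com/b_250_350.jpg"></li>
  <li><img src="http://example.com/logo.png"></li>
</ul>
'''

if __name__ == '__main__':
    for attrs in list_img_attrs(SAMPLE_HTML):
        # Show which attribute names each image has.
        print(sorted(attrs))
```

If an image carries data-original, that is probably the attribute to read; images without alt (such as a logo) are the ones the script skips.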

Source code:

download_meinv.py

import os
from urllib.parse import urlparse  # urlparse lives in the standard library (urllib.parse), not in urllib3

from bs4 import BeautifulSoup
import requests

# Import standard-library modules first, then third-party ones.

# Scrape every image on the site's home page.
r = requests.get('http://www.win4000.com/meitu.html')
soup = BeautifulSoup(r.text, 'html.parser')

img_list = []
for img in soup.select('img'):
    if img.has_attr('alt'):
        # Prefer data-original when present; otherwise fall back to src.
        if img.has_attr('data-original'):
            img_list.append((img.attrs['alt'], img.attrs['data-original']))
        else:
            img_list.append((img.attrs['alt'], img.attrs['src']))

# Save everything under ./meinv, creating the directory if needed.
image_dir = os.path.join(os.curdir, 'meinv')
if not os.path.isdir(image_dir):
    os.mkdir(image_dir)

for img in img_list:
    name = img[0] + '.' + 'jpg'
    o = urlparse(img[1])
    filepath = os.path.join(image_dir, name)
    # Drop the '_250_350' thumbnail suffix to download the original image.
    url = '%s://%s/%s' % (o.scheme, o.netloc, o.path[1:].replace('_250_350', ''))
    print(url)
    resp = requests.get(url, stream=True)  # stream=True so the body is actually fetched in chunks
    with open(filepath, 'wb') as f:
        for chunk in resp.iter_content(1024):  # write 1024 bytes at a time in case the image is large
            f.write(chunk)
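The script uses the alt text directly as a file name and removes the thumbnail suffix with a plain string replace. Below is a minimal sketch of those two steps as small functions that can be tested without touching the network. The helper names original_url and safe_filename, the set of characters treated as invalid, and the example URL are assumptions for illustration, not part of the original script.

```python
import re
from urllib.parse import urlparse, urlunparse


def original_url(thumb_url, suffix='_250_350'):
    """Return thumb_url with the thumbnail suffix removed from its path."""
    parts = urlparse(thumb_url)
    # Only the path is rewritten; scheme, host and query are left untouched.
    return urlunparse(parts._replace(path=parts.path.replace(suffix, '')))


def safe_filename(alt_text, ext='jpg'):
    """Build a file name from an alt text.

    Characters assumed to be invalid in file names on common systems
    are replaced with an underscore; an empty result becomes 'unnamed'.
    """
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', alt_text).strip()
    return '%s.%s' % (cleaned or 'unnamed', ext)


if __name__ == '__main__':
    # Hypothetical thumbnail URL, used only to show the rewrite.
    print(original_url('http://example.com/pic/a_250_350.jpg'))
    print(safe_filename('a/b'))
```

Unlike the string formatting in the script, original_url keeps any query string the thumbnail URL might have; whether that matters depends on the site.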

Source: https://blog.51cto.com/20131104/2435732