首页 > 其他分享> > 爬取json Swaggerui界面

爬取json Swaggerui界面

2019-04-29 14:39:17 作者：互联网

对一个静态的网页进行爬取。

要获取的内容分别为 paths 标签下的

1./quota/开头的路径

2. get 这样的httpmode

3 description对应的描述

4 summary

5 tags 里存放着的服务名

6 服务名所对应的副描述（不在paths标签下）

7总的title（只有一个

import requests
import json
import pymysql
import urllib
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

url = 'http://192.168.101.213:7027/v2/api-docs'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36)"}
response = requests.get(url,headers=headers).text
json_str = json.loads(response)#转化为字符串
aa1 = json_str['info']['title']#大title
aa2=[]
aa2.append(aa1)
aa=aa2*42
cc1 = json_str['paths']#提取在整个paths标签下的数据
cc = [] #微服务路径
dd = [] #获取方式
for key in cc1.keys():#获得字典的key值
    cc.append(key)
for m in cc:
        dd1 = json_str['paths'][m]#在cc这个list中进行遍历
        for key2 in dd1.keys():
                        dd.append(key2)
ee = []#扩展操作
bb = []#微服务名
gg = []#微服务描述
ff = []#主描述
for o,p in zip(cc,dd):#同时遍历 cc和dd两个list 用zip对他们进行封装
                try:
                    ee1 = json_str['paths'][o][p]['summary']
                    bb1 = json_str['paths'][o][p]['tags'][0]
                    gg1 = json_str['paths'][o][p]['description']
                    bb.append(bb1)
                    ee.append(ee1)
                    gg.append(gg1)
                except(KeyError):#因为有两个要爬取的内容是没有 description这个key值的 所以遇到keyerror时继续爬取
                    continue
hh=[]#这一部爬取6个微服务名 因为相应description没有被一起存放在paths标签下面
for n1 in range (0,6):
                        hh1 = json_str['tags'][n1]['name']
                        hh.append(hh1)

list3 = [hh.index(num) for num in bb]#将存放在bb中的微服务名拿到hh中进行对比 获得他们的num值存放在list3中
for n3 in list3:
                ff1 = json_str['tags'][n3]['description']#遍历list3获得他们想对应的description值
                ff.append(ff1)
db = pymysql.connect('localhost', 'root', '******', 'languid')
cursor = db.cursor()
for z, x, c, v, z1, x1 ,c1 in zip(aa,bb,gg,cc,dd,ff,ee):#遍历6个列表并对他们封装
                    sql = """insert into swaggerui(Platform,Microservice,Microservicedescrption,MicroPaths,Httpmode,Microdescrption,MicroNotes)VALUES ('%s','%s','%s','%s','%s','%s','%s')"""%(z, x, c, v, z1, x1, c1)
                    cursor.execute(sql)
                    db.commit()

1).主要在于标签名字的获取，因为想要获取的内容被存放在了标签名。

for key in cc1.keys():
    　　　　　　　　cc.append(key)

所以要用for循环来获得字典的键值。

相当于

dic ={"name"="香蕉","种类"=“水果","sales”=“1000”}
for ke in dic.keys():
                        print(ke)

name
种类
sales


也可以用items（）的方法
for ke in dic.items():
                            print(ke[0])


name
种类
sales

2).用一个if语句对比两个list取得需要的值

list1=[a,b,c,d,e,a,c,e,a,b,d,c,a,e,b,c,e,a,d,e](顺序是随机的但是都是a-e）
list2=[a,b,c,d,e]

用索引的方法我们可以获得对应的num值

list3 = [list2.index(num) for num in list1]

[0, 1, 2, 3, 4, 0, 2, 4, 0, 1, 3, 2, 0, 4, 1, 2, 4, 0, 3, 4]

再用一个for循环遍历list3就可以获得相应的数据了。

最后存放到mysql里面的显示是这样的

标签：paths,Swaggerui,cc,爬取,json,str,import,append
来源： https://www.cnblogs.com/languid/p/10790207.html