Python Crawler: File Storage with TXT, JSON, and CSV (Part 5)
Author: Internet
1. TXT file storage
Saving data to a TXT text file is very simple, and TXT files are compatible with almost every platform. The drawback is that plain text is poorly suited to retrieval. So if the requirements for retrieval and data structure are low and convenience comes first, TXT storage is a reasonable choice. In this section we look at how to save a TXT text file with Python.

Code example:

    import requests
    from pyquery import PyQuery as pq

    url = 'https://www.zhihu.com/explore'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'}
    html = requests.get(url, headers=headers).text
    # pyquery approach 01
    doc = pq(html)
    items = doc('.ExploreCollectionCard-contentItem').items()


    def save_txt():
        # Open the file once in append mode and write one block of text per item
        with open('data.txt', 'a', encoding='utf-8') as file:
            for item in items:
                url = item.find('.ExploreCollectionCard-contentTitle').attr('href')
                # print(url)
                contentExcerpt = item.find('.ExploreCollectionCard-contentExcerpt').text()
                # print(contentExcerpt)
                span_txt = item.find('.ExploreCollectionCard-contentTags').find('span').filter(
                    '.ExploreCollectionCard-contentCountTag').text()
                # str(url) guards against a missing href, which pyquery returns as None
                file.write('\n'.join([str(url), contentExcerpt, span_txt]))
                # A line of '=' characters separates one item from the next
                file.write('\n' + '=' * 50 + '\n')


    if __name__ == '__main__':
        save_txt()
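The storage step on its own can be sketched without any network access. The following is a minimal illustration, not the original author's code: `save_records_txt`, `load_records_txt`, the `SEPARATOR` line, and the sample records are all hypothetical. It also shows why TXT is awkward for retrieval: reading the data back means parsing the whole file by hand.

```python
import os
import tempfile

# Assumed record delimiter; any line that cannot occur in the data would work
SEPARATOR = '=' * 50


def save_records_txt(records, path, mode='a'):
    """Write each record (a list of fields) as lines followed by a separator line."""
    with open(path, mode, encoding='utf-8') as file:
        for fields in records:
            # None (for example a missing href) is written as an empty string
            file.write('\n'.join('' if f is None else str(f) for f in fields))
            file.write('\n' + SEPARATOR + '\n')


def load_records_txt(path):
    """Read the file back and split it into records on the separator line."""
    with open(path, encoding='utf-8') as file:
        blocks = file.read().split('\n' + SEPARATOR + '\n')
    # The final element is the empty string after the last separator
    return [block.split('\n') for block in blocks if block]


# Hypothetical sample data standing in for scraped items
sample = [
    ['/question/1', 'first excerpt', '10 answers'],
    ['/question/2', '第二条摘要', '3 answers'],
]

path = os.path.join(tempfile.mkdtemp(), 'data.txt')
save_records_txt(sample, path, mode='w')  # 'w' truncates the file first
save_records_txt(sample[:1], path)        # the default 'a' appends to the end
print(load_records_txt(path))
```

Mode `'a'` keeps earlier content, so repeated runs accumulate records, while `'w'` starts from an empty file each time.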
2. JSON file storage
JSON, short for JavaScript Object Notation, represents data through combinations of objects and arrays. Its syntax is compact yet highly structured, which makes it a lightweight data-interchange format. In this section we look at how to save data to a JSON file with Python.

Code example:

    import json

    import requests
    from pyquery import PyQuery as pq

    url = 'https://www.zhihu.com/explore'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'}
    html = requests.get(url, headers=headers).text
    # pyquery approach 01
    doc = pq(html)
    items = doc('.ExploreCollectionCard-contentItem').items()
    objs = []


    def save_json():
        # 'w' rewrites the file on every run; appending a second array with 'a'
        # would leave the file as invalid JSON
        with open('data.json', 'w', encoding='utf-8') as file:
            for item in items:
                url = item.find('.ExploreCollectionCard-contentTitle').attr('href')
                # print(url)
                contentExcerpt = item.find('.ExploreCollectionCard-contentExcerpt').text()
                # print(contentExcerpt)
                span_txt = item.find('.ExploreCollectionCard-contentTags').find('span').filter(
                    '.ExploreCollectionCard-contentCountTag').text()
                # print(span_txt)
                data = {
                    "url": url,
                    "contentExcerpt": contentExcerpt,
                    "span_txt": span_txt
                }
                # print(data)
                # Alternative: write each extracted item as one JSON object per line
                # file.write(json.dumps(data, ensure_ascii=False) + '\n')
                objs.append(data)
            print(objs)
            # ensure_ascii=False keeps Chinese characters readable in the output
            file.write(json.dumps(objs, ensure_ascii=False, indent=2))


    if __name__ == '__main__':
        save_json()
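Two file layouts come up when storing scraped items as JSON: a single array that is rewritten in full, and one object per line (often called JSON Lines) that can be appended to safely. The sketch below contrasts them offline. The function names, file names, and sample records are assumptions made for illustration only.

```python
import json
import os
import tempfile


def save_json_array(records, path):
    """Write all records as one JSON array; the file is overwritten each time."""
    with open(path, 'w', encoding='utf-8') as file:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
        json.dump(records, file, ensure_ascii=False, indent=2)


def append_json_lines(records, path):
    """Append one JSON object per line; safe to call repeatedly."""
    with open(path, 'a', encoding='utf-8') as file:
        for record in records:
            file.write(json.dumps(record, ensure_ascii=False) + '\n')


def load_json_lines(path):
    """Parse a one-object-per-line file back into a list of dicts."""
    with open(path, encoding='utf-8') as file:
        return [json.loads(line) for line in file if line.strip()]


# Hypothetical sample data standing in for scraped items
sample = [
    {'url': '/question/1', 'contentExcerpt': '知乎摘要', 'span_txt': '10 answers'},
    {'url': None, 'contentExcerpt': 'no link found', 'span_txt': ''},
]

tmp_dir = tempfile.mkdtemp()
array_path = os.path.join(tmp_dir, 'data.json')
lines_path = os.path.join(tmp_dir, 'data.jsonl')

save_json_array(sample, array_path)
with open(array_path, encoding='utf-8') as file:
    array_records = json.load(file)

append_json_lines(sample, lines_path)
append_json_lines(sample, lines_path)  # a second run simply adds more lines
line_records = load_json_lines(lines_path)
print(len(array_records), len(line_records))
```

The array layout loads with a single `json.load` call, while the line-based layout suits crawlers that run many times and only ever add records.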
3. CSV file storage
CSV, short for Comma-Separated Values (also called character-separated values), is a file format that stores tabular data as plain text.

Code example:
    import csv

    import requests
    from pyquery import PyQuery as pq

    url = 'https://www.zhihu.com/explore'
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36'}
    html = requests.get(url, headers=headers).text
    # pyquery approach 01
    doc = pq(html)
    items = doc('.ExploreCollectionCard-contentItem').items()


    def save_csv():
        # Open the file once in append mode; newline='' lets the csv module
        # control line endings and avoids blank rows on Windows
        with open('data.csv', 'a', encoding='utf-8', newline='') as file:
            writer = csv.writer(file)
            for item in items:
                url = item.find('.ExploreCollectionCard-contentTitle').attr('href')
                # print(url)
                contentExcerpt = item.find('.ExploreCollectionCard-contentExcerpt').text()
                # print(contentExcerpt)
                span_txt = item.find('.ExploreCollectionCard-contentTags').find('span').filter(
                    '.ExploreCollectionCard-contentCountTag').text()
                data = [url, contentExcerpt, span_txt]
                # One row per scraped item
                writer.writerow(data)


    if __name__ == '__main__':
        save_csv()
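When the columns have names, `csv.DictWriter` and `csv.DictReader` from the standard library can write a header row and read the rows back as dicts. The sketch below is a hedged, offline illustration: `save_csv_rows`, `load_csv_rows`, `FIELDNAMES`, and the sample rows are hypothetical names and data, chosen to mirror the three fields extracted above.

```python
import csv
import os
import tempfile

FIELDNAMES = ['url', 'contentExcerpt', 'span_txt']


def save_csv_rows(rows, path):
    """Append dict rows to a CSV file, writing the header only for a new or empty file."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline='' lets the csv module control line endings
    with open(path, 'a', encoding='utf-8', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def load_csv_rows(path):
    """Read the file back as a list of dicts keyed by the header row."""
    with open(path, encoding='utf-8', newline='') as file:
        return list(csv.DictReader(file))


# Hypothetical sample data; the first excerpt contains a comma, quotes, and a newline
sample = [
    {'url': '/question/1',
     'contentExcerpt': 'commas, "quotes" and\nnewlines are escaped',
     'span_txt': '10 answers'},
    {'url': '/question/2', 'contentExcerpt': '第二条摘要', 'span_txt': '3 answers'},
]

path = os.path.join(tempfile.mkdtemp(), 'data.csv')
save_csv_rows(sample[:1], path)  # first call creates the file and writes the header
save_csv_rows(sample[1:], path)  # second call appends without repeating the header
rows = load_csv_rows(path)
print(rows)
```

The csv module quotes fields that contain commas, quotes, or line breaks, so such text survives a round trip. If the file is meant to be opened in Excel, `encoding='utf-8-sig'` is commonly used so that Chinese text displays correctly.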
Source: https://www.cnblogs.com/xfbk/p/16653249.html