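Getting the code shown in Inspect Element with Python
Author: Internet
In Safari, I can right-click and choose "Inspect Element", and a lot of code appears. Is it possible to get this code with Python? The best solution would be to get a file containing the code.
More specifically, I am trying to find the links to the images on this page: http://500px.com/popular. I can see the links in "Inspect Element", and I would like to retrieve them with Python.
Solution: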
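One way to get a web page's source code is to use the Beautiful Soup library. A tutorial demonstrates this approach; the code from that page is shown below, and the comments are mine. This particular code no longer works, because the content of the site it uses as an example has changed, but the concept should help you do what you want to do. I hope it helps.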
from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 home of urlopen

BASE_URL = "http://www.chicagoreader.com"


def get_category_links(section_url):
    # Download the page source. This is close to what Inspect Element shows,
    # minus anything the browser adds later by running JavaScript.
    html = urlopen(section_url).read()
    # Parse the HTML (the "lxml" parser requires the lxml package).
    soup = BeautifulSoup(html, "lxml")
    # The next two lines will change depending on what you're looking for.
    # This line looks for <dl class="boccat">.
    boccat = soup.find("dl", "boccat")
    # This line collects what was found above into a list of hrefs (i.e. links).
    category_links = [BASE_URL + dd.a["href"] for dd in boccat.find_all("dd")]
    return category_links
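
Applied to the question's goal of collecting image links, the same idea can be sketched as follows. This is a minimal sketch, not the tutorial's code: it swaps Beautiful Soup for the standard library's html.parser so that it runs without extra installs. The names ImageLinkParser, extract_image_links, fetch_html, and SAMPLE_HTML are hypothetical, and the sample markup is made up for illustration rather than taken from 500px.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # urljoin turns relative paths into absolute URLs.
            self.links.append(urljoin(self.base_url, src))


def extract_image_links(html, base_url):
    """Return the absolute URLs of all images found in an HTML string."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.links


def fetch_html(url):
    """Download the raw page source as text (requires network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical sample markup, standing in for a downloaded page.
SAMPLE_HTML = """
<div class="photo"><img src="/images/one.jpg" alt="one"></div>
<div class="photo"><img src="http://cdn.example.com/two.jpg"></div>
<img alt="no source">
"""

if __name__ == "__main__":
    for link in extract_image_links(SAMPLE_HTML, "http://example.com/popular"):
        print(link)
```

To try it on a real page, pass the result of fetch_html(url) to extract_image_links. Keep in mind that Inspect Element shows the live page after scripts have run, so any image links that a page inserts with JavaScript will not appear in the downloaded source.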
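Edit 1: The solution above gives a general approach to web scraping, but I agree with the comments on the question. An API is definitely the way to go for this site. Thanks to yuvi for providing it. The API is available at https://github.com/500px/PxMagic.
Edit 2: There is an example that covers this exact problem of getting links to popular photos. The Python code from the example is pasted below. You will need to install the API library first.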
import fhp.api.five_hundred_px as f
import fhp.helpers.authentication as authentication
from pprint import pprint

key = authentication.get_consumer_key()
secret = authentication.get_consumer_secret()

client = f.FiveHundredPx(key, secret)
results = client.get_photos(feature='popular')

i = 0
PHOTOS_NEEDED = 2
for photo in results:
    pprint(photo)
    i += 1
    if i == PHOTOS_NEEDED:
        break
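
The counting loop above stops after a fixed number of results. One possible alternative, assuming only that the results object can be iterated as in that loop, is itertools.islice. In the sketch below, first_n and fake_results are hypothetical names, and fake_results is a stand-in generator so the example runs without the API library or credentials.

```python
from itertools import islice
from pprint import pprint

PHOTOS_NEEDED = 2


def first_n(iterable, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def fake_results():
    # Hypothetical stand-in for client.get_photos(feature='popular').
    for photo_id in range(1, 6):
        yield {"id": photo_id, "name": "photo %d" % photo_id}


if __name__ == "__main__":
    for photo in first_n(fake_results(), PHOTOS_NEEDED):
        pprint(photo)
```

islice stops pulling items once it has produced n of them, so a lazily loaded result set is not read further than needed.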
Tags: python, web, mechanize, inspect-element
Source: https://codeday.me/bug/20190830/1769167.html