python - Unable to retrieve an element by its XPath with BeautifulSoup

Author: Internet

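I have just started with web scraping and am using BeautifulSoup (Python) for it. As a test, I want to extract some property data from a sample web page. My code starts as follows: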

import requests
from bs4 import BeautifulSoup as Soup

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text, "html.parser")  # name the parser explicitly

# Now I would like to get the sale price of the property.
# The element in the HTML DOM is as follows:
# <span class="" id="yui_3_18_1_1_1464168312477_3548">$12,895,000<span class="value-suffix"></span></span>
# The XPath of the element: //*[@id="yui_3_18_1_1_1464168312477_3548"]

# I wrote the following code:
value = soup.select('span#yui_3_18_1_1_1464168312477_3548')
print(value)

I get no results. What am I doing wrong?

Solution:

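The source you see in the browser console is not the same as the source that requests returns. The id in span id="yui_3_18_1_1_1464170172533_3087" is generated dynamically, so you would need something like selenium.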
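To see why the id is a poor anchor, it can help to compare the id from the question with the one quoted above. The sketch below uses only the standard library; the helper names are made up for illustration, and reading the 13-digit segment as a millisecond Unix timestamp is an assumption based on its shape, not something the page documents.

```python
from datetime import datetime, timezone

# The two ids quoted in this thread: one from the question, one from the answer.
ID_IN_QUESTION = "yui_3_18_1_1_1464168312477_3548"
ID_IN_ANSWER = "yui_3_18_1_1_1464170172533_3087"


def differing_segments(first, second):
    """Return (index, a, b) for each underscore-separated segment that differs."""
    pairs = zip(first.split("_"), second.split("_"))
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


def as_utc_datetime(millis_text):
    # Assumption: the segment is milliseconds since the Unix epoch.
    return datetime.fromtimestamp(int(millis_text) / 1000, tz=timezone.utc)


diff = differing_segments(ID_IN_QUESTION, ID_IN_ANSWER)
for index, a, b in diff:
    print(index, a, b)

# If the assumption holds, this is roughly when the page was rendered.
print(as_utc_datetime(diff[0][1]))
```

Only the trailing segments change between the two ids, which is consistent with a value that is regenerated on every page load.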

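Unfortunately the id is also unique to each visit, so we cannot rely on it. The parent div is consistent, though, so we can use a CSS selector to get the first span inside the parent with the classes main-row home-summary-row: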

In [4]: from selenium import webdriver
In [5]: dr = webdriver.PhantomJS()

In [6]: dr.get("http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/")
In [7]: span = dr.find_element_by_css_selector('div.main-row.home-summary-row span')
In [8]: print(span.text)
$12,895,000

I use PhantomJS for headless browsing; you can use Firefox or Chrome if you prefer. The details are in the Selenium documentation.

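Actually, looking at the source again, we can do the same thing with bs4. The id is the only thing that is generated dynamically, so if we ignore the id we can still get the price: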

In [26]: soup.select_one("div.main-row.home-summary-row span").text
Out[26]: u'$12,895,000'
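The difference between the two selectors can be reproduced offline. This is a minimal sketch, assuming a simplified fragment modeled on the span quoted in the question; HTML_TEMPLATE and both helper functions are hypothetical and are not the real Zillow markup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the element from the question.
HTML_TEMPLATE = """
<div class="main-row home-summary-row">
  <span class="" id="{span_id}">$12,895,000<span class="value-suffix"></span></span>
</div>
"""


def price_by_id(html, span_id):
    """Look the span up by its id; return None when the id is absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=span_id)
    return tag.get_text(strip=True) if tag else None


def price_by_parent_class(html):
    """Look the span up through the stable classes of its parent div."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("div.main-row.home-summary-row span")
    return tag.get_text(strip=True) if tag else None


first_visit = HTML_TEMPLATE.format(span_id="yui_3_18_1_1_1464168312477_3548")
second_visit = HTML_TEMPLATE.format(span_id="yui_3_18_1_1_1464170172533_3087")

# The id recorded on the first visit no longer matches on the second one.
print(price_by_id(second_visit, "yui_3_18_1_1_1464168312477_3548"))
# The parent-class selector does not depend on the id at all.
print(price_by_parent_class(first_visit))
print(price_by_parent_class(second_visit))
```

Returning None instead of indexing straight into the result keeps a missing element from raising an AttributeError, which is usually easier to handle in a scraper.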

A better approach is to use the meta tags, which carry a lot of information:

import requests
from bs4 import BeautifulSoup as Soup

page = "http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/"
response = requests.get(page)
soup = Soup(response.text,"lxml")
metas = soup.select("meta")

Now, if we look at what metas contains:

from pprint import pprint as pp

pp(metas)

[<meta content="on" http-equiv="x-dns-prefetch-control"/>,
 <meta charset="unicode-escape"/>,
 <meta content="View 31 photos of this $12,895,000, 7 bed, 10.0 bath, 10500 sqft single family home located at 1630 Amalfi Dr, Pacific Palisades, CA 90272 built in 2015. MLS # 16-103696." name="description"/>,
 <meta content="Zillow, Inc." name="author"/>,
 <meta content="Copyright (c) 2006-2014 Zillow, Inc." name="Copyright"/>,
 <meta content="none" name="msapplication-config"/>,
 <meta content="ALL" name="ROBOTS"/>,
 <meta content="NOYDIR" name="ROBOTS"/>,
 <meta content="NOODP" name="ROBOTS"/>,
 <meta content="yes" name="apple-mobile-web-app-capable"/>,
 <meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>,
 <meta content="telephone=no" name="format-detection"/>,
 <meta content="#3366b8" name="msapplication-TileColor"/>,
 <meta content="http://www.zillowstatic.com/static/images/logos/zillow-logo-win8-tile.png" name="msapplication-TileImage"/>,
 <meta content="/8Me6HBNZX/rt2n5/y1Lo3ZIrkcvkTBimqviTDiurR4=" name="verify-v1"/>,
 <meta content="7cb4abe457d82ae8" name="y_key"/>,
 <meta content="width=device-width, height=device-height, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no" name="viewport"/>,
 <meta content="Zillow Real Estate, Rentals, and Mortgage" itemprop="name"/>,
 <meta content="The most trafficked website about home sales and rentals, with real estate values for almost every U.S. home. 1,000,000 listings that you won't find on MLS." itemprop="description"/>,
 <meta content="http://www.zillowstatic.com/static/images/social/share_thumbnail.png" itemprop="image"/>,
 <meta content="691f1bfccade71b5-c065751219a379dd-g64cedb67f5ea020a-a" name="google-translate-customization"/>,
 <meta content="202692,878610170,662000799,100001769907023,10716009,769244502,10716649,503322863" property="fb:admins"/>,
 <meta content="172285552816089" property="fb:app_id"/>,
 <meta content="zillow_fb:home" property="og:type"/>,
 <meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" property="og:zillow_fb:address"/>,
 <meta content="7" property="zillow_fb:beds"/>,
 <meta content="10" property="zillow_fb:baths"/>,
 <meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="zillow_fb:description"/>,
 <meta content="http://www.zillow.com/homedetails/1630-Amalfi-Dr-Pacific-Palisades-CA-90272/20546602_zpid/" property="og:url"/>,
 <meta content="Pacific Palisades Home For Sale" property="og:title"/>,
 <meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" property="og:image"/>,
 <meta content='For sale: $12,895,000. Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130" screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.' property="og:description"/>,
 <meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video"/>,
 <meta content="https://videos.zillowstatic.com/production/07a58eebcafbfe833b92f17945131f2e251b5fe5/mp4_600k_landscape_z1/mp4_600k_landscape_z1.mp4" property="og:video:secure_url"/>,
 <meta content="640" property="og:video:width"/>,
 <meta content="video/mp4" property="og:video:type"/>,
 <meta content="360" property="og:video:height"/>,
 <meta content="238648973530.apps.googleusercontent.com" name="google-signin-clientid"/>,
 <meta content="https://www.googleapis.com/auth/plus.login https://www.googleapis.com/auth/plus.profile.emails.read" name="google-signin-scope"/>,
 <meta content="http://zillow.com" name="google-signin-cookiepolicy"/>,
 <meta content="summary_large_image" name="twitter:card"/>,
 <meta content="@Zillow" name="twitter:site"/>,
 <meta content="@Zillow" name="twitter:creator"/>,
 <meta content="1630 Amalfi Dr" name="twitter:title"/>,
 <meta content="Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp;amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp;amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp;amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp;amp; master suite add warmth to the contemporary feel, &amp;amp; detailed wood paneling &amp;amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp;amp; private patio. Lower level feats. Old Hollywood style theater w/130&amp;quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp;amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp;amp; saltwater pool/spa complete this elegant estate." name="twitter:description"/>,
 <meta content="http://photos2.zillowstatic.com/p_d/IS5ypcj39edbdc1000000000.jpg" name="twitter:image"/>,
 <meta content="1630 Amalfi Dr, Pacific Palisades, CA 90272" itemprop="name"/>,
 <meta content="USD" itemprop="priceCurrency"/>,
 <meta content="$12,895,000" itemprop="price"/>,
 <meta content="34.060605" itemprop="latitude"/>,
 <meta content="-118.501625" itemprop="longitude"/>]
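A list this long is easier to work with as a lookup table. The sketch below is one possible way to do that, not part of the original answer: metas_to_dict is a hypothetical helper, SAMPLE_HTML is a small excerpt of the output above, and keeping only the first value for a repeated key (such as ROBOTS) is an arbitrary choice.

```python
from bs4 import BeautifulSoup

# A short excerpt of the meta tags listed above.
SAMPLE_HTML = """
<head>
<meta charset="unicode-escape"/>
<meta content="Zillow, Inc." name="author"/>
<meta content="7" property="zillow_fb:beds"/>
<meta content="10" property="zillow_fb:baths"/>
<meta content="USD" itemprop="priceCurrency"/>
<meta content="$12,895,000" itemprop="price"/>
<meta content="34.060605" itemprop="latitude"/>
</head>
"""


def metas_to_dict(html):
    """Map each meta tag's name, property, or itemprop to its content."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for meta in soup.find_all("meta"):
        content = meta.get("content")
        if content is None:
            continue  # e.g. <meta charset=...> carries no content attribute
        for attr in ("name", "property", "itemprop"):
            key = meta.get(attr)
            if key:
                # setdefault keeps the first value when a key repeats.
                result.setdefault(key, content)
                break
    return result


info = metas_to_dict(SAMPLE_HTML)
print(info["price"], info["priceCurrency"])
print(info["zillow_fb:beds"], info["zillow_fb:baths"])
```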

We can get the price and other information using these attributes:

In [22]: soup = Soup(response.text,"lxml")

In [23]: soup.select_one("meta[itemprop=price]")["content"]
Out[23]: '$12,895,000'

In [24]: soup.select_one('meta[name="twitter:description"]')["content"]
Out[24]: 'Stunning brand new Contemporary Cape Cod Estate in Palisades Riviera by Huntington Estate Homes w/ 7 beds, 10 baths, + office in 10,500 sq ft on an 18,590 sq ft lot. Soaring ceilings, magnificent chandelier, &amp; floating staircase create a grand entrance w/ glass wine cellar, formal living &amp; dining rooms. Floor plan flows openly between gourmet kitchen, family room, &amp; patio with a set of disappearing Fleetwood Pocket doors. Fireplaces in living, family, &amp; master suite add warmth to the contemporary feel, &amp; detailed wood paneling &amp; coffered ceilings enhance quality of design throughout. Master suite opens completely to sweeping ocean views &amp; private patio. Lower level feats. Old Hollywood style theater w/130&quot; screen, surround sound, stadium seats, floor-to-ceiling suede panels, exercise pool, spa, gym, office, guest beds, open air patio, &amp; elevator access to take you from floor to floor. Perfect for entertaining - outdoor BBQ, seating, &amp; saltwater pool/spa complete this elegant estate.'
In [27]: soup.select_one("meta[itemprop=latitude]")["content"]
Out[27]: '34.060605'
In [28]: soup.select_one("meta[itemprop=longitude]")["content"]
Out[28]: '-118.501625'
In [29]: soup.select_one('meta[property="og:zillow_fb:address"]')["content"]
Out[29]: '1630 Amalfi Dr, Pacific Palisades, CA 90272'
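The values come back as raw strings, so a little post-processing is usually needed before they can be used. The following is a minimal standard-library sketch with hypothetical helper names; price_to_int assumes whole-dollar amounts like the one in this listing, and unescape_entities addresses the leftover &amp; and &quot; entities visible in the twitter:description output.

```python
import html
import re


def price_to_int(text):
    """Convert a price string such as '$12,895,000' to an integer.

    Assumption: whole-dollar amounts only; any non-digit character is dropped.
    """
    digits = re.sub(r"\D", "", text)
    if not digits:
        raise ValueError("no digits found in %r" % (text,))
    return int(digits)


def unescape_entities(text):
    """Decode HTML entities that survived parsing, e.g. '&amp;' and '&quot;'."""
    return html.unescape(text)


price = price_to_int("$12,895,000")
latitude = float("34.060605")
longitude = float("-118.501625")
snippet = unescape_entities("theater w/130&quot; screen, exercise pool, spa, &amp; gym")

print(price, latitude, longitude)
print(snippet)
```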

Tags: python-2-7, beautifulsoup, web-scraping, python
Source: https://codeday.me/bug/20191118/2029495.html