python - How can I get data from a specific class of an HTML tag using BeautifulSoup?
I want to get the data (name, city, address) located in a div tag. The HTML file looks like this:
    <div class="maininfowrapper">
        <h4 itemprop="name">name</h4>
        <div>
            <a href="/wiki/province/tehran"></a> city <a href="/wiki/city/tehran"></a> address
        </div>
    </div>
I don't know how to get the data I want from a specific tag. I'm using the Python BeautifulSoup library.
There are several <h4> tags in the source HTML, but only one <h4> has the itemprop="name" attribute, so you can search for that one first. You can then access the remaining values from there. Note that the following HTML is correctly reproduced from the source page, whereas the HTML in the question is not:
    from bs4 import BeautifulSoup

    html = '''<div class="maininfowrapper">
        <h4 itemprop="name"> name </h4>
        <div>
            <a href="/wiki/province/tehran">province</a> - <a href="/wiki/city/tehran">city</a> address
        </div>
    </div>'''

    soup = BeautifulSoup(html, 'html.parser')

    # Only one <h4> carries itemprop="name", so it is a reliable starting point.
    name_tag = soup.find('h4', itemprop='name')
    # The province and city links live in the <div> that follows the <h4>.
    addr_div = name_tag.find_next_sibling('div')
    province_tag, city_tag = addr_div.find_all('a')
    name, province, city = [t.text.strip() for t in (name_tag, province_tag, city_tag)]
    # The address is the bare text that follows the city link.
    address = city_tag.next_sibling.strip()
When run against the URL you provided:
    import requests
    from bs4 import BeautifulSoup

    r = requests.get('http://goo.gl/scxnp2')
    soup = BeautifulSoup(r.content, 'html.parser')

    name_tag = soup.find('h4', itemprop='name')
    addr_div = name_tag.find_next_sibling('div')
    province_tag, city_tag = addr_div.find_all('a')
    name, province, city = [t.text.strip() for t in (name_tag, province_tag, city_tag)]
    address = city_tag.next_sibling.strip()

    >>> print(name)
    بیمارستان حضرت فاطمه (س)
    >>> print(province)
    تهران
    >>> print(city)
    تهران
    >>> print(address)
    یوسف آباد، خیابان بیست و یکم، جنب پارک شفق، بیمارستان ترمیمی پلاستیک فک و صورت
I'm not sure the printed output is displayed correctly on my terminal; however, the code should produce the correct text on a properly configured terminal.
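Since the question asks specifically about selecting by class, here is a minimal sketch of the same lookup that starts from the wrapper div's class and returns None when an expected tag is missing, instead of raising an AttributeError. The function name extract_info and the returned dict layout are my own choices for illustration, and the class name "maininfowrapper" is assumed to match the real page exactly as it is written above.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the answer above; the class name is assumed
# to be "maininfowrapper" exactly as written there.
HTML = '''<div class="maininfowrapper">
  <h4 itemprop="name"> name </h4>
  <div>
    <a href="/wiki/province/tehran">province</a> -
    <a href="/wiki/city/tehran">city</a> address
  </div>
</div>'''


def extract_info(html):
    """Return a dict with name, province, city and address, or None.

    Hypothetical helper: None means the expected structure was not found.
    """
    soup = BeautifulSoup(html, 'html.parser')

    # class_ (trailing underscore) filters on the CSS class, because
    # "class" is a reserved word in Python.
    wrapper = soup.find('div', class_='maininfowrapper')
    if wrapper is None:
        return None

    name_tag = wrapper.find('h4', itemprop='name')
    if name_tag is None:
        return None

    addr_div = name_tag.find_next_sibling('div')
    if addr_div is None:
        return None

    links = addr_div.find_all('a')
    if len(links) < 2:
        return None
    province_tag, city_tag = links[:2]

    # The address is the bare text node that follows the city link.
    address = city_tag.next_sibling
    return {
        'name': name_tag.get_text(strip=True),
        'province': province_tag.get_text(strip=True),
        'city': city_tag.get_text(strip=True),
        'address': address.strip() if isinstance(address, str) else '',
    }


info = extract_info(HTML)
print(info)
```

Searching inside the wrapper first narrows the lookup to the block you care about, which may help if the page ever gains a second h4 with itemprop="name" elsewhere.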