python - How can I get data from a specific class of an HTML tag using BeautifulSoup?


I want the data (name, city, address) located in a div tag in an HTML file like this:

    <div class="maininfowrapper">
        <h4 itemprop="name">name</h4>
        <div>
            <a href="/wiki/province/tehran"></a>
            city
            <a href="/wiki/city/tehran"></a>
            address
        </div>
    </div>

I don't know how to get the data I want from a specific tag. I'm using the Python BeautifulSoup library.

There are several <h4> tags in the source HTML, but only one <h4> has the itemprop="name" attribute, so you can search for that one first and then access the remaining values from there. Note that the following HTML is correctly reproduced from the source page, whereas the HTML in the question is not:

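    from bs4 import BeautifulSoup

    html = '''<div class="maininfowrapper">
        <h4 itemprop="name">
            name &nbsp;
        </h4>
        <div>
            <a href="/wiki/province/tehran">province</a> - <a href="/wiki/city/tehran">city</a> address
        </div>
    </div>'''

    # Name the parser explicitly to avoid the "no parser specified" warning.
    soup = BeautifulSoup(html, 'html.parser')

    # Only one <h4> carries itemprop="name", so it is a reliable starting point.
    name_tag = soup.find('h4', itemprop='name')
    # The province and city links live in the <div> right after that <h4>.
    addr_div = name_tag.find_next_sibling('div')
    province_tag, city_tag = addr_div.find_all('a')

    name, province, city = [t.text.strip() for t in (name_tag, province_tag, city_tag)]
    # The address is the bare text node that follows the city link.
    address = city_tag.next_sibling.strip()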
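The lookup above assumes that every tag is present. As a minimal sketch of a more defensive variant, the same steps can be wrapped in a function that returns None when the page layout differs. The name extract_info, the dictionary keys, and the sample markup are assumptions made for illustration; they are not part of BeautifulSoup or of the original page.

```python
from bs4 import BeautifulSoup


def extract_info(html):
    """Return name, province, city and address, or None if the layout differs.

    extract_info is a hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    name_tag = soup.find("h4", itemprop="name")
    if name_tag is None:
        return None
    addr_div = name_tag.find_next_sibling("div")
    if addr_div is None:
        return None
    links = addr_div.find_all("a")
    if len(links) != 2:
        return None
    province_tag, city_tag = links
    # The address is the bare text node that follows the city link.
    address = city_tag.next_sibling
    return {
        "name": name_tag.get_text(strip=True),
        "province": province_tag.get_text(strip=True),
        "city": city_tag.get_text(strip=True),
        "address": address.strip() if isinstance(address, str) else "",
    }


# Sample markup modelled on the snippet in the answer (illustrative only).
sample = """<div class="maininfowrapper">
    <h4 itemprop="name">name &nbsp;</h4>
    <div>
        <a href="/wiki/province/tehran">province</a> -
        <a href="/wiki/city/tehran">city</a> address
    </div>
</div>"""

info = extract_info(sample)
```

Returning None instead of raising lets the caller skip pages that do not match, which may be convenient when looping over many URLs.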

When run against the URL provided:

    import requests
    from bs4 import BeautifulSoup

    r = requests.get('http://goo.gl/scxnp2')
    soup = BeautifulSoup(r.content, 'html.parser')
    name_tag = soup.find('h4', itemprop='name')
    addr_div = name_tag.find_next_sibling('div')
    province_tag, city_tag = addr_div.find_all('a')

    name, province, city = [t.text.strip() for t in (name_tag, province_tag, city_tag)]
    address = city_tag.next_sibling.strip()

    >>> print(name)
    بیمارستان حضرت فاطمه (س)
    >>> print(province)
    تهران
    >>> print(city)
    تهران
    >>> print(address)
    یوسف آباد، خیابان بیست و یکم، جنب پارک شفق، بیمارستان ترمیمی پلاستیک فک و صورت

I'm not sure that the printed output is displayed correctly on my terminal; however, the code should produce the correct text on a properly configured terminal.
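If the terminal encoding is in doubt, one option is to write the values as UTF-8 bytes directly instead of relying on print. This is a rough sketch that assumes Python 3; write_utf8 is a hypothetical helper name, and the in-memory buffer stands in for a real output stream.

```python
import io


def write_utf8(values, binary_stream):
    """Write each value on its own line as UTF-8 bytes.

    write_utf8 is a hypothetical helper; binary_stream is any object with a
    write(bytes) method, such as sys.stdout.buffer or a file opened in 'wb' mode.
    """
    for value in values:
        binary_stream.write((value + "\n").encode("utf-8"))


# Demonstrate with an in-memory stream; swap in sys.stdout.buffer to print.
buffer = io.BytesIO()
write_utf8(["تهران", "یوسف آباد"], buffer)
decoded = buffer.getvalue().decode("utf-8")
```

Writing to a file opened in 'wb' mode the same way makes it possible to inspect the scraped text in an editor, independent of how the terminal is set up.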

