html - Python, lxml and xpath: returns "[<Element x at 0x29a9998>]" rather than expected value
I'm trying to scrape TD Asset Management pages (example below; I can't post more than 2 links) in order to retrieve the "Price on" value, i.e. the dollar amount in this snippet of HTML:
<div class="td-layout-grid9 td-layout-column td-layout-column-first"> price on: jun 12, 2015 <br> <strong>$14.54 </strong> <strong class="td-copy-red">-0.01 (-0.07%)</strong> </div>
I was hoping to achieve this with Python, requests, lxml, and XPath, which I installed as follows:
apt-get update
apt-get install python python-pip python-dev gcc build-essential libxml2-dev libxslt-dev libffi-dev libssl-dev
pip install lxml
pip install requests
pip install requests[security]
Next, to retrieve the page, I did this:
python
>>> from lxml import html
>>> import requests
>>> page = requests.get('https://www.tdassetmanagement.com/funddetails.form?fundid=6320&lang=en')
>>> tree = html.fromstring(page.text)
Finally, I attempted to retrieve the desired dollar value using the XPath of the relevant element, obtained from Chrome's "Inspect Element" tool:
>>> price = tree.xpath('//*[@id="fundcardvo"]/div[2]/div[1]/div[1]/div[1]/strong[1]')
>>> print price
Unfortunately the result is [<Element strong at 0x29a9998>] rather than the expected dollar amount, $14.54.
To ensure the expected data was retrieved by the initial requests.get, I ran this:
>>> print page.content
The result can be seen here: http://pastebin.com/f5c4mfqb.
If I paste the above HTML into this tool: http://videlibri.sourceforge.net/cgi-bin/xidelcgi the XPath query //*[@id="fundcardvo"]/div[2]/div[1]/div[1]/div[1]/strong[1] returns the dollar amount as expected.
Any hints or tips on how I might be able to use Python, lxml, and XPath to retrieve the desired value from this element would be appreciated. If there's a different way of going about this to obtain the same result, I'd be interested in that too.
Thanks.
After further googling I found out about Elements (xpath() returns a list of them, and each has attributes such as tag or text). That, followed by more googling regarding UnicodeEncodeError (see "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)"), meant I was able to obtain the desired value like this:
>>> priceelement = tree.xpath('//*[@id="fundcardvo"]/div[2]/div[1]/div[1]/div[1]/strong[1]')
>>> priceascii = priceelement[0].text
>>> price = priceascii.encode('utf-8')
>>> print price
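For comparison, here is a minimal sketch of the same lookup run against the HTML fragment from the question instead of the live page. Several things in it are my assumptions rather than anything confirmed above: the relative XPath ./strong[1] is shortened to fit the fragment, the trailing character after the price is taken to be a non-breaking space (u'\xa0', as the error message suggests), and the print call uses Python 3 syntax. It shows that xpath() returns a list either way, and that ending the expression in text() yields strings rather than Element objects.

```python
from lxml import html

# Fragment from the question, parsed locally so no network access is needed.
# The u'\xa0' after the price is an assumption based on the error message.
snippet = (
    u'<div class="td-layout-grid9 td-layout-column td-layout-column-first">'
    u' price on: jun 12, 2015 <br> <strong>$14.54\xa0</strong>'
    u' <strong class="td-copy-red">-0.01 (-0.07%)</strong> </div>'
)
div = html.fromstring(snippet)

# Selecting an element returns a list of Element objects, hence the
# "[<Element strong at 0x...>]" output when the list itself is printed.
elements = div.xpath('./strong[1]')
first_text = elements[0].text

# Ending the expression in text() returns a list of strings instead.
texts = div.xpath('./strong[1]/text()')

# strip() on a unicode string also removes the non-breaking space.
price = texts[0].strip()
print(price)
```

Whether to index into the list of elements or ask XPath for text() directly is mostly a matter of taste; the second form avoids a separate .text lookup.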
Thanks for nudging me in the right direction, jonrsharpe.
I'm still not able to determine how to obtain a list of the available attributes of an Element though; I only know that tag and text are available.
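One possible way to explore an Element, sketched here under a couple of assumptions: the sample markup is just the second strong tag from the question parsed on its own, and the try/except import is only there so the snippet also runs where lxml is missing (the standard library's ElementTree exposes a compatible Element API). "Attributes" can mean two things here, the Python attributes of the object or the HTML attributes of the tag, so the sketch touches both.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback, compatible Element API

element = etree.fromstring('<strong class="td-copy-red">-0.01 (-0.07%)</strong>')

# Core data attributes of the Element object itself.
print(element.tag)
print(element.text)
print(element.tail)    # text following the closing tag, if any
print(element.attrib)  # dict-like mapping of the tag's HTML/XML attributes

# The tag's HTML/XML attributes can also be read with dict-style helpers.
attribute_names = list(element.keys())
css_class = element.get('class')

# dir() lists the Python attributes and methods available on the object.
public_names = [name for name in dir(element) if not name.startswith('_')]
print(public_names)
```

Calling help(element) in the interpreter gives the same names with their docstrings.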
I went on to get just the number (without the dollar symbol and trailing non-breaking spaces) like this:
>>> import re
>>> p = re.search(r'[0-9]{1,3}\.[0-9]{2}', price)
>>> price = p.group(0)
>>> print price
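A small sketch of the same regex step on a hard-coded sample, in case it helps anyone check the pattern without fetching the page. The sample string and the names raw, amount and truncated are made up for illustration, and converting to Decimal is my own addition, not something the steps above require.

```python
import re
from decimal import Decimal

# Assumed sample: the price text with a trailing non-breaking space.
raw = u'$14.54\xa0'
pattern = r'[0-9]{1,3}\.[0-9]{2}'

match = re.search(pattern, raw)
if match is None:
    raise ValueError('no price found in %r' % raw)

# Decimal avoids binary floating-point rounding on currency values.
amount = Decimal(match.group(0))
print(amount)

# Caveat: {1,3} allows at most three digits before the decimal point,
# so a price of 1000 or more would lose its leading digits.
truncated = re.search(pattern, u'$1234.56').group(0)
print(truncated)
```

If prices above $999.99 are possible, a pattern such as r'[0-9]+\.[0-9]{2}' may be a safer starting point, though it still would not handle thousands separators.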