html - Python, lxml and xpath: returns "[<Element x at 0x29a9998>]" rather than expected value

May 15, 2011

I'm trying to scrape TD Asset Management pages (example below; I can't post more than 2 links) in order to retrieve the "Price on" value, i.e. the dollar amount in this snippet of HTML:

<div class="td-layout-grid9 td-layout-column td-layout-column-first"> price on: jun 12, 2015 <br> <strong>$14.54  </strong> <strong class="td-copy-red">-0.01 (-0.07%)</strong> </div>

I was hoping to achieve this with Python, requests, lxml, and xpath, installed as follows:

apt-get update
apt-get install python python-pip python-dev gcc build-essential libxml2-dev libxslt-dev libffi-dev libssl-dev
pip install lxml
pip install requests
pip install requests[security]

Next, to retrieve the page I did this:

python
>>> from lxml import html
>>> import requests
>>> page = requests.get('https://www.tdassetmanagement.com/funddetails.form?fundid=6320&lang=en')
>>> tree = html.fromstring(page.text)

Finally, I attempted to retrieve the desired dollar value using the XPath of the relevant element, obtained from Chrome's "Inspect element" tool:

>>> price = tree.xpath('//*[@id="fundcardvo"]/div[2]/div[1]/div[1]/div[1]/strong[1]')
>>> print price

Unfortunately the result is [<Element strong at 0x29a9998>] rather than the expected dollar amount, $14.54.

To ensure the expected data was retrieved by the initial "requests.get", I ran this:

>>> print page.content

The result can be seen here: http://pastebin.com/f5c4mfqb.

If I paste the above HTML into this tool: http://videlibri.sourceforge.net/cgi-bin/xidelcgi, the XPath query //*[@id="fundcardvo"]/div[2]/div[1]/div[1]/div[1]/strong[1] returns the dollar amount as expected.

Any hints or tips on how I might be able to use Python, lxml, and xpath to retrieve the desired value from the element would be appreciated. If there's a different way of going about obtaining the same result, I'd be interested in that too.

Thanks.

After further googling to find out about elements (xpath returns lists of things with attributes like tag or text), followed by more googling regarding a UnicodeEncodeError (see "UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)"), I was able to obtain the desired value like this:

>>> priceelement = tree.xpath('//*[@id="fundcardvo"]/div[2]/div[1]/div[1]/div[1]/strong[1]')
>>> priceascii = priceelement[0].text
>>> price = priceascii.encode('utf-8')
>>> print price
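The list in the original output appears because xpath() returns a list of matching elements for an element-selecting expression; the text has to be read from the element or selected in the expression itself. Below is a minimal sketch of three ways that should give the same string, run against an inline copy of the HTML snippet from the question rather than the live page. It assumes Python 3 (where print is a function and no explicit encode is needed) and that the characters after the price are U+00A0 non-breaking spaces; the variable names are only illustrative.

```python
from lxml import html

# Inline copy of the snippet from the question; \u00a0 is assumed to be the
# non-breaking space that follows the price on the real page.
SNIPPET = (
    '<div class="td-layout-grid9 td-layout-column td-layout-column-first">'
    ' price on: jun 12, 2015 <br>'
    ' <strong>$14.54\u00a0\u00a0</strong>'
    ' <strong class="td-copy-red">-0.01 (-0.07%)</strong> </div>'
)

tree = html.fromstring(SNIPPET)

# 1. Select the element, then read its .text attribute.
elements = tree.xpath('.//strong[1]')            # a list of elements
price_from_element = elements[0].text

# 2. Select the text node in the XPath expression itself.
text_nodes = tree.xpath('.//strong[1]/text()')   # a list of strings
price_from_text_node = text_nodes[0]

# 3. Ask XPath for the string value: a single string, not a list.
price_from_string = tree.xpath('string(.//strong[1])')

print(repr(price_from_element))
```

The first form is what the session above ended up using; the other two move the text selection into the XPath query, which is presumably what the online tool does implicitly when it prints the dollar amount.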

Thanks for nudging me in the right direction, jonrsharpe.

I'm still not able to determine how to obtain a list of the available attributes of an element though; I only know that tag and text are available.
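On the open question of listing what an element offers, here is a small sketch under the assumption that "attributes" can mean either the HTML attributes of the tag or the Python-level attributes of the lxml element object. It uses the second strong tag from the snippet as sample input and Python 3 syntax; public_names is just an illustrative variable name.

```python
from lxml import html

# The second <strong> from the snippet in the question, parsed on its own.
element = html.fromstring('<strong class="td-copy-red">-0.01 (-0.07%)</strong>')

# HTML attributes of the tag: a dict-like mapping plus a few helpers.
print(dict(element.attrib))   # attribute names mapped to values
print(element.keys())         # attribute names only
print(element.get('class'))   # value of a single attribute

# Python-level attributes and methods of the element object itself.
public_names = [name for name in dir(element) if not name.startswith('_')]
print(public_names)
```

The second listing is long, since it includes methods such as xpath and text_content alongside tag, text, tail and attrib.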

I went on to extract just the number (without the dollar symbol and trailing non-breaking spaces) like this:

>>> import re
>>> p = re.search('[0-9]{1,3}\.[0-9]{2}', price)
>>> price = p.group(0)
>>> print price
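The same clean-up can be sketched with only the standard library, here in Python 3 and with a conversion to Decimal added so the value can be used in arithmetic. The raw_price literal stands in for the text of the strong element and is an assumption about what the page returns; the Decimal step is an addition, not part of the session above.

```python
import re
from decimal import Decimal

# Stand-in for the element text: dollar sign, amount, two non-breaking spaces.
raw_price = '$14.54\u00a0\u00a0'

# Same pattern as above, written as a raw string so that \. reaches the
# regex engine unchanged.
PRICE_PATTERN = r'[0-9]{1,3}\.[0-9]{2}'

match = re.search(PRICE_PATTERN, raw_price)
# Guard against pages where no price is found instead of failing on None.
price = Decimal(match.group(0)) if match else None

print(price)
```

One caveat with this pattern: it allows at most three digits before the decimal point, so a value with a thousands separator such as $1,234.56 would match only the 234.56 part.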
