python - LXML - parse td content within tr tag


I want to parse each individual statistic from the Yahoo Finance tables for formatting purposes; when I parse the entire table, the formatting is terrible. Using the code below, I have to repeat the four lines of the contenta code, altered to retrieve the stats within each row of the table, as exemplified by the contentb variables below. I refuse to believe this is the most efficient way to do so. Any suggestions?

from lxml import html

url = 'http://finance.yahoo.com/q/is?s=mmm+income+statement&annual'
tree = html.parse(url)

# Row 2 of the inner table: one XPath query per cell
contenta = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[2]/td[1]")[0].text_content().strip()
contenta1 = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[2]/td[2]")[0].text_content().strip()
contenta2 = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[2]/td[3]")[0].text_content().strip()
contenta3 = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[2]/td[4]")[0].text_content().strip()

# Row 3 of the inner table: the same four lines with tr[3]
contentb = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[3]/td[1]")[0].text_content().strip()
contentb1 = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[3]/td[2]")[0].text_content().strip()
contentb2 = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[3]/td[3]")[0].text_content().strip()
contentb3 = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[3]/td[4]")[0].text_content().strip()

Use range and str.format to build the XPath for each cell in a loop:

for i in range(1, 5):
    contenta = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[2]/td[{i}]".format(i=i))[0].text_content().strip()
    print(contenta)

Output:

Total Revenue
31,821,000
30,871,000
29,904,000

for i in range(1, 5):
    contentb = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[3]/td[{i}]".format(i=i))[0].text_content().strip()
    print(contentb)

Output:

Cost of Revenue
16,447,000
16,106,000
15,685,000
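The two loops above still hard-code the row index. A minimal sketch of iterating over every row in one pass follows. The SAMPLE markup is a hypothetical, simplified stand-in for the Yahoo page (so the example runs without a network call), and table_rows is an illustrative helper name, not part of lxml. It sticks to the findall subset of path syntax so that it also runs on the standard library's ElementTree; with lxml.html the same idea works with tree.xpath(...) and text_content().

```python
try:
    from lxml import etree as ET  # used when lxml is installed
except ImportError:
    import xml.etree.ElementTree as ET  # stdlib fallback, same findall API

# Hypothetical, simplified stand-in for the nested table on the real page.
SAMPLE = """
<html><body>
<table class="yfnc_tabledata1"><tr><td>
  <table>
    <tr><td>Period Ending</td><td>Year 1</td><td>Year 2</td><td>Year 3</td></tr>
    <tr><td>Total Revenue</td><td>31,821,000</td><td>30,871,000</td><td>29,904,000</td></tr>
    <tr><td>Cost of Revenue</td><td>16,447,000</td><td>16,106,000</td><td>15,685,000</td></tr>
  </table>
</td></tr></table>
</body></html>
"""


def table_rows(root):
    """Return every inner-table row as a list of stripped cell strings."""
    path = ".//table[@class='yfnc_tabledata1']/tr/td/table/tr"
    rows = []
    for tr in root.findall(path):
        # Query the cells relative to the current row, not the whole document.
        rows.append(["".join(td.itertext()).strip() for td in tr.findall("td")])
    return rows


root = ET.fromstring(SAMPLE.strip())
rows = table_rows(root)
for row in rows[1:]:  # skip the header row
    print(row)
```

Querying the rows once and then the cells relative to each row avoids re-running the full absolute XPath for every cell, and it keeps working if the table gains or loses columns.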

Edit: to keep the values as integers instead of printing them, collect them in a dict:

In [22]: d = {}

In [23]: d.setdefault('revenue', [])
Out[23]: []

In [24]: for i in range(2, 5):
   ....:     contentb = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[3]/td[{i}]".format(i=i))[0].text_content().strip()
   ....:     d['revenue'].append(int(contentb.replace(',', '')))
   ....:

In [25]: d
Out[25]: {'revenue': [16447000, 16106000, 15685000]}
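The same dict idea can be extended to key every row by its own label. The sketch below assumes the rows have already been extracted as lists of strings (label first, then the figures). The helper names to_int and rows_to_dict are hypothetical, and treating parenthesised figures as negatives is an assumption about how the statement formats losses, not something shown in the data above.

```python
def to_int(text):
    """Convert a string such as '16,447,000' to an int."""
    cleaned = text.strip().replace(",", "")
    # Assumption: a figure wrapped in parentheses, e.g. '(1,234)', is negative.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -int(cleaned[1:-1])
    return int(cleaned)


def rows_to_dict(rows):
    """Map each row label (first cell) to its numeric cells as ints."""
    return {label: [to_int(cell) for cell in cells] for label, *cells in rows}


# Rows as they would come out of the table, label first.
rows = [
    ["Total Revenue", "31,821,000", "30,871,000", "29,904,000"],
    ["Cost of Revenue", "16,447,000", "16,106,000", "15,685,000"],
]

stats = rows_to_dict(rows)
print(stats["Cost of Revenue"])
```

Keying by the label taken from the first cell avoids naming each list by hand, so the key always matches the row the numbers came from.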
