Python 3.4: XPath: loop through tr tags and embedded td tags
The tr[2] specified below in contentb retrieves only one tr tag. How can I loop through all of the tr tags in the table and append the td content to the list e?
    for i in range(1,5):
        contentb = tree.xpath("//table[@class='yfnc_tabledata1']/tr[1]/td/table/tr[2]/td[{i}]".format(i=i))[0].text_content().strip()
        if re.match(r'[a-z]', contentb) is None:
            contentb = int(contentb.replace(',', ''))
        e.append(contentb)
    print(e)
The text below is a snippet of the HTML I am working with:
<table cellspacing="0" cellpadding="0" border="0" width="100%" class="yfnc_tabledata1" id="yui_3_9_1_9_1434360249110_44"><tbody id="yui_3_9_1_9_1434360249110_43"><tr id="yui_3_9_1_9_1434360249110_42"><td id="yui_3_9_1_9_1434360249110_41"><table cellspacing="0" cellpadding="2" border="0" width="100%" id="yui_3_9_1_9_1434360249110_40"><tbody id="yui_3_9_1_9_1434360249110_39"><tr style="border-top:none;" class="yfnc_modtitle1"><td style="border-top:2px solid #000;" colspan="2"><small><span class="yfi-module-title">period ending</span></small></td><th style="border-top:2px solid #000;text-align:right; font-weight:bold" scope="col">dec 31, 2014</th><th style="border-top:2px solid #000;text-align:right; font-weight:bold" scope="col">dec 31, 2013</th><th style="border-top:2px solid #000;text-align:right; font-weight:bold" scope="col">dec 31, 2012</th></tr><tr id="yui_3_9_1_9_1434360249110_38"><td colspan="2" id="yui_3_9_1_9_1434360249110_37"> <strong> total revenue </strong> </td><td align="right"> <strong> 31,821,000 </strong> </td><td align="right"> <strong> 30,871,000 </strong> </td><td align="right"> <strong> 29,904,000 </strong> </td></tr><tr><td colspan="2">cost of revenue</td><td align="right">16,447,000 </td><td align="right">16,106,000 </td><td align="right">15,685,000 </td></tr><tr><td style="height:0;padding:0; border-top:3px solid #333;" colspan="5"><span style="display:block; width:5px; height:1px;"></span></td></tr><tr><td colspan="2"> <strong> gross profit </strong> </td><td align="right"> <strong> 15,374,000 </strong> </td><td align="right"> <strong> 14,765,000 </strong> </td><td align="right"> <strong> 14,219,000 </strong> </td></tr><tr><td style="height:0;padding:0; " colspan="5"><span style="display:block; width:5px; height:10px;"></span></td></tr><tr> <td><spacer width="1" height="1" type="block"></spacer></td> <td colspan="4" class="yfnc_d">operating expenses</td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" 
type="block"></spacer></td> <td>research development</td><td align="right">1,770,000 </td><td align="right">1,715,000 </td><td align="right">1,634,000 </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>selling general , administrative</td><td align="right">6,469,000 </td><td align="right">6,384,000 </td><td align="right">6,102,000 </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>non recurring</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>others</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr> <td><spacer width="1" height="1" type="block"></spacer></td> <td class="yfnc_d" style="height:0; padding:0; " colspan="5"><span style="display:block; width:5px; height:1px;"></span></td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>total operating expenses</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr><td style="height:0;padding:0; " colspan="5"><span style="display:block; width:5px; height:10px;"></span></td></tr><tr><td style="height:0;padding:0; border-top:3px solid #333;" colspan="5"><span style="display:block; width:5px; height:1px;"></span></td></tr><tr><td colspan="2"> <strong> operating income or loss </strong> </td><td align="right"> <strong> 7,135,000 </strong> </td><td align="right"> <strong> 6,666,000 </strong> </td><td align="right"> <strong> 6,483,000 </strong> </td></tr><tr><td style="height:0;padding:0; " colspan="5"><span style="display:block; width:5px; height:10px;"></span></td></tr><tr> <td><spacer width="1" height="1" type="block"></spacer></td> <td colspan="4" class="yfnc_d">income continuing operations</td></tr><tr> <td width="30" 
class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>total other income/expenses net</td><td align="right">33,000 </td><td align="right">41,000 </td><td align="right">39,000 </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>earnings before interest , taxes</td><td align="right">7,168,000 </td><td align="right">6,707,000 </td><td align="right">6,522,000 </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>interest expense</td><td align="right">142,000 </td><td align="right">145,000 </td><td align="right">171,000 </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>income before tax</td><td align="right">7,026,000 </td><td align="right">6,562,000 </td><td align="right">6,351,000 </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>income tax expense</td><td align="right">2,028,000 </td><td align="right">1,841,000 </td><td align="right">1,840,000 </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>minority interest</td><td align="right">(42,000)</td><td align="right">(62,000)</td><td align="right">(67,000)</td></tr><tr> <td><spacer width="1" height="1" type="block"></spacer></td> <td class="yfnc_d" style="height:0; padding:0; " colspan="5"><span style="display:block; width:5px; height:1px;"></span></td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>net income continuing ops</td><td align="right">4,956,000 </td><td align="right">4,659,000 </td><td align="right">4,444,000 </td></tr><tr><td style="height:0;padding:0; " colspan="5"><span style="display:block; width:5px; height:10px;"></span></td></tr><tr> <td><spacer width="1" height="1" 
type="block"></spacer></td> <td colspan="4" class="yfnc_d">non-recurring events</td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>discontinued operations</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>extraordinary items</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>effect of accounting changes</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr> <td width="30" class="yfnc_tabledata1"><spacer height="1" width="30" type="block"></spacer></td> <td>other items</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr><td style="height:0;padding:0; " colspan="5"><span style="display:block; width:5px; height:10px;"></span></td></tr><tr><td style="height:0;padding:0; border-top:3px solid #333;" colspan="5"><span style="display:block; width:5px; height:1px;"></span></td></tr><tr><td colspan="2"> <strong> net income </strong> </td><td align="right"> <strong> 4,956,000 </strong> </td><td align="right"> <strong> 4,659,000 </strong> </td><td align="right"> <strong> 4,444,000 </strong> </td></tr><tr><td colspan="2">preferred stock , other adjustments</td><td align="right"> - </td><td align="right"> - </td><td align="right"> - </td></tr><tr><td style="height:0;padding:0; border-top:3px solid #333;" colspan="5"><span style="display:block; width:5px; height:1px;"></span></td></tr><tr><td colspan="2"> <strong> net income applicable common shares </strong> </td><td align="right"> <strong> 4,956,000 </strong> </td><td align="right"> <strong> 4,659,000 </strong> </td><td align="right"> <strong> 4,444,000 </strong> 
</td></tr></tbody></table></td></tr></tbody></table>
If I correctly understand what you are asking, you need to replace tr[2] with tr.
The predicate [2] here restricts the match to the second tr element; removing it removes that restriction, so every tr is matched.
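The effect of the positional predicate can be seen on a small table. This is only a sketch: it uses the standard library's xml.etree.ElementTree (not lxml, which the question uses) so it runs without extra packages, and the three-row table is a made-up stand-in for the real page. For a simple path like this one, the [2] predicate behaves the same way in both libraries.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened stand-in for the table in the question.
HTML = """
<table class="demo">
  <tr><td>period ending</td><td>dec 31, 2014</td></tr>
  <tr><td>total revenue</td><td>31,821,000</td></tr>
  <tr><td>cost of revenue</td><td>16,447,000</td></tr>
</table>
"""

root = ET.fromstring(HTML)  # root is the <table> element

# With the predicate: only the second tr is considered.
second_row_only = root.findall("./tr[2]/td[1]")

# Without the predicate: the first td of every tr is returned.
every_row = root.findall("./tr/td[1]")

second_labels = [td.text.strip() for td in second_row_only]
all_labels = [td.text.strip() for td in every_row]

print(second_labels)
print(all_labels)
```

Dropping the predicate turns a single-cell result into one cell per row, which is why the answer below collects a list instead of indexing with [0].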
Edit:
To extract the text content of the table cells, you can modify your code as follows:
    for i in range(1,5):
        # list of the cells in column i of the table
        collist = tree.xpath("//table[@class='yfnc_tabledata1']//table//tr/td[{i}]".format(i=i))
        contentb = [c.text_content().strip() for c in collist]
        # here contentb is a list in which each element is the text
        # of one of the cells in column i of the table
        ## continue processing as per the desired result...
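One thing to watch once every row is included: the columns now contain cells such as "-" and "(42,000)", and the original int(contentb.replace(',', '')) would raise a ValueError on those. A possible way to continue processing is sketched below. The helper name convert_cell and the choice to leave non-numeric cells as text are assumptions for illustration, not part of the original code.

```python
import re

def convert_cell(text):
    """Hypothetical helper: convert '31,821,000' to 31821000.

    Anything that is not purely digits and commas (labels, '-',
    parenthesised values) is returned unchanged as stripped text.
    """
    text = text.strip()
    if re.fullmatch(r"\d[\d,]*", text):
        return int(text.replace(",", ""))
    return text

# Sample cell texts of the kind found in the table from the question.
column = ["total revenue", " 31,821,000 ", " - ", "(42,000)"]

e = [convert_cell(c) for c in column]
print(e)
```

Whether "(42,000)" should become a negative number is left open here, since that depends on how the values are used afterwards.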