node.js - Getting back additional info when web scraping with cheerio.js

June 15, 2010

I am working with cheerio.js to make a simple web scraper. For some reason it does not respond to some HTML tags. One div I cannot target is the div with the class 'datatables_scrollbody' on the website I am scraping: http://www.caffeineinformer.com/the-caffeine-database.

However, I think I found a workaround for the problem.

I read through the documentation at https://github.com/cheeriojs/cheerio, which describes the format $( selector, [context], [root] ).

```javascript
$('.main, div:nth-child(3)').each(function () {
  // log the text of the element just before each match
  var data = $(this).prev().text();
  console.log(data);
});
```

In the console I am getting the data I want, but there are 2 problems:

1. The output also contains this text: caffeine content of drinks coffee soda energy drinks tea shots loading data.../*<![CDATA[*/var totalrows=1127; var latestdate='06/12/2015';var tbldata=

I do not see this info on the page.

2. I am getting the data twice.

I added a console.log of the data length and got 8 different lengths. I believe there is a workaround, but I cannot figure it out.

Does anyone have knowledge on this matter?

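DataTables is a JavaScript library that dynamically creates, inserts and modifies HTML elements in the DOM after the page has been loaded. The table you want to scrape is created dynamically, but your scraper works only on the static HTML: cheerio does not execute the page's JavaScript, so elements that DataTables adds never exist in what cheerio parses.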

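The data used to generate the table is stored as JavaScript in the page source, in a variable called tbldata (see this gist). That script is also where the unexpected var tbldata= text in your output comes from.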

Two possible solutions:

1. Use PhantomJS to load the page and run the JavaScript on it. After that, you can take the resulting DOM and parse it using cheerio.
2. Scrape the table data embedded in the JavaScript directly.
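For the second option, here is a dependency-free sketch. It assumes, without checking the live page, that tbldata is assigned a JSON-compatible array literal ending in ];, as the fragment in the question suggests. The helper name extractTblData and the sample string are hypothetical, and the variable name is matched case-insensitively because its exact capitalization on the page is not confirmed here.

```javascript
// Hypothetical helper: pull the array assigned to "var tbldata=" out of raw page source.
// Assumes the literal is valid JSON (double-quoted strings, no trailing commas).
function extractTblData(source) {
  // Non-greedy match from the "[" after the assignment up to the first "];".
  // This breaks if a string inside the data itself contains "];".
  const match = /var\s+tbldata\s*=\s*(\[[\s\S]*?\]);/i.exec(source);
  if (!match) {
    throw new Error('tbldata assignment not found');
  }
  return JSON.parse(match[1]);
}

// Made-up page fragment in the same shape as the text shown in the question.
const sample = "/*<![CDATA[*/var totalrows=2; var latestdate='06/12/2015';" +
  'var tbldata=[["Coffee",95],["Tea",47]];/*]]>*/';

const rows = extractTblData(sample);
console.log(rows.length); // 2
console.log(rows[0]);     // [ 'Coffee', 95 ]
```

In practice, source would be the same HTML string you already download before handing it to cheerio, so no browser is needed for this approach.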
