java - Lucene vs Solr, indexing speed for sample data
I have worked with Lucene before and am now moving towards Solr. The problem is that I am not able to index in Solr as fast as Lucene can.
My Lucene code:
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class LuceneIndexer {
        public static void main(String[] args) {
            String indexDir = "/home/demo/indexes/index1/";
            IndexWriterConfig indexWriterConfig = null;
            long startTime = System.currentTimeMillis();
            try (Directory dir = FSDirectory.open(Paths.get(indexDir));
                 Analyzer analyzer = new StandardAnalyzer();
                 IndexWriter indexWriter = new IndexWriter(dir,
                         (indexWriterConfig = new IndexWriterConfig(analyzer)))) {
                indexWriterConfig.setOpenMode(OpenMode.CREATE);

                // The field instances are created once and reused for every document.
                StringField bat = new StringField("bat", "", Store.YES);
                StringField id = new StringField("id", "", Store.YES);
                StringField name = new StringField("name", "", Store.YES);
                StringField id1 = new StringField("id1", "", Store.YES);
                StringField name1 = new StringField("name1", "", Store.YES);
                StringField id2 = new StringField("id2", "", Store.YES);

                Document doc = new Document();
                doc.add(bat);
                doc.add(id);
                doc.add(name);
                doc.add(id1);
                doc.add(name1);
                doc.add(id2);

                for (int i = 0; i < 1000000; ++i) {
                    bat.setStringValue("book" + i);
                    id.setStringValue("book id -" + i);
                    name.setStringValue("the legend of hobbit part 1 " + i);
                    id1.setStringValue("book id -" + i);
                    name1.setStringValue("the legend of hobbit part 2 " + i);
                    id2.setStringValue("book id -" + i);
                    indexWriter.addDocument(doc);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            long endTime = System.currentTimeMillis();
            System.out.println("commited");
            System.out.println("process completed in " + (endTime - startTime) / 1000 + " seconds");
        }
    }
Output: process completed in 19 seconds
This is followed by my Solr code:
    // Imports needed by this fragment:
    // java.util.ArrayList
    // org.apache.solr.client.solrj.SolrClient
    // org.apache.solr.client.solrj.impl.HttpSolrClient
    // org.apache.solr.common.SolrInputDocument

    SolrClient solrClient = new HttpSolrClient("http://localhost:8983/solr/gettingstarted");

    // Empty the database...
    solrClient.deleteByQuery("*:*"); // delete everything!
    System.out.println("cleared");

    ArrayList<SolrInputDocument> docs = new ArrayList<>();
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 1000000; ++i) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("bat", "biok" + i);
        doc.addField("id", "biok id -" + i);
        doc.addField("name", "tle legend of hobbit part 1 " + i);
        doc.addField("id1", "bopk id -" + i);
        doc.addField("name1", "tue legend of hobbit part 2 " + i);
        doc.addField("id2", "bopk id -" + i);
        docs.add(doc);

        // Send the accumulated documents to Solr in batches.
        if (i % 250000 == 0) {
            solrClient.add(docs);
            docs.clear();
        }
    }
    solrClient.add(docs);
    System.out.println("completed adding solr. commiting.. please wait");
    solrClient.commit();
    long endTime = System.currentTimeMillis();
    System.out.println("process completed in " + (endTime - startTime) / 1000 + " seconds");
Output: process completed in 159 seconds
My pom.xml:
    <!-- solr dependency -->
    <dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-solrj</artifactId>
        <version>5.0.0</version>
    </dependency>
    <!-- other dependency -->
    <dependency>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.1.1</version>
    </dependency>
    <!-- lucene dependency -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.0.0</version>
    </dependency>
I have downloaded Solr 5.0 and started it using $solr/bin/solr start -e cloud -noprompt, which starts Solr in 2 nodes.
I haven't changed anything in the Solr setup I downloaded. Can anyone guide me on what is going wrong? I read that Solr can be used for near real time indexing (http://lucene.apache.org/solr/features.html), but I am not able to achieve that in my demo code, even though Lucene is fast at indexing and can be used in near real time, if not real time.
I know Solr uses Lucene, so what mistake am I making? I am still researching the scenario.
Any help or guidance is welcome.
Thanks in advance! Cheers :)
Solr is a general-purpose, highly configurable search server. The Lucene code in Solr is tuned for general use, not for specific use cases. Some tuning is possible in the configuration and in the request syntax.
Well-tuned Lucene code written for a specific use case will outperform Solr. The disadvantage is that you must write, test, and debug the low-level implementation of the search code yourself. If that's not a major disadvantage for you, then you might want to stick with Lucene. You'll have more capability than Solr can give you, and you can make it run faster.
The response you got from Erick on the Solr mailing list is relevant. For the best indexing performance, your client must send updates to Solr in parallel.
The ConcurrentUpdateSolrClient that he mentioned is one way to do this, but it comes with a major disadvantage: the client code will not be informed if any of the indexing requests fails. CUSC swallows exceptions.
If you want proper exception handling, you will need to manage the threads yourself and use HttpSolrClient, or CloudSolrClient if you choose to run SolrCloud. The SolrClient implementations are thread-safe.
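As a rough illustration of managing the threads yourself, here is a minimal sketch that uses only the JDK's ExecutorService. The ParallelBatchIndexer class and the BatchSender interface are hypothetical names made up for this example, and plain strings stand in for SolrInputDocument objects. In real code the sender would presumably wrap a call such as solrClient.add(batch) on one shared HttpSolrClient or CloudSolrClient. The batch size and thread count are arbitrary values to tune, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBatchIndexer {

    /** Hypothetical stand-in for a call such as solrClient.add(batch). */
    public interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    /**
     * Splits docs into batches, sends the batches on a fixed-size thread pool,
     * waits for all of them, and returns the cause of every failed batch.
     */
    public static <T> List<Throwable> indexInParallel(
            List<T> docs, int batchSize, int threads, BatchSender<T> sender)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                // Copy the slice so each task owns its own list.
                List<T> batch = new ArrayList<>(docs.subList(start, end));
                Callable<Void> task = () -> {
                    sender.send(batch);
                    return null;
                };
                pending.add(pool.submit(task));
            }
        } finally {
            // Stop accepting new tasks; already submitted tasks still run.
            pool.shutdown();
        }

        List<Throwable> failures = new ArrayList<>();
        for (Future<Void> future : pending) {
            try {
                future.get(); // blocks until this batch has finished
            } catch (ExecutionException e) {
                // The exception thrown inside send() surfaces here.
                failures.add(e.getCause());
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger sent = new AtomicInteger();
        // This demo sender only counts documents instead of talking to Solr.
        List<Throwable> failures =
                indexInParallel(docs, 100, 4, batch -> sent.addAndGet(batch.size()));
        System.out.println("sent=" + sent.get() + " failedBatches=" + failures.size());
    }
}
```

Because every failure comes back to the caller, the client can log it or retry the affected batch, which is the part you give up with ConcurrentUpdateSolrClient.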