java - Lucene vs Solr, indexing speed for the same data


I have worked with Lucene before and am now moving towards Solr. The problem is that I am not able to index with Solr as fast as I can with Lucene.

My Lucene code:

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class LuceneIndexer {

        public static void main(String[] args) {
            String indexDir = "/home/demo/indexes/index1/";
            long startTime = System.currentTimeMillis();
            // The OpenMode must be set before the IndexWriter is created;
            // changing it on the config afterwards does not affect an open writer.
            try (Directory dir = FSDirectory.open(Paths.get(indexDir));
                 Analyzer analyzer = new StandardAnalyzer();
                 IndexWriter indexWriter = new IndexWriter(dir,
                         new IndexWriterConfig(analyzer).setOpenMode(OpenMode.CREATE))) {

                // One Document and one set of Field instances are reused for
                // every row; only the field values change between addDocument() calls.
                StringField bat = new StringField("bat", "", Store.YES);
                StringField id = new StringField("id", "", Store.YES);
                StringField name = new StringField("name", "", Store.YES);
                StringField id1 = new StringField("id1", "", Store.YES);
                StringField name1 = new StringField("name1", "", Store.YES);
                StringField id2 = new StringField("id2", "", Store.YES);

                Document doc = new Document();
                doc.add(bat);
                doc.add(id);
                doc.add(name);
                doc.add(id1);
                doc.add(name1);
                doc.add(id2);

                for (int i = 0; i < 1000000; ++i) {
                    bat.setStringValue("book" + i);
                    id.setStringValue("book id -" + i);
                    name.setStringValue("the legend of hobbit part 1 " + i);
                    id1.setStringValue("book id -" + i);
                    name1.setStringValue("the legend of hobbit part 2 " + i);
                    id2.setStringValue("book id -" + i);
                    indexWriter.addDocument(doc);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            // Closing the IndexWriter at the end of the try block commits the index.
            long endTime = System.currentTimeMillis();
            System.out.println("committed");
            System.out.println("process completed in " + (endTime - startTime) / 1000 + " seconds");
        }
    }

Output: process completed in 19 seconds

Followed by the Solr code:

    import java.util.ArrayList;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrIndexer {

        public static void main(String[] args) throws Exception {
            SolrClient solrClient = new HttpSolrClient("http://localhost:8983/solr/gettingstarted");
            // Empty the collection first.
            solrClient.deleteByQuery("*:*"); // delete everything!
            System.out.println("cleared");
            ArrayList<SolrInputDocument> docs = new ArrayList<>();

            long startTime = System.currentTimeMillis();
            for (int i = 0; i < 1000000; ++i) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("bat", "biok" + i);
                doc.addField("id", "biok id -" + i);
                doc.addField("name", "tle legend of hobbit part 1 " + i);
                doc.addField("id1", "bopk id -" + i);
                doc.addField("name1", "tue legend of hobbit part 2 " + i);
                doc.addField("id2", "bopk id -" + i);
                docs.add(doc);

                // Flush the buffer to Solr whenever i is a multiple of 250000
                // (the first flush, at i == 0, sends a single document).
                if (i % 250000 == 0) {
                    solrClient.add(docs);
                    docs.clear();
                }
            }
            solrClient.add(docs);
            System.out.println("completed adding solr. committing.. please wait");
            solrClient.commit();
            long endTime = System.currentTimeMillis();
            System.out.println("process completed in " + (endTime - startTime) / 1000 + " seconds");
        }
    }

Output: process completed in 159 seconds

My pom.xml:

    <!-- Solr dependency -->
    <dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-solrj</artifactId>
        <version>5.0.0</version>
    </dependency>

    <!-- other dependency -->
    <dependency>
        <groupId>commons-logging</groupId>
        <artifactId>commons-logging</artifactId>
        <version>1.1.1</version>
    </dependency>

    <!-- Lucene dependencies -->
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>5.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-analyzers-common</artifactId>
        <version>5.0.0</version>
    </dependency>

I downloaded Solr 5.0 and started it using $solr/bin/solr start -e cloud -noprompt, which starts Solr on 2 nodes.

I haven't changed anything in the Solr setup I downloaded. Can anyone guide me on where I am going wrong? I read that Solr can be used for near-real-time indexing (http://lucene.apache.org/solr/features.html), but I am not able to achieve that with my demo code, whereas Lucene is fast at indexing and can be used in near real time, if not real time.

I know Solr uses Lucene, so what mistake am I making? I am still researching this scenario.

Any help or guidance is welcome.

Thanks in advance! Cheers :)

Solr is a general-purpose, highly configurable search server. The Lucene code in Solr is tuned for general use, not for specific use cases. Some tuning is possible through the configuration and the request syntax.

Well-tuned Lucene code written for a specific use case will outperform Solr. The disadvantage is that you must write, test, and debug the low-level implementation of the search code yourself. If that's not a major disadvantage for you, you might want to stick with Lucene. You'll have more capability than Solr can give you, and you can make it run faster.

The response you got from Erick on the Solr mailing list is relevant. For the best indexing performance, your client must send updates to Solr in parallel.
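A minimal JDK-only sketch of sending updates in parallel. BatchSender is a hypothetical interface standing in for one update request (in real client code, something like solrClient.add(batch) on a single shared HttpSolrClient); the batch size and thread count are illustrative assumptions, not tuned values. Each batch is submitted to a fixed thread pool, and Future.get() is called on every result so that a failed request surfaces in the caller instead of disappearing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {

    /** Hypothetical stand-in for one update request, e.g. solrClient.add(batch). */
    public interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    /**
     * Splits docs into batches of at most batchSize and sends them from a fixed
     * pool of threads. Waits on the batches in order; the first failed batch's
     * exception is rethrown wrapped in an ExecutionException instead of being
     * swallowed.
     */
    public static <T> void indexInParallel(List<T> docs, int batchSize, int threads,
                                           BatchSender<T> sender) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int from = 0; from < docs.size(); from += batchSize) {
                // Copy the slice so each task owns an independent list.
                List<T> batch = new ArrayList<>(
                        docs.subList(from, Math.min(from + batchSize, docs.size())));
                pending.add(pool.submit(() -> {
                    sender.send(batch);
                    return null; // a Callable, so send() may throw checked exceptions
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for this batch and rethrows its exception, if any
            }
        } finally {
            pool.shutdown(); // no new tasks; already-submitted ones still run
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            docs.add("book id -" + i);
        }
        AtomicInteger sent = new AtomicInteger();
        // Simulated sender that only counts documents; a real one would call Solr.
        indexInParallel(docs, 1000, 4, batch -> sent.addAndGet(batch.size()));
        System.out.println("sent " + sent.get() + " docs"); // prints "sent 10000 docs"
    }
}
```

Keeping several moderately sized batches in flight at once is the effect the advice above is after; the right batch size and thread count depend on the documents and the server and have to be measured.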

The ConcurrentUpdateSolrClient he mentioned is one way to do this, but it comes with a major disadvantage: the client code will not be informed if any of the indexing requests fails. CUSC swallows exceptions.

If you want proper exception handling, you will need to manage the threads yourself and use HttpSolrClient, or CloudSolrClient if you choose to run SolrCloud. The SolrClient implementations are thread-safe.
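When you manage the threads yourself, one possible refinement is to collect every failure instead of stopping at the first one, so failed batches can be retried or logged. The sketch below again uses a hypothetical BatchSender in place of a real call such as solrClient.add(batch); a single sender instance is shared by all worker threads, which is only safe when the underlying client is thread-safe. The class and method names are illustrative, not part of SolrJ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CheckedBatchIndexer {

    /** Hypothetical stand-in for one update request, e.g. solrClient.add(batch). */
    public interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    /**
     * Sends every batch from a pool of threads that share one sender and
     * returns the failures keyed by batch index, so the caller can retry or
     * report them. An empty map means every batch was accepted.
     */
    public static <T> Map<Integer, Throwable> sendAll(List<List<T>> batches, int threads,
                                                      BatchSender<T> sender)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> batch : batches) {
                futures.add(pool.submit(() -> {
                    sender.send(batch);
                    return null;
                }));
            }
            Map<Integer, Throwable> failures = new LinkedHashMap<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get();
                } catch (ExecutionException e) {
                    failures.put(i, e.getCause()); // the exception thrown by send()
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<List<String>> batches = new ArrayList<>();
        for (int b = 0; b < 5; b++) {
            List<String> batch = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                batch.add("book id -" + (b * 3 + i));
            }
            batches.add(batch);
        }
        // Simulated server that rejects the batch containing "book id -7".
        Map<Integer, Throwable> failures = sendAll(batches, 2, batch -> {
            if (batch.contains("book id -7")) {
                throw new IllegalStateException("rejected: " + batch);
            }
        });
        System.out.println("failed batches: " + failures.keySet()); // prints "failed batches: [2]"
    }
}
```

Returning the failures rather than throwing on the first one lets the caller decide whether to retry only the rejected batches or abort the whole run.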

