neo4j - What is the most performant way to create the following MATCH statement and why?


The question:
What is the most performant way to create the following MATCH statement, and why?

The detailed problem:
Let's say I have a place node with a variable number of properties, and I need to find nodes (potentially billions of them) by their category. I'm trying to wrap my head around the performance of each query, and it's proving quite difficult.

The possible queries:

  • Match a place node using a property lookup:
    MATCH (entity:place { category: "food" })

  • Match a place node with an iscategory relationship to a food node:
    MATCH (entity:place)-[:iscategory]->(category:food)

  • Match a place node with a food relationship to a category node:
    MATCH (entity)-[category:food]->(:category)

  • Match a food node with an iscategoryfor relationship to a place node:
    MATCH (category:food)-[:iscategoryfor]->(entity:place)

And variations in between, with the relationship directions going the other way as well.

More complexity:
Let's throw in a little more complexity: I need to find place nodes using multiple categories. For example, to find all place nodes in category food or bar, can I just tack that onto the MATCH statement? If not, what is the most performant route to take here?

Extra:
Is there a tool that can describe the traversal process to me and tell me the best method to choose?

If I understand your domain correctly, I would recommend making the categories nodes themselves.

MERGE (:category {name:"food"})
MERGE (:category {name:"bar"})
MERGE (:category {name:"park"})

And connect each place node to the categories it belongs to.

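// Merge each node on its own first: MERGE on a whole path creates the entire path
// when it does not already exist, including nodes that already exist elsewhere,
// which would duplicate the category and place nodes.
MERGE (park:category {name:"park"})
MERGE (food:category {name:"food"})
MERGE (bar:category {name:"bar"})
MERGE (cp:place {name:"central park"})
MERGE (jd:place {name:"joe's diner"})
MERGE (cp)-[:is_a]->(park)
MERGE (jd)-[:is_a]->(food)
MERGE (jd)-[:is_a]->(bar)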
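To make the recommended model concrete, here is a toy in-memory sketch in Python. `ToyGraph`, `merge_node` and `merge_rel` are hypothetical names, not part of Neo4j or its drivers; the point is only that each category and each place exists once, and a place in two categories (joe's diner) simply has two `is_a` edges.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory stand-in for the graph; not Neo4j's API."""

    def __init__(self):
        self.nodes = {}                     # (label, name) -> node id
        self.out_edges = defaultdict(set)   # node id -> {(rel_type, target id)}

    def merge_node(self, label, name):
        # Get-or-create keyed on (label, name): a node is created only once.
        key = (label, name)
        if key not in self.nodes:
            self.nodes[key] = len(self.nodes)
        return self.nodes[key]

    def merge_rel(self, src, rel_type, dst):
        # Storing edges in a set makes repeating the same merge harmless.
        self.out_edges[src].add((rel_type, dst))


g = ToyGraph()
for category in ("food", "bar", "park"):
    g.merge_node("category", category)

for place, category in [("central park", "park"),
                        ("joe's diner", "food"),
                        ("joe's diner", "bar")]:
    p = g.merge_node("place", place)
    c = g.merge_node("category", category)
    g.merge_rel(p, "is_a", c)

print(len(g.nodes))  # 5: three categories + two places
```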

Then, if you want to find the places that belong to a category, you can do it pretty quickly: start by matching the category, then branch out to the places related to that category.

MATCH (c:category {name:"bar"}), (c)<-[:is_a]-(p:place)
RETURN p

You'll have a relatively limited number of categories, so matching the category is quick. Then, because of the way Neo4j stores data, it is fast to find the places related to that category.
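The two steps (find the category, then follow its relationships) can be mimicked with a plain adjacency list. This is a hypothetical toy model, not how Neo4j is implemented internally; it only shows that, once the start node is found, the answer comes from that node's own edge list rather than from examining every place.

```python
from collections import defaultdict

# Hypothetical toy data: category name -> places pointing at it via is_a.
incoming_is_a = defaultdict(list)
for place, category in [("central park", "park"),
                        ("joe's diner", "food"),
                        ("joe's diner", "bar")]:
    incoming_is_a[category].append(place)


def places_in(category_name):
    # Step 1: find the start node among the (few) categories.
    if category_name not in incoming_is_a:
        return []
    # Step 2: read only that node's own relationships; other places are never touched.
    return list(incoming_is_a[category_name])


print(places_in("bar"))  # ["joe's diner"]
```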

More complexity

Finding places within multiple categories is easy as well.

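MATCH (c:category)
WHERE c.name = "bar" OR c.name = "food"
MATCH (c)<-[:is_a]-(p:place)
// DISTINCT: a place in both categories (e.g. joe's diner) would otherwise be returned twice
RETURN DISTINCT p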

Again, match the categories first (fast, because there aren't many of them), then branch out to the connected places.
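The same toy model (hypothetical names, plain Python dictionaries rather than Neo4j) extends to several categories: start from each requested category, collect the connected places, and keep each place only once, which is the job of `DISTINCT` in a Cypher `RETURN`.

```python
from collections import defaultdict

# Hypothetical toy data: category name -> places with an is_a relationship to it.
incoming_is_a = defaultdict(list)
for place, category in [("central park", "park"),
                        ("joe's diner", "food"),
                        ("joe's diner", "bar")]:
    incoming_is_a[category].append(place)


def places_in_any(category_names):
    """Places related to ANY of the given categories, each returned once."""
    found = {}  # dict keeps insertion order and drops duplicate keys
    for name in category_names:                     # start from the few category nodes
        for place in incoming_is_a.get(name, []):   # then branch out to their places
            found[place] = None
    return list(found)


print(places_in_any(["bar", "food"]))  # ["joe's diner"], listed once, not twice
```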

Use an index

If you want this to be fast, you need to use indexes where it makes sense. In this example, I would use an index on the category's name property.

CREATE INDEX ON :category(name)
// Neo4j 4.x and later use: CREATE INDEX FOR (c:category) ON (c.name)

Or, better yet, use a uniqueness constraint on category names, which will both index them and prevent duplicates.

CREATE CONSTRAINT ON (c:category) ASSERT c.name IS UNIQUE
// Neo4j 4.4 and later use: CREATE CONSTRAINT FOR (c:category) REQUIRE c.name IS UNIQUE

Indexes (and uniqueness constraints) make a big difference to the speed of your queries.
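A rough illustration of what an index plus a uniqueness constraint buys you, using a Python dictionary as a stand-in. `ToyCategoryIndex` and `lookup_without_index` are hypothetical; Neo4j's real indexes are different data structures, but the contrast between a direct lookup and a scan over every node's properties is the same idea.

```python
class ToyCategoryIndex:
    """Hypothetical stand-in for an index + uniqueness constraint on category.name."""

    def __init__(self):
        self._by_name = {}  # name -> node id, average O(1) lookup

    def add(self, name, node_id):
        # A uniqueness constraint rejects a second node with the same name.
        if name in self._by_name:
            raise ValueError(f"category {name!r} already exists")
        self._by_name[name] = node_id

    def lookup(self, name):
        return self._by_name.get(name)


def lookup_without_index(nodes, name):
    """Linear scan: reads the name property of each node until a match is found."""
    for node_id, props in nodes:
        if props.get("name") == name:
            return node_id
    return None


nodes = [(0, {"name": "food"}), (1, {"name": "bar"}), (2, {"name": "park"})]
index = ToyCategoryIndex()
for node_id, props in nodes:
    index.add(props["name"], node_id)

assert index.lookup("bar") == lookup_without_index(nodes, "bar") == 1
try:
    index.add("bar", 3)
except ValueError as err:
    print(err)  # category 'bar' already exists
```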

Why this is fastest

Neo4j stores nodes and relationships in a compact, quick-to-access format. Once you have a node or relationship, getting the adjacent relationships or nodes is fast. However, Neo4j stores each node's (and relationship's) properties separately, meaning that looking through properties is relatively slow.

The goal is to get to a starting node as quickly as possible. Once you are there, traversing to related entities is quick. If you have 1,000 categories and a billion places, it is faster to pick out an individual category than an individual place. Once you have that starting node, getting the related nodes is efficient.
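A toy cost model of this argument, counting "reads" in plain Python. The sizes (10,000 places, 10 categories) and all names are made up for illustration, and a read here is just a loop step, not a measurement of Neo4j. Both approaches return the same places, but the property scan touches every place, while the traversal touches a few category nodes plus only the matching places.

```python
from collections import defaultdict

NUM_PLACES = 10_000                          # illustrative size, not a benchmark
CATEGORIES = [f"cat{i}" for i in range(10)]  # hypothetical category names

# Model 1: category stored as a property on every place.
place_props = [{"category": CATEGORIES[i % len(CATEGORIES)]} for i in range(NUM_PLACES)]

# Model 2: category nodes, each with an adjacency list of its places.
adjacency = defaultdict(list)
for place_id, props in enumerate(place_props):
    adjacency[props["category"]].append(place_id)


def find_by_property(target):
    """Read the category property of every place (no index)."""
    hits, reads = [], 0
    for place_id, props in enumerate(place_props):
        reads += 1
        if props["category"] == target:
            hits.append(place_id)
    return hits, reads


def find_by_traversal(target):
    """Find the category node first, then follow its relationships."""
    reads = 0
    for name in CATEGORIES:      # even unindexed, this list is tiny
        reads += 1
        if name == target:
            break
    else:
        return [], reads         # category does not exist
    hits = []
    for place_id in adjacency[target]:
        reads += 1               # one hop per connected place
        hits.append(place_id)
    return hits, reads


print(find_by_property("cat3")[1], find_by_traversal("cat3")[1])  # 10000 1004
```

The traversal still pays one step per result; the saving comes from never touching the places that are not in the category.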

The other options

Just to reinforce the point, here is what makes the other options slower or otherwise worse.

In the first example, you are looking through the properties on each node to find a match. Property lookup is slow, and you would be doing it a billion times. An index can help with this, but it's still a lot of work. Additionally, you are duplicating the category data on each of a billion places, and not taking advantage of Neo4j's strengths.

In the other examples, the data models seem odd. "food", "bar", "park", etc. are instances of categories, not separate types. They should each be their own node, but should all have the category label, because that's what they are. In addition, categories are things, and so they should be nodes. A relationship describes a connection between things; it does not make sense to use categories in that way.

I hope this helps!

