Neo4j - What is the most performant way to create the following MATCH statement and why?
The question:
What is the most performant way to create the following MATCH statement, and why?
The detailed problem:
Let's say we have a place node with a variable amount of properties, and we need to look up nodes, from potentially billions of nodes, by their category. I'm trying to wrap my head around the performance of each query, and it's proving quite difficult.
The possible queries:
MATCH a place node using a property lookup:
MATCH (entity:place { category: "food" })
MATCH a place node to an iscategory relationship to a food node:
MATCH (entity:place)-[:iscategory]->(category:food)
MATCH a place node to a food relationship to a category node:
MATCH (entity)-[category:food]->(:category)
MATCH a food node to an iscategoryfor relationship to a place node:
MATCH (category:food)-[:iscategoryfor]->(entity:place)
And variations in between. Consider the relationship directions going the other way as well.
More complexity:
Let's throw in a little more complexity: we now need to find all place nodes using multiple categories. For example: find all place nodes with category food or bar. Would we just tack on another MATCH statement? If not, what is the most performant route to take here?
Extra:
Is there a tool that can help me describe the traversal process and tell me the best method to choose?
If I understand your domain correctly, I would recommend making your categories nodes themselves.
MERGE (:category {name:"food"})
MERGE (:category {name:"bar"})
MERGE (:category {name:"park"})
and connecting each place node to the categories it belongs to.
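// MERGE matches or creates the whole pattern, so merge each node first and then the relationships;
// merging a full path would create duplicate place and category nodes.
MERGE (park:category {name:"park"})
MERGE (food:category {name:"food"})
MERGE (bar:category {name:"bar"})
MERGE (cp:place {name:"central park"})
MERGE (jd:place {name:"joe's diner"})
MERGE (cp)-[:is_a]->(park)
MERGE (jd)-[:is_a]->(food)
MERGE (jd)-[:is_a]->(bar)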
Then, if you want to find the places that belong to a category, you can do so pretty quickly. Start by matching the category, then branch out to the places related to that category.
MATCH (c:category {name:"bar"}), (c)<-[:is_a]-(p:place) RETURN p
You'll have a relatively limited number of categories, so matching the category will be quick. Then, because of the way Neo4j stores data, it will be fast to find all the places related to that category.
More complexity
Finding places within multiple categories is easy as well.
MATCH (c:category) WHERE c.name = "bar" OR c.name = "food" MATCH (c)<-[:is_a]-(p:place) RETURN p
Again, you match the categories first (fast because there aren't many of them), then branch out to the connected places.
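As a small variation on the query above, here is a sketch that uses IN instead of chained OR conditions. It assumes the same lowercase category and place labels and the is_a relationship used in this answer. A place that belongs to both categories is matched once per category, so DISTINCT is added to return it only once.

```cypher
// Match the few category nodes first, then expand to their places.
MATCH (c:category)
WHERE c.name IN ["bar", "food"]
MATCH (c)<-[:is_a]-(p:place)
// A place linked to both categories is matched once per category; collapse the repeats.
RETURN DISTINCT p
```

If you want the duplicates (for example, to count how many of the requested categories each place matches), drop DISTINCT and aggregate instead.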
Use an index
If you want this to be fast, you need to use indexes where it makes sense. In this example, I would use an index on the category's name property.
CREATE INDEX ON :category(name)
Or better yet, use a uniqueness constraint on category names, which will both index them and prevent duplicates.
CREATE CONSTRAINT ON (c:category) ASSERT c.name IS UNIQUE
Indexes (and uniqueness constraints) make a big difference to the speed of your queries.
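The two statements above use the syntax of older Neo4j releases. On recent versions (Neo4j 5, as far as I know) those forms are no longer accepted. A sketch of the equivalents follows, assuming the same label and property; the names category_name_index and category_name_unique are made up for illustration.

```cypher
// Use only ONE of the two statements below: a uniqueness constraint
// cannot be created while a plain index on the same property exists.

// Option 1: a plain index on the category name.
CREATE INDEX category_name_index IF NOT EXISTS
FOR (c:category) ON (c.name);

// Option 2: a uniqueness constraint, which is backed by an index of its own.
CREATE CONSTRAINT category_name_unique IF NOT EXISTS
FOR (c:category) REQUIRE c.name IS UNIQUE;
```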
Why this is fastest
Neo4j stores nodes and relationships in a compact, quick-to-access format. Once you have a node or relationship, getting the adjacent relationships or nodes is fast. However, it stores each node's (and relationship's) properties separately, meaning that looking through properties is relatively slow.
The goal is to get to a starting node as quickly as possible. Once there, traversing related entities is quick. If you only have 1,000 categories but a billion places, it will be faster to pick out an individual category than an individual place. Once you have that starting node, getting to related nodes is efficient.
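To check this reasoning against your own data (and as one answer to the "Extra" part of the question), you can prefix a query with EXPLAIN to see the planned operators without running it, or with PROFILE to run it and see the rows and database hits for each operator. A sketch, assuming the category model described in this answer:

```cypher
// Run the query and report rows and db hits for each operator in the plan.
PROFILE
MATCH (c:category {name: "bar"})<-[:is_a]-(p:place)
RETURN p
```

With an index or uniqueness constraint on the category name, I would expect the plan to start from an index seek on category rather than a label scan. Profiling the property-based query from the question the same way gives a like-for-like comparison of database hits.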
The other options
Just to reinforce, this is what makes your other options slower or otherwise worse.
In your first example, you are looking through properties on each node to find a match. Property lookup is slow, and you are doing it a billion times. An index can help with this, but it's still a lot of work. Additionally, you are duplicating the category data on each of your billion places, and not taking advantage of Neo4j's strengths.
In your other examples, the data models seem odd. "food", "bar", "park", etc. are all instances of categories, not separate types. They should each be their own node, but they should all have the category label, because that's what they are. In addition, categories are things, and thus they should be nodes. A relationship describes the connection between things. It does not make sense to use categories in this way.
I hope this helps!