cql - Understanding how Cassandra stores data internally
I have this table:
create table comment_by_post (
    postid uuid,
    userid uuid,
    cmntid timeuuid,
    cmnttxt text,
    cmntby text,
    time bigint,
    primary key ((postid, userid), cmntid)
);
Here is how the data in this table is stored internally:
rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92e537a
=> (name=d3f02a30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270721107000)
=> (name=d3f02a30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e743434, timestamp=1434270721107000)
-------------------
rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92eec7a
=> (name=465fee30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270483603000)
=> (name=465fee30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7432, timestamp=1434270483603000)
=> (name=4ba89f40-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270492468000)
=> (name=4ba89f40-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7431, timestamp=1434270492468000)
=> (name=504a61f0-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270500239000)
=> (name=504a61f0-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7433, timestamp=1434270500239000)
-------------------
rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92e237a
=> (name=cd1e8f30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270709667000)
=> (name=cd1e8f30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7433, timestamp=1434270709667000)
If the primary key is (postid, userid, cmntid) instead, the data is stored like this:
rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:971da150-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264176613000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:971da150-1260-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7431, timestamp=1434264176613000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a0d4a900-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264192912000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a0d4a900-1260-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7432, timestamp=1434264192912000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a5d94c30-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264201331000)
Why is the data stored differently, and what is the benefit of each approach?
Christopher already explained how the partitioning keys are concatenated to generate the rowkey for storage, so I won't re-hash (no pun intended) that. But I will explain the advantages and disadvantages of these two approaches.
primary key (postid, userid, cmntid)
With this primary key, your data is partitioned by postid, and clustered by userid and cmntid. This means that all comments made on a post are stored together on-disk under that postid, and sorted by userid and then cmntid (respectively).
The advantage here is that you have query flexibility. You can query all comments for a post, or all comments for a post by a specific user.
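To make that layout concrete, here is a minimal Python sketch that models it. It is not Cassandra code: the short ids ("post1", "userA", integer cmntid values) are hypothetical stand-ins for the UUIDs in the question, and the function names are made up for illustration. It only shows rows grouped under one rowkey per postid and kept sorted by (userid, cmntid), which is what allows both kinds of reads.

```python
from collections import defaultdict

# Hypothetical, shortened ids standing in for the UUIDs in the question.
rows = [
    {"postid": "post1", "userid": "userB", "cmntid": 2, "cmnttxt": "cmnt1"},
    {"postid": "post1", "userid": "userA", "cmntid": 3, "cmnttxt": "cmnt3"},
    {"postid": "post1", "userid": "userB", "cmntid": 1, "cmnttxt": "cmnt2"},
]

def store(rows, partition_key, clustering_key):
    """Group rows by rowkey and keep each partition sorted by its clustering columns."""
    partitions = defaultdict(list)
    for row in rows:
        rowkey = ":".join(str(row[col]) for col in partition_key)
        clustering = tuple(row[col] for col in clustering_key)
        partitions[rowkey].append((clustering, row["cmnttxt"]))
    return {key: sorted(cells) for key, cells in partitions.items()}

# primary key (postid, userid, cmntid): postid partitions, the rest cluster.
table = store(rows, partition_key=["postid"], clustering_key=["userid", "cmntid"])

def comments_for_post(table, postid):
    """Read the whole partition for one post."""
    return [text for _, text in table.get(postid, [])]

def comments_for_post_by_user(table, postid, userid):
    """Read only the entries whose first clustering column matches userid."""
    return [text for (uid, _), text in table.get(postid, []) if uid == userid]

print(comments_for_post(table, "post1"))
print(comments_for_post_by_user(table, "post1", "userB"))
```

Both reads touch a single rowkey. The per-user read works because userid is the leading clustering column, so one user's comments sit next to each other inside the partition.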
The disadvantage is that you have a higher chance of unbounded row growth than with the other solution. If the total number of columns per postid ever exceeds 2 billion, you max out how much data you can store per postid. But the odds of storing that much comment data per post are low, so you should be OK.
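As a rough back-of-the-envelope check, the sketch below turns the 2 billion figure into a number of comments. It assumes the storage layout shown in the question's dump, where each comment takes one empty marker column plus one column per regular field that was actually written; the function name is hypothetical.

```python
MAX_COLUMNS_PER_ROWKEY = 2_000_000_000  # the "2 billion" limit mentioned above

def max_comments_per_rowkey(fields_written: int) -> int:
    """Comments that fit under one rowkey, assuming one marker column
    plus one column per written regular field (cmnttxt, cmntby, time)."""
    columns_per_comment = 1 + fields_written
    return MAX_COLUMNS_PER_ROWKEY // columns_per_comment

print(max_comments_per_rowkey(1))  # only cmnttxt written, as in the dump: 1000000000
print(max_comments_per_rowkey(3))  # cmnttxt, cmntby and time written: 500000000
```

Even with every field populated, a single post would need hundreds of millions of comments before hitting the limit.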
primary key ((postid, userid), cmntid)
This solution helps to negate the possibility of unbounded row growth, by storing comment data under a concatenated rowkey of postid and userid (sorted by cmntid). That's the advantage over the other solution.
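The effect on row size can be sketched by counting comments per rowkey under each key definition. This is only an illustration: the ids are hypothetical stand-ins for the UUIDs, and the 1/3/1 split mirrors the three rowkeys in the question's first dump.

```python
from collections import Counter

# Hypothetical short ids; one entry per stored comment.
comments = [
    {"postid": "post1", "userid": "userA"},
    {"postid": "post1", "userid": "userB"},
    {"postid": "post1", "userid": "userB"},
    {"postid": "post1", "userid": "userB"},
    {"postid": "post1", "userid": "userC"},
]

def partition_sizes(comments, partition_key):
    """Count comments per rowkey, joining the partition key columns with ':'."""
    return Counter(":".join(c[col] for col in partition_key) for c in comments)

# primary key (postid, userid, cmntid): every comment on the post shares one rowkey.
print(partition_sizes(comments, ["postid"]))

# primary key ((postid, userid), cmntid): one rowkey per post and user.
print(partition_sizes(comments, ["postid", "userid"]))
```

With the composite partition key, a rowkey only grows with the comments one user makes on one post, rather than with every comment on the post.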
The disadvantage is that you lose query flexibility, because you now need to provide both postid and userid with every query. This primary key definition does not support queries for comments by postid alone, as Cassandra CQL requires you to provide the entire partition key for a query.
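That rule can be expressed as a small check. This is a deliberately simplified model with a hypothetical function name: it only tests whether every partition key column is restricted, and ignores clustering column rules, secondary indexes and filtering.

```python
def covers_partition_key(partition_key, restricted_columns):
    """True when the query restricts every column of the partition key."""
    return set(partition_key) <= set(restricted_columns)

# primary key ((postid, userid), cmntid)
composite = ["postid", "userid"]
print(covers_partition_key(composite, ["postid"]))            # False: userid is missing
print(covers_partition_key(composite, ["postid", "userid"]))  # True

# primary key (postid, userid, cmntid)
simple = ["postid"]
print(covers_partition_key(simple, ["postid"]))               # True: postid alone is enough
print(covers_partition_key(simple, ["postid", "userid"]))     # True: userid narrows within the partition
```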