CQL - Understanding how Cassandra stores data internally

June 15, 2010

I have this table:

CREATE TABLE comment_by_post (
    postid  uuid,
    userid  uuid,
    cmntid  timeuuid,
    cmnttxt text,
    cmntby  text,
    time    bigint,
    PRIMARY KEY ((postid, userid), cmntid)
);

Here is the internal data in the table:

rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92e537a
=> (name=d3f02a30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270721107000)
=> (name=d3f02a30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e743434, timestamp=1434270721107000)
-------------------
rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92eec7a
=> (name=465fee30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270483603000)
=> (name=465fee30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7432, timestamp=1434270483603000)
=> (name=4ba89f40-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270492468000)
=> (name=4ba89f40-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7431, timestamp=1434270492468000)
=> (name=504a61f0-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270500239000)
=> (name=504a61f0-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7433, timestamp=1434270500239000)
-------------------
rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b:4978f728-0f96-12e5-a6c0-1697f92e237a
=> (name=cd1e8f30-126f-11e5-879b-e700f669bcfc:, value=, timestamp=1434270709667000)
=> (name=cd1e8f30-126f-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7433, timestamp=1434270709667000)

If the primary key is (postid, userid, cmntid) instead, the internal data looks like this:

rowkey: 4978f728-0f96-11e5-a6c0-1697f925ec7b
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:971da150-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264176613000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:971da150-1260-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7431, timestamp=1434264176613000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a0d4a900-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264192912000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a0d4a900-1260-11e5-879b-e700f669bcfc:cmnttxt, value=636d6e7432, timestamp=1434264192912000)
=> (name=4978f728-0f96-12e5-a6c0-1697f92eec7a:a5d94c30-1260-11e5-879b-e700f669bcfc:, value=, timestamp=1434264201331000)

Why is the data stored differently, and what is the benefit of each approach?

Christopher already explained how the partitioning keys are concatenated to generate the rowkey for storage, so I won't re-hash (no pun intended) that. But I will explain the advantages and disadvantages of these two approaches.
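For readers who want to see the concatenation in action, here is a minimal Python sketch that reproduces the shape of the two dumps in the question. It is only a model of what the dumps show, not Cassandra's actual storage code; the function name storage_layout and the ":" join are assumptions read off the output above.

```python
def storage_layout(row, partition_cols, clustering_cols):
    """Return (rowkey, cell_names) for one CQL row, mimicking the dumps above.

    Hypothetical helper: it models the printed layout only.
    """
    key_cols = set(partition_cols) | set(clustering_cols)
    # Partition key columns are joined into the rowkey.
    rowkey = ":".join(row[c] for c in partition_cols)
    # Clustering columns become the prefix of every cell name in the row.
    prefix = ":".join(row[c] for c in clustering_cols)
    # One empty-named marker cell per CQL row, then one cell per
    # regular column that actually has a value (nulls produce no cell).
    cells = [prefix + ":"]
    cells += [
        f"{prefix}:{col}"
        for col, value in row.items()
        if col not in key_cols and value is not None
    ]
    return rowkey, cells


row = {
    "postid": "4978f728-0f96-11e5-a6c0-1697f925ec7b",
    "userid": "4978f728-0f96-12e5-a6c0-1697f92eec7a",
    "cmntid": "465fee30-126f-11e5-879b-e700f669bcfc",
    "cmnttxt": "cmnt2",
    "cmntby": None,
    "time": None,
}

# PRIMARY KEY ((postid, userid), cmntid)
composite = storage_layout(row, ["postid", "userid"], ["cmntid"])
# PRIMARY KEY (postid, userid, cmntid)
simple = storage_layout(row, ["postid"], ["userid", "cmntid"])
```

With the composite partition key, userid ends up in the rowkey; with the simple one, it moves into the cell names.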

PRIMARY KEY (postid, userid, cmntid)

With this PRIMARY KEY, your data is partitioned by postid and clustered by userid and cmntid. This means that all comments made on a post are stored together on disk under that postid, and then sorted by userid and cmntid (respectively).

The advantage here is that you have query flexibility. You can query all comments for a post, or all comments for a post by a specific user.
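To illustrate why both queries are cheap with this key, here is a rough Python sketch of a single partition held in clustering order. The ids and comment texts are made up, plain strings stand in for uuid and timeuuid values, and the helper names are hypothetical; it only models the sorted layout, not how Cassandra reads it.

```python
from bisect import bisect_left, bisect_right

# One partition (a single postid): rows kept sorted by (userid, cmntid).
partition = sorted([
    ("user-b", "cmnt-03", "third"),
    ("user-a", "cmnt-01", "first"),
    ("user-a", "cmnt-02", "second"),
])


def comments_for_post(partition):
    # Querying by postid alone reads the whole partition in stored order.
    return [text for _, _, text in partition]


def comments_for_post_by_user(partition, userid):
    # Adding userid narrows the read to one contiguous slice,
    # because rows for the same user sit next to each other.
    userids = [row[0] for row in partition]
    lo = bisect_left(userids, userid)
    hi = bisect_right(userids, userid)
    return [text for _, _, text in partition[lo:hi]]


all_comments = comments_for_post(partition)
user_a_comments = comments_for_post_by_user(partition, "user-a")
```

Both reads stay inside one partition, which is what makes either query possible with this key.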

The disadvantage is that you have a higher chance of unbounded row growth than with your other solution. If the total number of columns per postid ever exceeded 2 billion, you would max out how much data you could store per postid. But the odds of you storing that much comment data per post are low, so you should be OK.
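As a back-of-the-envelope check on that limit, the Python sketch below estimates how many comments fit under one postid. It assumes, based on the dumps in the question, that each comment costs one marker cell plus one cell per regular column that has a value; the function name is hypothetical.

```python
MAX_CELLS_PER_PARTITION = 2_000_000_000  # the "2 billion" figure cited above


def max_comments_per_post(regular_columns_set):
    # Assumption taken from the dumps: one empty marker cell per comment,
    # plus one cell for each regular column that has a value.
    cells_per_comment = 1 + regular_columns_set
    return MAX_CELLS_PER_PARTITION // cells_per_comment


only_text = max_comments_per_post(1)    # just cmnttxt, as in the dumps
all_columns = max_comments_per_post(3)  # cmnttxt, cmntby and time all set
```

Under those assumptions, a single post would need on the order of half a billion to a billion comments to reach the limit.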

PRIMARY KEY ((postid, userid), cmntid)

This solution helps negate the possibility of unbounded row growth by storing comment data under a concatenated rowkey of postid and userid (sorted by cmntid). That's the advantage over your other solution.

The disadvantage is the loss of query flexibility, as you now need to provide both postid and userid with every query. This PRIMARY KEY definition simply does not support queries for comments by postid alone, as Cassandra CQL requires you to provide the entire partition key in a query.
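A loose analogy in Python: think of the partitions as a dictionary keyed by the full (postid, userid) pair. This is only an in-memory model with made-up ids and hypothetical function names; Cassandra really locates a partition by hashing the whole partition key, but the consequence is similar.

```python
# Partitions keyed by the complete partition key (postid, userid).
partitions = {
    ("post-1", "user-a"): ["cmnt-01", "cmnt-02"],
    ("post-1", "user-b"): ["cmnt-03"],
    ("post-2", "user-a"): ["cmnt-04"],
}


def comments(postid, userid):
    # With both key parts, this is a direct lookup of one partition.
    return partitions.get((postid, userid), [])


def comments_for_post_full_scan(postid):
    # With postid alone there is no key to look up: the only option in
    # this model is to walk every partition, which is the kind of query
    # the composite partition key does not support.
    return sorted(
        cmnt
        for (p, _user), cmnts in partitions.items()
        if p == postid
        for cmnt in cmnts
    )


direct = comments("post-1", "user-a")
scanned = comments_for_post_full_scan("post-1")
```

If you need "all comments for a post" as a regular query, that points back to the first PRIMARY KEY definition.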
