How to have a Spark DataFrame be constantly updated as writes occur in the DB backend?
Basically I have Spark sitting in front of a database, and I'm wondering how to go about having the DataFrame updated with new data from the backend.
The trivial way I can think of to solve this is to re-run the query against the database every couple of minutes, but that is inefficient and still results in stale data for the time between updates.
I'm not 100% sure whether the database I'm working with has this restriction, but I think rows are only ever added; there are no modifications to existing rows.
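For context, here is a minimal sketch of the polling approach described above, assuming a JDBC-accessible database; the URL, table name, and credentials are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

object PollingReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("naive-polling")
      .getOrCreate()

    // Hypothetical connection details; replace with your own.
    val jdbcUrl = "jdbc:postgresql://db-host:5432/mydb"
    val table   = "events"

    while (true) {
      // Re-read the whole table every couple of minutes.
      // Inefficient, and data is stale between refreshes.
      val df = spark.read
        .format("jdbc")
        .option("url", jdbcUrl)
        .option("dbtable", table)
        .option("user", "spark")
        .option("password", "secret")
        .load()

      df.createOrReplaceTempView("events")
      println(s"Refreshed ${df.count()} rows")

      Thread.sleep(2 * 60 * 1000) // wait two minutes before polling again
    }
  }
}
```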
A DataFrame is an RDD + schema + many other functionalities. By basic Spark design, RDDs are immutable; hence, you cannot update a DataFrame after it has been materialized. In your case, you can mix Spark Streaming and Spark SQL as below (a code sketch follows the list):
- In the DB, write the data to a queue alongside the writes to the tables
- Use Spark's queue stream to consume the queue and create DStreams (RDDs every X seconds)
- For each incoming RDD, combine it with the existing DataFrame to create a new DataFrame
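A minimal sketch of this streaming-plus-SQL mix, assuming an external poller (not shown) pushes RDDs of new rows into the queue, and interpreting the last step as a union since rows are append-only; the schema and view name are hypothetical:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object QueueStreamUnion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("queue-stream-union").getOrCreate()
    val ssc   = new StreamingContext(spark.sparkContext, Seconds(10))

    // Hypothetical schema matching the rows written to the queue.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("payload", StringType)
    ))

    // An external process would enqueue RDDs of newly written rows here.
    val rddQueue = new mutable.Queue[RDD[Row]]()
    val stream   = ssc.queueStream(rddQueue)

    // Start from the current snapshot of the table (could also be loaded via JDBC).
    var current: DataFrame =
      spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val incoming = spark.createDataFrame(rdd, schema)
        // Rows are append-only, so a union suffices; dedup or join if rows could change.
        current = current.union(incoming)
        current.createOrReplaceTempView("events")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each batch replaces the registered "events" view, so downstream SQL queries always see the latest combined DataFrame rather than the stale initial snapshot.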