How to have a Spark DataFrame be constantly updated as writes occur in the DB backend?


Basically I have Spark sitting in front of a database, and I'm wondering how to go about having the DataFrame updated with new data from the backend.

The trivial way I can think of solving this is to re-run the query against the database every couple of minutes, but that is inefficient and would still leave me with stale data during the time between updates.
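For concreteness, this is roughly what that polling approach looks like with Spark SQL's JDBC reader. This is only a sketch: the connection URL, table name, and credentials are made up, and the matching JDBC driver would have to be on the classpath.

```scala
// A minimal sketch of the naive polling approach, assuming Spark 1.x SQL and a
// JDBC-reachable database; URL, table, and credentials are placeholders.
import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PollingSnapshot {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("poll-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    val props = new Properties()
    props.setProperty("user", "spark")
    props.setProperty("password", "secret")

    while (true) {
      // Re-read the whole table; the snapshot is only as fresh as the last poll.
      val snapshot = sqlContext.read.jdbc("jdbc:postgresql://dbhost/mydb", "events", props)
      snapshot.registerTempTable("events_snapshot")
      Thread.sleep(2 * 60 * 1000) // wait a couple of minutes before polling again
    }
  }
}
```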

I'm not 100% sure whether the database I'm working with has this restriction, but I believe rows are only ever added; there are no modifications to existing rows.

A DataFrame is an RDD plus a schema plus many other functionalities. By basic Spark design, an RDD is immutable; hence, you cannot update a DataFrame after it has been materialized. In your case, you can mix Spark Streaming and Spark SQL as below:

  1. In the DB, write the data to a queue alongside the writes to the tables.
  2. Use a Spark queue stream to consume the queue and create DStreams (an RDD every X seconds).
  3. For each incoming RDD, combine it with the existing DataFrame to create a new DataFrame (see the sketch after this list).
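A minimal sketch of that recipe, assuming Spark 1.x-style Streaming + SQL and a hypothetical (id, value) schema. Here queueStream stands in for whatever receiver actually consumes the DB-side queue (e.g. a Kafka stream), and "combine" in step 3 is interpreted as unioning each micro-batch into the current DataFrame; none of these names come from the original post.

```scala
// A minimal sketch, assuming Spark 1.x Streaming + SQL. queueStream is a
// testing-oriented source; in practice the queue would be fed by a real
// receiver (e.g. Kafka). All table and column names are hypothetical.
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

object StreamingAppendSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("df-append-sketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10)) // one micro-batch every 10 s
    val sqlContext = new SQLContext(ssc.sparkContext)

    val schema = StructType(Seq(
      StructField("id",    LongType,   nullable = false),
      StructField("value", StringType, nullable = true)))

    // Steps 1-2: the DB-side writer pushes each new row onto this queue;
    // queueStream turns it into a DStream of RDDs, one per batch interval.
    val rowQueue = mutable.Queue[RDD[Row]]()
    val stream   = ssc.queueStream(rowQueue)

    // Step 3: keep a "current" DataFrame and replace it with the union of the
    // old one and each incoming batch (DataFrames are immutable, so a new one
    // is built on every batch).
    var currentDf: DataFrame = sqlContext.createDataFrame(
      ssc.sparkContext.emptyRDD[Row], schema)

    stream.foreachRDD { batchRdd =>
      if (!batchRdd.isEmpty()) {
        val batchDf = sqlContext.createDataFrame(batchRdd, schema)
        currentDf = currentDf.unionAll(batchDf)   // new DataFrame per batch
        currentDf.registerTempTable("live_rows")  // queryable via Spark SQL
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because foreachRDD runs on the driver, reassigning the currentDf variable there is safe; any downstream Spark SQL query simply reads the most recently registered temp table, so it always sees the rows received up to the last completed batch.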
