problem description

it is pretty simple, I guess. I have an application which reads data from a mongodb. every sunday I simply flush the db and repopulate it using an updater. doing so introduces 3 hours of downtime (or inconsistency) each time. I want to minimize that.

what I researched

mongodb replication

in general, db replication is the process of storing data in more than one site or node. doing so improves the availability of data. at the same time, it raises the problem of how to keep the copies consistent, which is a distributed systems problem.

the advantages of replication are obvious. it

  • increases availability of data
  • increases data reliability
  • lets users read from different copies/replicas of the data, which gives higher performance

there are also disadvantages:

  • more storage space is needed
  • updates are expensive because they have to reach all nodes
  • maintaining data consistency takes effort

there are many types of data replication.

  1. transactional replication: transactional consistency is guaranteed. typically used in server-to-server environments. it reflects every change.
  2. snapshot replication: generally used when data does not change frequently.
  3. merge replication: the most complex type of replication. it is usually used in server-to-client environments.

replication schemes are:

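  1. full replication. every node holds a full copy of the data.
  2. no replication. every node holds its own unique data.
  3. partial replication. only part of the data is replicated, so some items exist on several nodes and others on just one.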
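
to make the three schemes concrete, here is a minimal sketch of how items could be placed on nodes under each of them. the function name `place`, the node names, and the round-robin placement rule are all my own assumptions for illustration; this is not how mongodb decides placement.

```python
from typing import Dict, List, Set


def place(items: List[str], nodes: List[str], scheme: str, copies: int = 2) -> Dict[str, Set[str]]:
    """Return a mapping node -> items stored on it (hypothetical helper)."""
    layout: Dict[str, Set[str]] = {node: set() for node in nodes}
    for i, item in enumerate(items):
        if scheme == "full":
            # every node stores every item
            targets = nodes
        elif scheme == "none":
            # each item lives on exactly one node (round-robin is an assumption)
            targets = [nodes[i % len(nodes)]]
        elif scheme == "partial":
            # each item lives on `copies` consecutive nodes, fewer than all of them
            targets = [nodes[(i + k) % len(nodes)] for k in range(copies)]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        for node in targets:
            layout[node].add(item)
    return layout


if __name__ == "__main__":
    for scheme in ("full", "none", "partial"):
        print(scheme, place(["a", "b", "c"], ["n1", "n2", "n3"], scheme))
```

with three items and three nodes, the partial scheme with `copies=2` leaves every node holding two of the three items, which sits between the other two schemes in both storage cost and availability.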

for mongodb, if one wants to replicate a db, one first has to stop it and then give the replication options on the command line when starting it again. one can refer to the official documentation or an online tutorial for the details.

sharding, in plain English, is chopping data into pieces and storing them in different databases. doing so

  • reduces the risk of a db outage by decreasing the data size on each db
  • increases performance

first one defines logical shards, which describe how the data is split according to some rule. the rule could be as simple as a mod of the key or as complicated as location.

once we have logical shards, we assign them to physical shards. there may be fewer physical shards than logical shards.

logical shards are similar to the concept of virtual memory, whereas physical shards are similar to physical memory.
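
the two-step mapping can be sketched roughly like this. the shard count, the database names `db-a` and `db-b`, and the choice of a mod rule are assumptions I made up for the example; the lookup table plays the role a page table plays for virtual memory.

```python
from typing import Dict, List

NUM_LOGICAL_SHARDS = 8                        # hypothetical number of logical shards
PHYSICAL_SHARDS: List[str] = ["db-a", "db-b"]  # hypothetical physical databases

# logical shard -> physical shard, spread evenly over the physical databases
LOGICAL_TO_PHYSICAL: Dict[int, str] = {
    logical: PHYSICAL_SHARDS[logical % len(PHYSICAL_SHARDS)]
    for logical in range(NUM_LOGICAL_SHARDS)
}


def logical_shard(user_id: int) -> int:
    """Rule for the logical shard: a simple mod of the key."""
    return user_id % NUM_LOGICAL_SHARDS


def route(user_id: int) -> str:
    """Find the physical database that stores this key."""
    return LOGICAL_TO_PHYSICAL[logical_shard(user_id)]


if __name__ == "__main__":
    for uid in (13, 16, 42):
        print(uid, "-> logical", logical_shard(uid), "-> physical", route(uid))
```

the point of the extra layer is that moving a logical shard to another physical database only means changing one entry in the table, while the rule that maps keys to logical shards stays the same.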

current solution: microservice design

I think a microservice design is the key to this problem.

so basically what I thought of is having two mongodb instances, one called primary and the other called backupdate. there is also one gateway instance running, which I call listener, that listens to the updater.

the working logic is that once the updater is finished, it sends a signal to listener. listener then switches the db from one to the other, renaming the newly active one to primary and the other one to backupdate.

finally, it restarts the app that consumes the mongodb. done.
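
the switch logic could look roughly like the sketch below. it is only a simulation in plain python with no real mongodb involved: the class name `Listener`, the connection strings, and the `restart_app` placeholder are hypothetical, and I am assuming that the updater writes into backupdate while the app keeps reading from primary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Listener:
    # hypothetical connection strings for the two mongodb instances
    primary: str = "mongodb://host-a:27017"
    backupdate: str = "mongodb://host-b:27017"
    restarts: List[str] = field(default_factory=list)

    def update_target(self) -> str:
        """Assumption: the updater rebuilds backupdate, so primary stays readable."""
        return self.backupdate

    def on_update_finished(self) -> None:
        """Called when the updater signals it is done: swap roles, then restart the app."""
        self.primary, self.backupdate = self.backupdate, self.primary
        self.restart_app()

    def restart_app(self) -> None:
        # placeholder: a real listener would restart the consuming app here;
        # this sketch only records which db the app would now point at
        self.restarts.append(self.primary)


if __name__ == "__main__":
    listener = Listener()
    print("updater writes to:", listener.update_target())
    listener.on_update_finished()
    print("app now reads from:", listener.primary)
```

with this layout the downtime shrinks from the length of the update to the length of the app restart, and the next sunday's update lands on the instance that was primary the week before.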

credit to