problem description
it is pretty simple I guess. I have an application which reads data from a mongodb. every sunday I simply flush the db and repopulate it using an updater. doing so introduces about 3 hours of downtime/inconsistency, and I want to minimize that.
what I researched
mongodb replication
in general, db replication is the process of storing data on more than one site or node. doing so improves the availability of the data. at the same time, it raises the problem of how to keep the copies consistent, which is really a distributed systems problem.
the advantages of replication are obvious:
- increased availability of data
- increased data reliability
- users can read from different copies/replicas of the data, which gives higher performance
there are also disadvantages:
- more storage space
- expensive updates across all nodes
- maintaining data consistency takes effort
there are several types of data replication:
- transactional replication: transactional consistency is guaranteed. it is typically used in server-to-server environments and propagates every change.
- snapshot replication: generally used when the data does not change very often.
- merge replication: the most complex type of replication. it is usually used in server-to-client environments.
replication schemes are:
- full replication. every node holds a full copy of the data.
- no replication. every node holds only its own unique data.
- partial replication. only part of the data is copied to more than one node.
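to make the three schemes concrete, here is a minimal sketch in plain python. the function name `place_records`, the node names, and the round-robin placement are all my own assumptions for illustration; a real database decides placement in its own way, and the "partial" branch shows only one possible flavor (every record on a subset of nodes).

```python
def place_records(records, nodes, scheme, copies=2):
    """Return {node: [records]} for a replication scheme (illustrative only)."""
    placement = {node: [] for node in nodes}
    copies = min(copies, len(nodes))  # cannot have more copies than nodes
    for i, record in enumerate(records):
        if scheme == "full":
            # every node gets every record
            targets = nodes
        elif scheme == "none":
            # each record lives on exactly one node (round-robin)
            targets = [nodes[i % len(nodes)]]
        elif scheme == "partial":
            # each record lives on `copies` consecutive nodes, not on all of them
            targets = [nodes[(i + k) % len(nodes)] for k in range(copies)]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        for node in targets:
            placement[node].append(record)
    return placement


nodes = ["a", "b", "c"]  # hypothetical node names
print(place_records([1, 2, 3, 4], nodes, "full"))
print(place_records([1, 2, 3, 4], nodes, "none"))
print(place_records([1, 2, 3, 4], nodes, "partial"))
```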
for mongodb, if one wants to replicate a db, one first has to stop it and then give the replication options on the command line. one can refer to the documentation or the tutorials online.
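as a rough idea of what those directions look like, the sketch below builds the kind of config document a mongodb replica set is initiated with. the set name `rs0` and the hostnames are made up, and I have not run this against my own setup, so treat it as an assumption-heavy outline rather than a recipe. it assumes each `mongod` was restarted with `--replSet rs0`.

```python
def build_replset_config(set_name, hosts):
    """Build a replica set config document: one member entry per host."""
    return {
        "_id": set_name,
        "members": [{"_id": i, "host": host} for i, host in enumerate(hosts)],
    }


# hypothetical hosts, each running `mongod --replSet rs0`
config = build_replset_config(
    "rs0",
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
)
print(config)

# with pymongo installed, the config could then be sent to one member
# (not executed here, since it needs live servers):
#   from pymongo import MongoClient
#   client = MongoClient("db1.example.net", 27017, directConnection=True)
#   client.admin.command("replSetInitiate", config)
```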
db sharding (related but not a solution for this problem)
sharding, in plain English, is chopping data into pieces and storing the pieces in different databases. doing so helps by
- reducing the risk of a db outage, since each db holds less data
- increasing performance
first there are logical shards, which describe how the data is split according to some rule. the rule could be as simple as a modulo on a key or something more complicated, such as location.
once we have logical shards, we map them to physical shards. there may be fewer physical shards than logical shards.
logical shards are similar to the concept of virtual memory, whereas physical shards are similar to physical memory.
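the two-level mapping can be sketched in a few lines of python. the shard counts, the `user_id` key, and the db names `db-0`/`db-1` are assumptions I picked for the example; the point is only that the key goes to a logical shard by a fixed rule (mod here), and a separate table maps logical shards to physical ones.

```python
NUM_LOGICAL_SHARDS = 8             # assumed number of logical shards
PHYSICAL_SHARDS = ["db-0", "db-1"]  # hypothetical physical databases

# lookup table: logical shard number -> physical database
LOGICAL_TO_PHYSICAL = {
    shard: PHYSICAL_SHARDS[shard % len(PHYSICAL_SHARDS)]
    for shard in range(NUM_LOGICAL_SHARDS)
}


def logical_shard(user_id):
    """The sharding rule: here a plain modulo on the key."""
    return user_id % NUM_LOGICAL_SHARDS


def physical_shard(user_id):
    """Resolve a key to the physical database that stores it."""
    return LOGICAL_TO_PHYSICAL[logical_shard(user_id)]


print(logical_shard(13), physical_shard(13))
```

the nice part of keeping the table separate is that moving a logical shard to another database only changes one table entry, while the rule that maps keys to logical shards stays the same, much like remapping a virtual page to a different physical frame.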
current solution: microservice design
I think designing this as microservices is the key to this problem.
so basically what I came up with is having two mongodb instances, one called primary and the other called backupdate. there is also one gateway instance running, which I call the listener, that listens to the updater.
the working logic is that once the updater is finished, it sends a signal to the listener. the listener then switches from one db to the other, renaming the one now in use to primary and the other to backupdate.
finally it restarts the app that consumes the mongodb. done.
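the switching logic can be sketched as below. this is not my actual listener, just a minimal outline in plain python: the class name `Listener`, the instance names `mongo_a`/`mongo_b`, and the `restart_app` callback are hypothetical, and I assume the updater always writes into whichever instance is currently called backupdate while the app reads from primary.

```python
class Listener:
    """Tracks which mongodb instance plays which role and swaps them on signal."""

    def __init__(self, live="mongo_a", staging="mongo_b", restart_app=None):
        # role name -> instance name (instance names are made up)
        self.roles = {"primary": live, "backupdate": staging}
        # callback that restarts the consuming app; a no-op by default
        self.restart_app = restart_app or (lambda: None)

    def on_update_finished(self):
        """Called when the updater signals that backupdate is fully refreshed."""
        # swap the roles: the freshly updated instance becomes primary
        self.roles["primary"], self.roles["backupdate"] = (
            self.roles["backupdate"],
            self.roles["primary"],
        )
        # restart the app so it picks up the new primary
        self.restart_app()
        return self.roles["primary"]


restarts = []
listener = Listener(restart_app=lambda: restarts.append("restarted"))
print(listener.on_update_finished())  # instance now serving as primary
print(listener.roles)
```

with this layout the app keeps reading the old data for the whole update, so the inconsistency window shrinks from the length of the update to the length of the app restart.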