Open Source MongoDB Reference Architectures
MongoDB
MongoDB is a document database programmed in C++, aiming to provide a scalable high-performance data storage solution for web applications. MongoDB is an intermediate between relational and non-relational databases. Compared with other non-relational databases, it provides more functions and is more similar to a relational database.
A record in MongoDB is a document or a data structure consists of fields and values. MongoDB documents are similar to JSON objects. The value of a field may include other documents, arrays, or document arrays.
MongoDB has the following features:
- High performance
- Abundant query languages
- High availability
- Scale-out capability
- Multiple storage engines
Introduction to the Replica Set
The MongoDB replica set synchronizes data on multiple servers to implement data redundant backup. The system stores data copies on multiple servers to improve data availability and ensure data security. The replica set also allows data to be recovered from hardware failures and service interruptions.
The MongoDB replica set requires at least two nodes. One node is the active node, which processes client requests. The other nodes are standby nodes, which replicate data on the active node. MongoDB nodes are commonly configured as one active node + one standby node or multiple standby nodes. All operations on the active node are recorded on the oplog. The standby nodes periodically poll the active node to obtain the operations, and then performs these operations on their data copies to ensure that data on the standby node is consistent with that on the active node.
Figure 1 shows the MongoDB replica set structure.
Sharded Cluster
When MongoDB stores massive data, one machine may not be able to store all data or provide sufficient read/write throughput. To address this problem, the data can be divided on multiple machines so that the database system can store and process more data.
Figure 2 shows the MongoDB sharded cluster architecture.
As shown in Figure 2, the MongoDB sharded cluster consists of the following components:
- Shard: stores data blocks. In the production environment, a shard server can be a replica set of multiple machines to prevent single points of failure (SPOF).
- Config Server: MongoDB instance, which stores the entire ClusterMetadata including the chunk information.
- Query Routers: frontend routers, which allow clients to access the cluster. Frontend applications can transparently use the cluster as a single database.
Typical Reference Architecture
- Three mongos routing nodes (distributed on different servers to improve the routing throughput)
- Three config nodes (distributed on different servers for redundancy)
- N shard storage nodes (configured based on the user capacity. More than four shard storage nodes are recommended to improve the CPU usage of the entire system.)
- Two or three copies

