Open Source PostgreSQL Reference Architectures
Scenarios
Open source PostgreSQL is a typical OLTP database. It provides many features found in large-scale commercial RDBMSs, including transactions, subqueries, triggers, views, foreign key referential integrity, and sophisticated locking. It is one of the most powerful open source databases and is well suited to systems that require comprehensive database functions, such as geospatial data processing. Many commercial MPP databases are developed based on PostgreSQL.
Working Mechanism
Figure 1 shows the PostgreSQL working mechanism. Table 1 describes the major processes.
| Type | Name | Description |
|---|---|---|
| Main process | Postmaster | As the first process to start, Postmaster controls all backend processes and is responsible for accepting and closing database connections. By default, it listens on a Unix domain socket and on TCP/IP port 5432, waiting for frontend connections. You can change the listening port in the postgresql.conf file. |
| Session service process | Postgres | This subprocess decides whether to allow a connection according to the security policy defined in the pg_hba.conf file. Based on that policy, it can reject certain IP addresses and networks, allow only certain users to connect to databases, or allow connections only to certain databases. Postgres receives queries from the frontend, retrieves data from the target database, and returns the results. It also updates the database when required; updates are recorded in transaction logs (also called WAL logs). |
| Auxiliary process | SysLogger (system log process) | SysLogger is started by the main process only when logging_collector is set to on in the postgresql.conf file. |
| | BgWriter (background writer process) | This process writes dirty pages from shared memory to disk, improving the performance of insert, update, and delete operations. |
| | WalWriter (write-ahead log process) | This process writes modification records to disk before the data itself is modified, so that data files do not need to be persisted immediately on every update. |
| | PgArch (archiving process) | WAL segments are recycled, so the PgArch process copies WAL logs to an archive before they are reused. Combined with a full database backup, the archived WAL logs enable point-in-time recovery (PITR): the database can be restored to any point after the full backup by replaying the archived WAL logs on top of the backup data. |
| | AutoVacuum (automatic vacuum process) | In PostgreSQL, a DELETE does not immediately remove the old data. When data is updated, the system creates a new row version instead of overwriting the old one; the old row is only marked as deleted and can be physically removed only when no concurrent transaction is still reading it. The AutoVacuum process performs this cleanup. |
| | PgStat (statistics collection process) | This process collects statistics such as the number of inserts, deletes, and updates per table, the number of data block reads and writes, and the number of index row reads. The query optimizer uses these statistics to make correct judgments and choose the best execution plan. pg_statistic stores the information collected by PgStat. |
| | CheckPoint (checkpoint process) | Checkpoints are transaction sequence points set by the system. Setting a checkpoint ensures that all WAL information up to that checkpoint has been flushed to disk. |
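Several of the processes above are controlled by parameters in postgresql.conf. The following is a minimal sketch of the parameters mentioned in the table; the values shown are illustrative assumptions, not tuning recommendations:

```ini
# postgresql.conf -- illustrative values only
port = 5432                   # TCP/IP port the Postmaster listens on
logging_collector = on        # required for the SysLogger process to start
archive_mode = on             # enable the PgArch archiving process
archive_command = 'cp %p /mnt/wal_archive/%f'  # archive destination is an assumption
autovacuum = on               # enable the AutoVacuum launcher
checkpoint_timeout = 5min     # maximum time between automatic checkpoints
```

Changing port, logging_collector, archive_mode, or checkpoint-related parameters requires a server restart or reload to take effect.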
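The connection security policy enforced by the Postgres session process is defined in pg_hba.conf, one rule per line, matched top to bottom. A minimal sketch, in which the database name, user name, and addresses are hypothetical:

```ini
# pg_hba.conf -- hypothetical entries
# TYPE  DATABASE  USER     ADDRESS          METHOD
local   all       all                       peer           # local Unix-socket connections
host    appdb     appuser  192.168.1.0/24   scram-sha-256  # one user, one database, one subnet
host    all       all      10.0.0.0/8       reject         # reject an entire network
```

This illustrates the three behaviors described in the table: rejecting certain networks, restricting which users may connect, and restricting which databases they may connect to.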
Typical Component Architectures
Figure 2 shows the mainstream PostgreSQL reference architecture: primary-standby streaming replication.
Common reference architectures include the following:
- Primary-standby streaming replication solution
- LVS + Keepalived HA solution
- PostgreSQL sharding with open source Pgpool
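For the primary-standby streaming replication solution, the core configuration is a few parameters on the primary and a connection string on the standby. A minimal sketch, assuming PostgreSQL 12 or later (earlier versions place the standby settings in recovery.conf); the host name and credentials are hypothetical:

```ini
# Primary: postgresql.conf
wal_level = replica          # write enough WAL for a standby to replay
max_wal_senders = 5          # allow WAL sender processes for standbys

# Standby: postgresql.conf
primary_conninfo = 'host=primary.example.com port=5432 user=replicator password=secret'
```

The primary also needs a pg_hba.conf entry permitting the replication user to connect, and the standby is initialized from a base backup of the primary (for example, with pg_basebackup).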
In a distributed architecture, or when more than two instances are deployed, it is recommended to deploy multiple instances on each server and bind each instance to dedicated CPU cores.

