Table of Contents
- Understanding PostgreSQL Clustering
- 1.1 What is Clustering?
- 1.2 Why Clustering Matters: HA vs. Scalability
- Key Concepts in PostgreSQL Clustering
- 2.1 Replication (Synchronous vs. Asynchronous)
- 2.2 Failover (Automatic vs. Manual)
- 2.3 Load Balancing
- 2.4 Sharding
- Setting Up High Availability (HA) with Patroni
- 3.1 Prerequisites
- 3.2 Step 1: Install PostgreSQL and Dependencies
- 3.3 Step 2: Configure etcd (Distributed Configuration Store)
- 3.4 Step 3: Install and Configure Patroni
- 3.5 Step 4: Initialize the Cluster
- 3.6 Step 5: Test Failover
- Scaling PostgreSQL: Read Replicas and Citus
- 4.1 Read Replicas: Setup and Configuration
- 4.2 Citus: Distributed PostgreSQL for Scalability
- Monitoring and Maintenance
- 5.1 Monitoring Tools
- 5.2 Backup Strategies
- Best Practices for PostgreSQL Clustering
- Conclusion
- References
1. Understanding PostgreSQL Clustering
1.1 What is Clustering?
A PostgreSQL cluster is a group of interconnected database instances (nodes) working together to provide redundancy, fault tolerance, or increased performance. Unlike a single database instance, a cluster distributes work across nodes, ensuring no single point of failure (for HA) or sharing the load across nodes (for scalability).
1.2 Why Clustering Matters: HA vs. Scalability
Clustering solves two primary problems:
- High Availability (HA): Minimizes downtime by automatically recovering from failures (e.g., server crashes, network outages). If the primary node fails, a standby node takes over seamlessly.
- Scalability: Handles growth in data volume or query load. This includes:
- Read Scalability: Offloading read traffic to replicas.
- Write Scalability: Distributing write traffic across sharded nodes (via tools like Citus).
Not all clusters prioritize both—some focus solely on HA (e.g., a primary-standby setup), while others focus on scalability (e.g., read replicas or sharded clusters).
2. Key Concepts in PostgreSQL Clustering
2.1 Replication (Synchronous vs. Asynchronous)
Replication is the foundation of most PostgreSQL clusters. It copies data from a primary node (writable) to standby nodes (read-only by default).
- Asynchronous Replication: The primary commits transactions immediately and sends WAL (Write-Ahead Log) data to standbys in the background. Faster, but risk of data loss if the primary fails before WAL is replicated.
- Synchronous Replication: The primary waits for at least one standby to confirm WAL receipt before committing a transaction. Ensures zero data loss but adds latency.
Use Case: Asynchronous for read-heavy apps; synchronous for critical data (e.g., financial transactions).
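As a minimal sketch, synchronous replication is turned on at the primary via two postgresql.conf settings (the standby names below are placeholders that must match your standbys' application_name values):

```ini
# postgresql.conf on the primary (placeholder standby names)
synchronous_commit = on
# Wait for confirmation from one of the listed standbys before a commit returns:
synchronous_standby_names = 'FIRST 1 (node2, node3)'
```

Asynchronous replication is the default when synchronous_standby_names is empty.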
2.2 Failover (Automatic vs. Manual)
Failover is the process of promoting a standby node to primary when the original primary fails.
- Manual Failover: Requires human intervention (e.g., running pg_ctl promote). Slow and error-prone for critical systems.
- Automatic Failover: Tools like Patroni or repmgr detect failures and promote a standby automatically, minimizing downtime.
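The core of automatic failover is a leader lease in the DCS: the primary must renew its lease within a TTL, and a standby promotes itself only once that lease has expired. A simplified Python illustration of that decision (not Patroni's actual implementation):

```python
import time

def should_promote(last_leader_renewal, ttl, now=None):
    """Return True when the leader lease has expired and a standby may promote.

    last_leader_renewal: timestamp (seconds) of the primary's last lease renewal
    ttl: lease time-to-live in seconds (e.g., Patroni's default of 30)
    """
    if now is None:
        now = time.time()
    return (now - last_leader_renewal) > ttl

# Leader last renewed 45 s ago with a 30 s TTL -> a standby may promote:
print(should_promote(last_leader_renewal=0.0, ttl=30.0, now=45.0))  # True
# Renewed 10 s ago -> the leader is still considered alive:
print(should_promote(last_leader_renewal=0.0, ttl=30.0, now=10.0))  # False
```

This is why the ttl, loop_wait, and retry_timeout values in the Patroni configuration directly bound how long a failover takes.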
2.3 Load Balancing
Load balancing distributes client traffic across cluster nodes to avoid overloading the primary. For read-heavy workloads, read replicas can handle queries, while the primary focuses on writes. Tools like pgBouncer or HAProxy route traffic to the correct nodes.
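As a sketch, HAProxy can route write traffic to the current primary by health-checking Patroni's REST API, which answers 200 on /primary only on the leader (IPs are placeholders; verify the endpoint names against your Patroni version):

```ini
# haproxy.cfg sketch: writes always reach whichever node Patroni reports as primary
listen postgres_write
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 NODE1_IP:5432 check port 8008
    server node2 NODE2_IP:5432 check port 8008
    server node3 NODE3_IP:5432 check port 8008
```

A second listener checking /replica can balance read traffic across standbys the same way.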
2.4 Sharding
Sharding splits large datasets across multiple nodes (shards), each storing a subset of data (e.g., by user ID or region). This enables write scalability, as each shard handles a portion of write traffic. Citus and Citus Cloud are popular tools for sharding PostgreSQL.
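The routing idea behind sharding can be shown in a few lines of Python. This is illustrative only (Citus uses its own internal hash partitioning, not this function): hash the distribution key, take it modulo the shard count, and the same key always lands on the same shard.

```python
import zlib

def shard_for(user_id, num_shards):
    """Map a distribution key (e.g., user_id) to one of num_shards shards."""
    return zlib.crc32(str(user_id).encode()) % num_shards

# Deterministic routing: the same user_id always maps to the same shard,
# so all of that user's rows (and queries filtered on user_id) stay local.
print(shard_for(42, 32) == shard_for(42, 32))  # True
```

This determinism is what lets a coordinator route a single-key query to exactly one worker instead of broadcasting it.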
3. Setting Up High Availability (HA) with Patroni
Patroni is a leading open-source tool for managing PostgreSQL HA clusters. It automates replication, failover, and cluster configuration using a Distributed Configuration Store (DCS) like etcd, ZooKeeper, or Consul.
3.1 Prerequisites
- 3 nodes (physical or virtual) running Ubuntu 20.04/22.04 (or similar Linux distro).
- PostgreSQL 14+ installed on all nodes.
- Root access or sudo privileges.
- Network connectivity between nodes (open ports: 5432 for PostgreSQL, 2379/2380 for etcd client and peer traffic, 8008 for the Patroni REST API).
3.2 Step 1: Install PostgreSQL and Dependencies
On all nodes, install PostgreSQL and required packages:
# Update packages
sudo apt update && sudo apt upgrade -y
# Install PostgreSQL 14
sudo apt install postgresql-14 postgresql-contrib -y
# Install Python (for Patroni) and etcd client
sudo apt install python3 python3-pip etcd-client -y
pip3 install 'patroni[etcd]' # Patroni with etcd support (quoted so the shell doesn't expand the brackets)
Stop PostgreSQL on all nodes (Patroni will manage it):
sudo systemctl stop postgresql
sudo systemctl disable postgresql
3.3 Step 2: Configure etcd (Distributed Configuration Store)
etcd is a distributed key-value store that Patroni uses to track cluster state (e.g., which node is primary). We’ll set up a 3-node etcd cluster for redundancy.
On Node 1 (etcd leader):
# Install etcd
sudo apt install etcd -y
# Edit etcd config
sudo nano /etc/default/etcd
Add the following (replace NODE1_IP, NODE2_IP, NODE3_IP with your node IPs):
ETCD_NAME="node1"
ETCD_DATA_DIR="/var/lib/etcd/node1"
ETCD_LISTEN_PEER_URLS="http://NODE1_IP:2380"
ETCD_LISTEN_CLIENT_URLS="http://NODE1_IP:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://NODE1_IP:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://NODE1_IP:2379"
ETCD_INITIAL_CLUSTER="node1=http://NODE1_IP:2380,node2=http://NODE2_IP:2380,node3=http://NODE3_IP:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-token"
ETCD_INITIAL_CLUSTER_STATE="new"
Repeat for Nodes 2 and 3, updating ETCD_NAME (e.g., “node2”) and IPs.
Start etcd on all nodes:
sudo systemctl start etcd
sudo systemctl enable etcd
Verify etcd cluster health:
etcdctl --endpoints=http://NODE1_IP:2379,http://NODE2_IP:2379,http://NODE3_IP:2379 endpoint health
3.4 Step 3: Install and Configure Patroni
On all nodes, create a Patroni configuration file (e.g., /etc/patroni.yml). Note that YAML is indentation-sensitive:
scope: postgres-cluster  # Unique cluster name
namespace: /db/
name: node1  # Replace with "node2", "node3" on other nodes

restapi:
  listen: NODE1_IP:8008  # Replace with node IP
  connect_address: NODE1_IP:8008

etcd:
  hosts: NODE1_IP:2379,NODE2_IP:2379,NODE3_IP:2379  # etcd endpoints

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        max_wal_senders: 10
        max_replication_slots: 10
        wal_keep_size: 16MB
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 0.0.0.0/0 md5
    - host all all 0.0.0.0/0 md5

postgresql:
  listen: NODE1_IP:5432  # Replace with node IP
  connect_address: NODE1_IP:5432
  data_dir: /var/lib/postgresql/14/main
  bin_dir: /usr/lib/postgresql/14/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: secure_password  # Use a strong password
    superuser:
      username: postgres
      password: postgres_password  # Use a strong password

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
3.5 Step 4: Initialize the Cluster
Start Patroni on all nodes using a systemd service. Create /etc/systemd/system/patroni.service:
[Unit]
Description=Patroni PostgreSQL cluster manager
After=network.target etcd.service
[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
Restart=always
[Install]
WantedBy=multi-user.target
Reload systemd and start Patroni:
sudo systemctl daemon-reload
sudo systemctl start patroni
sudo systemctl enable patroni
Check cluster status with patronictl (run on any node):
patronictl -c /etc/patroni.yml list
You should see one primary and two standbys:
+ Cluster: postgres-cluster (6902311560556585161) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+----------+------------+---------+---------+----+-----------+
| node1 | NODE1_IP | Leader | running | 1 | |
| node2 | NODE2_IP | Replica | running | 1 | 0 |
| node3 | NODE3_IP | Replica | running | 1 | 0 |
+----------+------------+---------+---------+----+-----------+
3.6 Step 5: Test Failover
Simulate a primary failure by stopping Patroni on the leader node (e.g., Node 1):
sudo systemctl stop patroni
Check the cluster status again. Within 10–30 seconds, Patroni will promote a standby (Node 2 or 3) to primary:
+ Cluster: postgres-cluster (6902311560556585161) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+----------+------------+---------+---------+----+-----------+
| node2 | NODE2_IP | Leader | running | 2 | |
| node3 | NODE3_IP | Replica | running | 2 | 0 |
| node1 | NODE1_IP | Replica | stopped | | |
+----------+------------+---------+---------+----+-----------+
Restart Patroni on Node 1, and it will rejoin as a standby.
4. Scaling PostgreSQL: Read Replicas and Citus
4.1 Read Replicas: Setup and Configuration
Read replicas offload read traffic from the primary, improving query performance. They use asynchronous replication by default.
Step 1: Configure the Primary Node
Edit postgresql.conf on the primary (Patroni manages this, but you can override via Patroni’s postgresql.parameters in patroni.yml):
wal_level = replica
max_wal_senders = 5 # Allow 5 replicas
wal_keep_size = 1GB # Retain WAL for replicas to catch up
Step 2: Create a Replication User
On the primary, create a user for replication:
CREATE ROLE replicator WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'secure_password';
Step 3: Set Up the Replica
On the replica node, use pg_basebackup to clone the primary’s data:
sudo -u postgres pg_basebackup -h PRIMARY_IP -U replicator -D /var/lib/postgresql/14/main -P -Xs -R
- -Xs: Stream WAL during the backup.
- -R: Auto-generate standby.signal and the replication connection settings (PostgreSQL 12+ replaced recovery.conf with standby.signal).
Step 4: Start the Replica
sudo systemctl start postgresql
Verify replication status on the primary:
SELECT usename, application_name, state FROM pg_stat_replication;
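pg_stat_replication also exposes LSN columns (sent_lsn, replay_lsn, etc.), and replication lag in bytes is simply the difference between two LSNs. A small illustrative helper for converting PostgreSQL's X/Y LSN notation (the sample values below are made up):

```python
def lsn_to_bytes(lsn):
    """Convert a PostgreSQL LSN like '0/3000148' to an absolute byte position.

    The part before the slash is the high 32 bits (hex); the part after
    is the low 32 bits (hex).
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replication_lag_bytes(primary_lsn, replica_lsn):
    """Bytes of WAL the replica still has to replay to catch up."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)

# Example with made-up LSNs: replica is one byte behind the primary.
print(replication_lag_bytes("1/0", "0/FFFFFFFF"))  # 1
```

This is the same arithmetic behind the "Lag in MB" column in patronictl list.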
4.2 Citus: Distributed PostgreSQL for Scalability
Citus transforms PostgreSQL into a distributed database by sharding tables across worker nodes. It’s ideal for write-heavy apps or large datasets.
Prerequisites:
- 1 coordinator node (manages metadata and query planning).
- 2+ worker nodes (store sharded data).
Step 1: Install Citus
On all nodes:
# Add Citus repo
curl https://install.citusdata.com/community/deb.sh | sudo bash
# Install Citus
sudo apt install postgresql-14-citus -y
Step 2: Configure the Coordinator
On the coordinator (and each worker), edit postgresql.conf to preload the Citus extension:
shared_preload_libraries = 'citus'
Restart PostgreSQL:
sudo systemctl restart postgresql
Step 3: Add Worker Nodes
On the coordinator, create the Citus extension and register each worker:
CREATE EXTENSION citus;
SELECT citus_add_node('WORKER1_IP', 5432);
SELECT citus_add_node('WORKER2_IP', 5432);
Step 4: Create a Distributed Table
Shard a table by a distribution key (e.g., user_id):
-- Create a local table
CREATE TABLE orders (order_id int, user_id int, amount numeric);
-- Distribute it across workers
SELECT create_distributed_table('orders', 'user_id');
Citus automatically routes queries to the correct worker based on user_id.
5. Monitoring and Maintenance
5.1 Monitoring Tools
- Prometheus + Grafana: Use postgres_exporter to collect metrics (e.g., replication lag, query latency) and visualize them with Grafana dashboards.
- pg_stat_activity: Real-time query monitoring:
SELECT pid, datname, usename, query FROM pg_stat_activity WHERE state = 'active';
- Patroni API: Check cluster health via http://NODE_IP:8008/health.
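A monitoring script might poll the Patroni REST API and parse its JSON status. A hedged sketch of the parsing step only (the "state" and "role" field names follow Patroni's documented GET / response; verify them against your Patroni version before relying on this):

```python
import json

def is_healthy(payload):
    """Return True when a Patroni status payload reports a running node.

    payload: JSON string as returned by Patroni's REST API, assumed to
    contain a "state" field (e.g., "running", "stopped").
    """
    data = json.loads(payload)
    return data.get("state") == "running"

# Sample payloads (abbreviated, hypothetical):
print(is_healthy('{"state": "running", "role": "master"}'))   # True
print(is_healthy('{"state": "stopped", "role": "replica"}'))  # False
```

In practice you would fetch the payload with an HTTP GET against port 8008 and alert when any node stops reporting "running".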
5.2 Backup Strategies
Even with HA, backups are critical. Use:
- pg_dump: For logical backups of specific databases/tables.
- pg_basebackup: For physical backups (supports point-in-time recovery).
- Barman: Automates backups and WAL archiving for HA clusters.
6. Best Practices for PostgreSQL Clustering
- Network Reliability: Use low-latency, high-bandwidth networks for replication.
- Resource Sizing: Ensure standbys/replicas have equivalent CPU/RAM to the primary.
- Test Failover Regularly: Simulate failures to validate automatic recovery.
- Secure Replication: Use SSL for replication traffic and restrict access with pg_hba.conf.
- Document the Cluster: Track node roles, IPs, and recovery procedures.
7. Conclusion
PostgreSQL clustering is a powerful way to achieve high availability and scalability, but it requires careful planning. For HA, tools like Patroni simplify automatic failover. For scalability, read replicas handle read traffic, while Citus enables distributed writes. By combining these strategies and following best practices, you can build a resilient, high-performance PostgreSQL infrastructure.