PostgreSQL Clustering: How to Set Up High Availability and Scalability

PostgreSQL, often called “Postgres,” is one of the world’s most popular open-source relational databases, prized for its robustness, compliance with SQL standards, and extensibility. As applications grow, however, two critical challenges emerge: **high availability (HA)** (ensuring the database remains accessible even during failures) and **scalability** (handling increasing read/write loads). PostgreSQL clustering addresses these challenges by grouping multiple database instances into a unified system. Whether you’re running a small application or an enterprise-grade platform, clustering ensures minimal downtime and consistent performance. In this post, we’ll demystify PostgreSQL clustering, explore key concepts, and provide step-by-step guides to setting up HA and scalability.

Table of Contents

  1. Understanding PostgreSQL Clustering
    • 1.1 What is Clustering?
    • 1.2 Why Clustering Matters: HA vs. Scalability
  2. Key Concepts in PostgreSQL Clustering
    • 2.1 Replication (Synchronous vs. Asynchronous)
    • 2.2 Failover (Automatic vs. Manual)
    • 2.3 Load Balancing
    • 2.4 Sharding
  3. Setting Up High Availability (HA) with Patroni
    • 3.1 Prerequisites
    • 3.2 Step 1: Install PostgreSQL and Dependencies
    • 3.3 Step 2: Configure etcd (Distributed Configuration Store)
    • 3.4 Step 3: Install and Configure Patroni
    • 3.5 Step 4: Initialize the Cluster
    • 3.6 Step 5: Test Failover
  4. Scaling PostgreSQL: Read Replicas and Citus
    • 4.1 Read Replicas: Setup and Configuration
    • 4.2 Citus: Distributed PostgreSQL for Scalability
  5. Monitoring and Maintenance
    • 5.1 Monitoring Tools
    • 5.2 Backup Strategies
  6. Best Practices for PostgreSQL Clustering
  7. Conclusion

1. Understanding PostgreSQL Clustering

1.1 What is Clustering?

A PostgreSQL cluster is a group of interconnected database instances (nodes) working together to provide redundancy, fault tolerance, or increased performance. Unlike a single database instance, a cluster distributes work across nodes, ensuring no single point of failure (for HA) or spreading the load across nodes (for scalability).

1.2 Why Clustering Matters: HA vs. Scalability

Clustering solves two primary problems:

  • High Availability (HA): Minimizes downtime by automatically recovering from failures (e.g., server crashes, network outages). If the primary node fails, a standby node takes over seamlessly.
  • Scalability: Handles growth in data volume or query load. This includes:
    • Read Scalability: Offloading read traffic to replicas.
    • Write Scalability: Distributing write traffic across sharded nodes (via tools like Citus).

Not all clusters prioritize both—some focus solely on HA (e.g., a primary-standby setup), while others focus on scalability (e.g., read replicas or sharded clusters).

2. Key Concepts in PostgreSQL Clustering

2.1 Replication (Synchronous vs. Asynchronous)

Replication is the foundation of most PostgreSQL clusters. It copies data from a primary node (writable) to standby nodes (read-only by default).

  • Asynchronous Replication: The primary commits transactions immediately and sends WAL (Write-Ahead Log) data to standbys in the background. This is faster, but it risks losing recent transactions if the primary fails before the WAL reaches a standby.
  • Synchronous Replication: The primary waits for at least one standby to confirm WAL receipt before committing a transaction. Ensures zero data loss but adds latency.

Use Case: Asynchronous for read-heavy apps; synchronous for critical data (e.g., financial transactions).
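
As a rough sketch, synchronous replication is enabled on the primary with two settings (the standby name below is a placeholder; it must match the standby’s application_name, or the member name in a Patroni cluster):

# postgresql.conf on the primary ("node2" is a placeholder standby name)
synchronous_standby_names = 'FIRST 1 (node2)'
synchronous_commit = on  # each commit waits for node2 to confirm the WAL flush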

2.2 Failover (Automatic vs. Manual)

Failover is the process of promoting a standby node to primary when the original primary fails.

  • Manual Failover: Requires human intervention (e.g., running pg_ctl promote on a standby; see the example after this list). Slow and error-prone for critical systems.
  • Automatic Failover: Tools like Patroni or repmgr detect failures and promote a standby automatically, minimizing downtime.
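
For reference, a manual promotion on a standby looks roughly like this (paths assume a default Ubuntu PostgreSQL 14 layout):

# Promote this standby to primary by hand
sudo -u postgres /usr/lib/postgresql/14/bin/pg_ctl promote -D /var/lib/postgresql/14/main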

2.3 Load Balancing

Load balancing distributes client traffic across cluster nodes to avoid overloading the primary. For read-heavy workloads, read replicas can handle queries, while the primary focuses on writes. Tools like pgBouncer or HAProxy route traffic to the correct nodes.
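
As an illustration, a common pattern is to let HAProxy send write traffic to whichever node Patroni currently reports as primary, using Patroni’s REST API as the health check. A minimal sketch (node IPs and the frontend port are placeholders):

# /etc/haproxy/haproxy.cfg (sketch): route writes to the current Patroni leader
listen postgres_primary
    bind *:5000
    option httpchk GET /primary        # Patroni answers 200 only on the leader
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server node1 NODE1_IP:5432 check port 8008
    server node2 NODE2_IP:5432 check port 8008
    server node3 NODE3_IP:5432 check port 8008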

2.4 Sharding

Sharding splits large datasets across multiple nodes (shards), each storing a subset of data (e.g., by user ID or region). This enables write scalability, as each shard handles a portion of write traffic. Citus, an open-source PostgreSQL extension, is the most popular tool for sharding PostgreSQL.

3. Setting Up High Availability (HA) with Patroni

Patroni is a leading open-source tool for managing PostgreSQL HA clusters. It automates replication, failover, and cluster configuration using a Distributed Configuration Store (DCS) like etcd, ZooKeeper, or Consul.

3.1 Prerequisites

  • 3 nodes (physical or virtual) running Ubuntu 20.04/22.04 (or similar Linux distro).
  • PostgreSQL 14+ installed on all nodes.
  • Root access or sudo privileges.
  • Network connectivity between nodes (open ports: 5432 for PostgreSQL, 2379/2380 for etcd client and peer traffic, 8008 for the Patroni REST API).

3.2 Step 1: Install PostgreSQL and Dependencies

On all nodes, install PostgreSQL and required packages:

# Update packages  
sudo apt update && sudo apt upgrade -y  

# Install PostgreSQL 14  
sudo apt install postgresql-14 postgresql-contrib -y  

# Install Python (for Patroni) and the etcd client  
sudo apt install python3 python3-pip etcd-client -y  
sudo pip3 install 'patroni[etcd]'  # Patroni with etcd support; quotes keep the shell from expanding the brackets  

Stop PostgreSQL on all nodes (Patroni will manage it), and clear the data directory that the Debian package initialized so Patroni can bootstrap a fresh cluster:

sudo systemctl stop postgresql  
sudo systemctl disable postgresql  
sudo rm -rf /var/lib/postgresql/14/main  # only on a fresh install with no data to keep  

3.3 Step 2: Configure etcd (Distributed Configuration Store)

etcd is a distributed key-value store that Patroni uses to track cluster state (e.g., which node is primary). We’ll set up a 3-node etcd cluster for redundancy.

On Node 1 (etcd leader):

# Install etcd  
sudo apt install etcd -y  # on newer Ubuntu releases the server package is named etcd-server  

# Edit etcd config  
sudo nano /etc/default/etcd  

Add the following (replace NODE1_IP, NODE2_IP, NODE3_IP with your node IPs):

ETCD_NAME="node1"  
ETCD_DATA_DIR="/var/lib/etcd/node1"  
ETCD_LISTEN_PEER_URLS="http://NODE1_IP:2380"  
ETCD_LISTEN_CLIENT_URLS="http://NODE1_IP:2379,http://127.0.0.1:2379"  
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://NODE1_IP:2380"  
ETCD_ADVERTISE_CLIENT_URLS="http://NODE1_IP:2379"  
ETCD_INITIAL_CLUSTER="node1=http://NODE1_IP:2380,node2=http://NODE2_IP:2380,node3=http://NODE3_IP:2380"  
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster-token"  
ETCD_INITIAL_CLUSTER_STATE="new"  

Repeat for Nodes 2 and 3, updating ETCD_NAME (e.g., “node2”) and IPs.

Start etcd on all nodes:

sudo systemctl start etcd  
sudo systemctl enable etcd  

Verify etcd cluster health (with the legacy v2 API, use `cluster-health` instead of `endpoint health`):

etcdctl --endpoints=http://NODE1_IP:2379,http://NODE2_IP:2379,http://NODE3_IP:2379 endpoint health  

3.4 Step 3: Install and Configure Patroni

On all nodes, create a Patroni configuration file (e.g., /etc/patroni.yml):

scope: postgres-cluster  # Unique cluster name  
namespace: /db/  
name: node1  # Replace with "node2", "node3" on other nodes  

restapi:  
  listen: NODE1_IP:8008  # Replace with node IP  
  connect_address: NODE1_IP:8008  

etcd:  
  hosts: NODE1_IP:2379,NODE2_IP:2379,NODE3_IP:2379  # etcd endpoints  

bootstrap:  
  dcs:  
    ttl: 30  
    loop_wait: 10  
    retry_timeout: 10  
    maximum_lag_on_failover: 1048576  
    postgresql:  
      use_pg_rewind: true  
      parameters:  
        wal_level: replica  
        max_wal_senders: 10  
        max_replication_slots: 10  
        wal_keep_size: 16MB  

  initdb:  
    - encoding: UTF8  
    - data-checksums  

  pg_hba:  
    - host replication replicator 0.0.0.0/0 md5  
    - host all all 0.0.0.0/0 md5  

postgresql:  
  listen: NODE1_IP:5432  # Replace with node IP  
  connect_address: NODE1_IP:5432  
  data_dir: /var/lib/postgresql/14/main  
  bin_dir: /usr/lib/postgresql/14/bin  
  pgpass: /tmp/pgpass  
  authentication:  
    replication:  
      username: replicator  
      password: secure_password  # Use a strong password  
    superuser:  
      username: postgres  
      password: postgres_password  # Use a strong password  

tags:  
  nofailover: false  
  noloadbalance: false  
  clonefrom: false  
  nosync: false  

3.5 Step 4: Initialize the Cluster

Start Patroni on all nodes using a systemd service. Create /etc/systemd/system/patroni.service:

[Unit]  
Description=Patroni PostgreSQL cluster manager  
After=network.target etcd.service  

[Service]  
Type=simple  
User=postgres  
Group=postgres  
ExecStart=/usr/local/bin/patroni /etc/patroni.yml  
Restart=always  

[Install]  
WantedBy=multi-user.target  

Reload systemd and start Patroni:

sudo systemctl daemon-reload  
sudo systemctl start patroni  
sudo systemctl enable patroni  

Check cluster status with patronictl (run on any node):

patronictl -c /etc/patroni.yml list  

You should see one primary and two standbys:

+ Cluster: postgres-cluster (6902311560556585161) ---+----+-----------+  
| Member   | Host       | Role    | State   | TL | Lag in MB |  
+----------+------------+---------+---------+----+-----------+  
| node1    | NODE1_IP   | Leader  | running | 1  |           |  
| node2    | NODE2_IP   | Replica | running | 1  | 0         |  
| node3    | NODE3_IP   | Replica | running | 1  | 0         |  
+----------+------------+---------+---------+----+-----------+  

3.6 Step 5: Test Failover

Simulate a primary failure by stopping Patroni on the leader node (e.g., Node 1):

sudo systemctl stop patroni  

Check the cluster status again. Within 10–30 seconds, Patroni will promote a standby (Node 2 or 3) to primary on a new timeline:

+ Cluster: postgres-cluster (6902311560556585161) ---+----+-----------+  
| Member   | Host       | Role    | State   | TL | Lag in MB |  
+----------+------------+---------+---------+----+-----------+  
| node2    | NODE2_IP   | Leader  | running | 2  |           |  
| node3    | NODE3_IP   | Replica | running | 2  | 0         |  
| node1    | NODE1_IP   | Replica | stopped |    |           |  
+----------+------------+---------+---------+----+-----------+  

Restart Patroni on Node 1, and it will rejoin as a standby.
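
For a controlled test that avoids an unplanned outage, Patroni also supports a planned switchover (it prompts for the target member and a scheduled time):

patronictl -c /etc/patroni.yml switchover  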

4. Scaling PostgreSQL: Read Replicas and Citus

4.1 Read Replicas: Setup and Configuration

Read replicas offload read traffic from the primary, improving query performance. They use asynchronous replication by default.

Step 1: Configure the Primary Node  
Set the following on the primary. If the primary is managed by Patroni, apply these via postgresql.parameters in patroni.yml (or patronictl edit-config) rather than editing postgresql.conf directly:

wal_level = replica  
max_wal_senders = 5  # Allow 5 replicas  
wal_keep_size = 1GB  # Retain WAL for replicas to catch up  

Step 2: Create a Replication User
On the primary, create a user for replication:

CREATE ROLE replicator WITH REPLICATION LOGIN ENCRYPTED PASSWORD 'secure_password';  

Step 3: Set Up the Replica
On the replica node, use pg_basebackup to clone the primary’s data:

sudo -u postgres pg_basebackup -h PRIMARY_IP -U replicator -D /var/lib/postgresql/14/main -P -Xs -R  

  • -Xs: Stream WAL during the backup.
  • -R: Write standby.signal and append connection settings to postgresql.auto.conf (PostgreSQL 12+; older versions generated recovery.conf).

Step 4: Start the Replica

sudo systemctl start postgresql  

Verify replication status on the primary:

SELECT usename, application_name, state FROM pg_stat_replication;  
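
To quantify how far behind each standby is, you can approximate the lag in bytes from the same view:

-- Approximate replication lag in bytes, per standby  
SELECT application_name,  
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes  
FROM pg_stat_replication;  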

4.2 Citus: Distributed PostgreSQL for Scalability

Citus transforms PostgreSQL into a distributed database by sharding tables across worker nodes. It’s ideal for write-heavy apps or large datasets.

Prerequisites:

  • 1 coordinator node (manages metadata and query planning).
  • 2+ worker nodes (store sharded data).

Step 1: Install Citus
On all nodes:

# Add Citus repo  
curl https://install.citusdata.com/community/deb.sh | sudo bash  

# Install Citus  
sudo apt install postgresql-14-citus -y  # the package name may carry a Citus version suffix, e.g. postgresql-14-citus-11.0  

Step 2: Configure the Nodes  
On the coordinator and every worker, edit postgresql.conf so the Citus library loads at startup:

shared_preload_libraries = 'citus'  

Restart PostgreSQL:

sudo systemctl restart postgresql  

Step 3: Add Worker Nodes  
Create the Citus extension on every node, then register the workers from the coordinator:

-- On each node  
CREATE EXTENSION citus;  

-- On the coordinator  
SELECT citus_add_node('WORKER1_IP', 5432);  
SELECT citus_add_node('WORKER2_IP', 5432);  

Step 4: Create a Distributed Table
Shard a table by a distribution key (e.g., user_id):

-- Create a local table  
CREATE TABLE orders (order_id int, user_id int, amount numeric);  

-- Distribute it across workers  
SELECT create_distributed_table('orders', 'user_id');  

Citus automatically routes queries to the correct worker based on user_id.
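
To make the routing concrete, compare a single-shard query with a cross-shard one (using the orders table from above):

-- Routed to the single worker holding the shard for user_id = 42  
SELECT sum(amount) FROM orders WHERE user_id = 42;  

-- Fans out to all workers in parallel; results are merged on the coordinator  
SELECT user_id, sum(amount) FROM orders GROUP BY user_id ORDER BY 2 DESC LIMIT 10;  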

5. Monitoring and Maintenance

5.1 Monitoring Tools

  • Prometheus + Grafana: Use postgres_exporter to collect metrics (e.g., replication lag, query latency) and visualize with Grafana dashboards.
  • pg_stat_activity: Real-time query monitoring:
    SELECT pid, datname, usename, query FROM pg_stat_activity WHERE state = 'active';  
  • Patroni API: Check cluster health via http://NODE_IP:8008/health (see the example after this list).
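
For instance, a quick check from the shell (NODE_IP is a placeholder; the /cluster endpoint returns the full member list as JSON):

curl -s http://NODE_IP:8008/health  
curl -s http://NODE_IP:8008/cluster | python3 -m json.tool  # pretty-print the cluster view  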

5.2 Backup Strategies

Even with HA, backups remain critical (a short sketch follows the list below). Use:

  • pg_dump: For logical backups of specific databases/tables.
  • pg_basebackup: For physical backups (supports point-in-time recovery).
  • Barman: Automates backups and WAL archiving for HA clusters.
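
A minimal sketch of the first two approaches, assuming placeholder hosts, database names, and backup paths:

# Logical backup of one database in custom format (restore with pg_restore)  
pg_dump -h PRIMARY_IP -U postgres -Fc mydb -f /backups/mydb_$(date +%F).dump  

# Physical base backup with streamed WAL, usable as a PITR starting point  
pg_basebackup -h PRIMARY_IP -U replicator -D /backups/base_$(date +%F) -X stream -P  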

6. Best Practices for PostgreSQL Clustering

  • Network Reliability: Use low-latency, high-bandwidth networks for replication.
  • Resource Sizing: Ensure standbys/replicas have equivalent CPU/RAM to the primary.
  • Test Failover Regularly: Simulate failures to validate automatic recovery.
  • Secure Replication: Use SSL for replication traffic and restrict access with pg_hba.conf (see the example after this list).
  • Document the Cluster: Track node roles, IPs, and recovery procedures.
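
For example, pg_hba.conf entries that require TLS for replication from a private subnet (the subnet is a placeholder; use md5 instead of scram-sha-256 if that matches your password_encryption setting):

hostssl replication replicator 10.0.0.0/24 scram-sha-256  
hostssl all         all        10.0.0.0/24 scram-sha-256  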

7. Conclusion

PostgreSQL clustering is a powerful way to achieve high availability and scalability, but it requires careful planning. For HA, tools like Patroni simplify automatic failover. For scalability, read replicas handle read traffic, while Citus enables distributed writes. By combining these strategies and following best practices, you can build a resilient, high-performance PostgreSQL infrastructure.
