Table of Contents
- Understanding PostgreSQL Replication
- Types of PostgreSQL Replication
- Setting Up Streaming (Physical) Replication
- 3.1 Prerequisites
- 3.2 Configuring the Primary Server
- 3.3 Setting Up the Standby Server
- 3.4 Verifying Replication
- Setting Up Logical Replication
- 4.1 Use Cases for Logical Replication
- 4.2 Configuration Steps
- Monitoring Replication Health
- Failover and Maintenance
- Best Practices for Reliable Replication
- Conclusion
- References
Understanding PostgreSQL Replication
At its core, PostgreSQL replication is the process of synchronizing data from a primary server to one or more standby servers. Here’s how it works:
- Primary Server: The main database server that handles all write operations (INSERT, UPDATE, DELETE) and commits transactions. It generates a write-ahead log (WAL) to record changes before applying them to the database.
- Standby Server: A read-only copy of the primary. Standbys replay the WAL from the primary to stay synchronized. They can serve read traffic to offload the primary and act as a failover target if the primary fails.
The Role of WAL (Write-Ahead Logging)
PostgreSQL uses WAL to ensure data integrity. All changes are first written to the WAL before being applied to the database (the “write-ahead” part). For replication, the primary streams WAL records to standbys, which replay these records to replicate changes. This makes WAL the backbone of PostgreSQL replication.
Types of PostgreSQL Replication
PostgreSQL supports multiple replication methods, each suited to different use cases. We’ll focus on the two most widely used:
1. Physical (Streaming) Replication
Physical replication creates an exact byte-for-byte copy of the primary’s database cluster. It replicates the entire database, including schema, indexes, and data files.
Key Features:
- Synchronizes at the block level (low overhead).
- Supports failover (standby can be promoted to primary).
- Ideal for high availability and disaster recovery.
- Requires identical PostgreSQL versions on primary and standby.
Limitations:
- No selective replication (replicates the entire database).
- Schema changes on the primary may break replication if not applied to standbys first.
2. Logical Replication
Logical replication replicates data at the row level, using logical WAL records (decoded from physical WAL). It allows replicating specific tables, rows, or columns.
Key Features:
- Replicates changes as logical events (e.g., “INSERT INTO table X VALUES Y”).
- Supports selective replication (e.g., replicate only a specific table).
- Allows schema changes (e.g., adding a column) without breaking replication.
- Works across different PostgreSQL versions (minor releases).
Use Cases:
- Data sharing between databases (e.g., replicating a sales table to a reporting database).
- Migrating data between servers with different schemas.
- Offloading read traffic for specific tables.
Other Replication Methods
- File-Based Replication: Uses
archive_commandto copy WAL files to a standby (older, less efficient than streaming). - Synchronous Replication: Ensures WAL is written to at least one standby before a transaction commits (higher latency, but zero data loss).
Setting Up Streaming (Physical) Replication
Let’s walk through setting up streaming replication with two servers: a primary (primary.example.com) and a standby (standby.example.com).
3.1 Prerequisites
- Two servers (physical or virtual) running PostgreSQL 12+ (we’ll use PostgreSQL 16).
- Network connectivity: Port 5432 (PostgreSQL) open between servers.
- A
postgresOS user with SSH access (for backups). - Sufficient disk space on the standby (equal to or larger than the primary).
3.2 Configuring the Primary Server
Step 1: Update postgresql.conf
Edit the primary’s configuration file (usually in /var/lib/postgresql/16/main/postgresql.conf):
# Allow connections from the standby
listen_addresses = '*' # Or specify the standby's IP (e.g., '192.168.1.100,192.168.1.101')
# Enable replication (WAL level must be 'replica' or higher)
wal_level = replica # 'logical' if using logical replication later
max_wal_senders = 3 # Max number of standby connections
wal_keep_size = 1GB # Keep WAL files for 1GB (prevents standby from falling too far behind)
max_replication_slots = 3 # For replication slots (prevents WAL deletion)
Step 2: Allow Replication in pg_hba.conf
Edit /var/lib/postgresql/16/main/pg_hba.conf to permit the standby to connect for replication:
# Allow standby (IP: 192.168.1.101) to connect as replication user
host replication replicator 192.168.1.101/32 scram-sha-256
Replace 192.168.1.101/32 with the standby’s IP and scram-sha-256 with your authentication method (e.g., md5 for older setups).
Step 3: Restart PostgreSQL and Create a Replication User
Restart the primary to apply changes:
sudo systemctl restart postgresql@16-main
Create a dedicated replication user with the REPLICATION privilege:
-- On primary:
CREATE USER replicator WITH REPLICATION PASSWORD 'secure_password';
3.3 Setting Up the Standby Server
Step 1: Stop PostgreSQL on the Standby
If PostgreSQL is running on the standby, stop it to replace the data directory:
sudo systemctl stop postgresql@16-main
sudo rm -rf /var/lib/postgresql/16/main/* # CAUTION: Deletes existing data!
Step 2: Take a Base Backup of the Primary
Use pg_basebackup to copy the primary’s data directory to the standby. Run this on the standby:
sudo -u postgres pg_basebackup \
-h primary.example.com \
-U replicator \
-D /var/lib/postgresql/16/main \
-Fp \ # Plain format (directory)
-Xs \ # Stream WAL during backup (no need for archive)
-Pv # Progress and verbose output
Enter the replicator password when prompted. This creates an initial copy of the primary’s database on the standby.
Step 3: Configure Standby to Replicate from Primary
On the standby, create a postgresql.auto.conf file (PostgreSQL 12+; older versions use recovery.conf):
# /var/lib/postgresql/16/main/postgresql.auto.conf
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator password=secure_password application_name=standby1'
recovery_target_timeline = 'latest' # Replay all WAL from primary
Step 4: Start the Standby
Start PostgreSQL on the standby:
sudo systemctl start postgresql@16-main
3.4 Verifying Replication
On the Primary
Check replication status with pg_stat_replication:
-- On primary:
SELECT usename, application_name, client_addr, state, sync_state FROM pg_stat_replication;
You should see a row for the standby with state=streaming and sync_state=async (default).
On the Standby
Check if the standby is replaying WAL:
-- On standby:
SELECT pg_is_in_recovery(); -- Should return 't' (true, in recovery mode)
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag; -- Time since last replay
If replication_lag is low (seconds), replication is working!
Setting Up Logical Replication
Logical replication requires PostgreSQL 10+. We’ll replicate a sales table from the primary to the standby.
4.1 Use Cases for Logical Replication
- Replicating a subset of tables (e.g., only
salesandcustomers). - Migrating data to a new server with a modified schema.
- Sharing data with a PostgreSQL instance in the cloud.
4.2 Configuration Steps
Step 1: Prepare the Primary Server
Enable logical replication by setting wal_level=logical in postgresql.conf (requires restart):
wal_level = logical # In postgresql.conf on primary
max_replication_slots = 5 # For logical slots
max_wal_senders = 5
Restart the primary:
sudo systemctl restart postgresql@16-main
Step 2: Create a Publication on the Primary
A publication defines the set of tables to replicate. Create one for the sales table:
-- On primary:
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
product_id INT,
amount NUMERIC,
sale_date TIMESTAMP DEFAULT NOW()
);
-- Create publication for 'sales' table:
CREATE PUBLICATION sales_publication FOR TABLE sales;
Grant SELECT on the table to the replication user (required for logical replication):
GRANT SELECT ON sales TO replicator;
Step 3: Create a Subscription on the Standby
A subscription connects the standby to the primary’s publication. First, create the sales table on the standby (schema must match the primary’s):
-- On standby:
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
product_id INT,
amount NUMERIC,
sale_date TIMESTAMP DEFAULT NOW()
);
Create the subscription to sales_publication:
-- On standby:
CREATE SUBSCRIPTION sales_subscription
CONNECTION 'host=primary.example.com port=5432 user=replicator password=secure_password dbname=postgres'
PUBLICATION sales_publication;
Step 4: Test Replication
Insert a row on the primary and verify it appears on the standby:
-- On primary:
INSERT INTO sales (product_id, amount) VALUES (101, 99.99);
-- On standby:
SELECT * FROM sales; -- Should return the new row!
Monitoring Replication Health
To ensure replication remains reliable, monitor key metrics:
Key Metrics to Track
- Replication Lag: Time between when a change occurs on the primary and when it’s replayed on the standby.
- On primary:
pg_stat_replication.replay_lag(time since standby replayed the last WAL). - On standby:
now() - pg_last_xact_replay_timestamp().
- On primary:
- WAL Generation/Rate: Ensure the primary isn’t generating WAL faster than the standby can replay it.
- Replication Slot Status: Check for unused slots (can cause WAL bloat).
Tools for Monitoring
pg_stat_replication: Built-in view on the primary to track standby connections.pg_replication_slots: View slot status (e.g.,activeorinactive).- Prometheus + Grafana: Use the
postgres_exporterto scrape metrics and visualize lag, WAL rate, etc. - pgBadger: Analyze PostgreSQL logs for replication errors.
Failover and Maintenance
Manual Failover
If the primary fails, promote the standby to take over:
-
On the standby:
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/16/main -
Update your application’s database connection string to point to the new primary (standby’s IP).
-
Reconfigure the old primary as a standby for the new primary (repeat the base backup step).
Automated Failover
For production, use tools like:
- Patroni: A HA solution with automatic failover, REST API, and integration with etcd/Consul.
- repmgr: Simplifies replication setup, monitoring, and failover.
Maintenance Tips
- Update PostgreSQL: Always update standbys first, then the primary (to avoid version mismatches).
- Clean Up WAL: Use
pg_archivecleanupto delete old WAL files (setarchive_cleanup_commandon standbys). - Test Failover Regularly: Simulate primary failures to ensure standbys promote correctly (e.g.,
pg_ctl stopon primary).
Best Practices for Reliable Replication
-
Use Replication Slots: Prevent WAL deletion on the primary before standbys receive it (avoids “WAL file not found” errors).
-- On primary: Create a slot for the standby SELECT * FROM pg_create_physical_replication_slot('standby1_slot');Update the standby’s
primary_conninfoto use the slot:primary_conninfo = '... replication_slot=standby1_slot' -
Encrypt Replication Traffic: Use SSL to secure WAL streaming between primary and standby.
-
Monitor Lag Aggressively: Set alerts for lag exceeding 30 seconds (indicates a problem).
-
Avoid Overloading Standbys: Limit read traffic on standbys to prevent replay lag.
-
Document Your Setup: Record IPs, replication users, and failover steps for your team.
Conclusion
PostgreSQL replication is a cornerstone of building reliable, highly available data systems. Whether you need a failover target (streaming replication) or selective data sharing (logical replication), PostgreSQL offers flexible tools to meet your needs.
By following the steps in this guide, you can set up a robust replication setup, monitor its health, and ensure your data remains accessible even during outages. Remember: replication is not a substitute for backups—always combine it with regular backups for full disaster recovery.