cyberangles guide

How to Implement PostgreSQL’s Logical Replication

PostgreSQL, one of the most popular open-source relational databases, offers robust replication solutions to ensure data availability, scalability, and disaster recovery. Among these, **logical replication** stands out for its flexibility: unlike physical replication (which copies entire database clusters at the block level), logical replication replicates *changes* (inserts, updates, deletes) at the row level, based on a predefined "publication" of tables. This makes logical replication ideal for use cases like: - Synchronizing specific tables between databases (instead of entire clusters). - Scaling read workloads by offloading queries to subscribers. - Migrating data between different PostgreSQL versions or cloud providers. - Integrating with external systems (e.g., data warehouses) via replication slots. In this guide, we’ll walk through implementing logical replication step-by-step, from prerequisites to advanced configurations.

Table of Contents

  1. Prerequisites
  2. Setting Up the Publisher
  3. Setting Up the Subscriber
  4. Testing Replication
  5. Monitoring Replication
  6. Troubleshooting Common Issues
  7. Advanced Configurations
  8. Conclusion
  9. References

Prerequisites

Before starting, ensure the following:

  • PostgreSQL Version: Logical replication is available in PostgreSQL 10+. For best results, use PostgreSQL 12+ (includes improvements like parallel apply and better conflict handling).
  • Access: You need superuser or replication role privileges on both publisher and subscriber databases.
  • Network Connectivity: The subscriber must be able to connect to the publisher over the network (typically on port 5432).
  • Schema Compatibility: The subscriber database must have the same schema as the publisher (logical replication does not replicate DDL changes like CREATE TABLE).
  • WAL Configuration: The publisher must be configured to generate logical WAL (Write-Ahead Log) records.

Setting Up the Publisher

The “publisher” is the source database that shares its data via a “publication.” Follow these steps to configure it:

2.1 Configure PostgreSQL for Logical Replication

First, update the publisher’s postgresql.conf (usually in /var/lib/postgresql/<version>/main/ or /etc/postgresql/<version>/main/) to enable logical replication:

# Enable logical WAL (required for logical replication)
wal_level = logical  

# Maximum number of replication slots (one per subscriber)
max_replication_slots = 10  # Adjust based on expected subscribers  

# Maximum number of WAL sender processes (one per subscriber)
max_wal_senders = 10        # Should be >= max_replication_slots  

# Optional: Limit WAL retention (prevents disk bloat if subscribers lag)
wal_keep_size = 1024        # Retain 1GB of WAL (adjust as needed)

Restart PostgreSQL to apply changes:

sudo systemctl restart postgresql

Next, allow the subscriber to connect to the publisher by updating pg_hba.conf (in the same directory as postgresql.conf). Add a line for the subscriber’s IP address (replace 192.168.1.100 with your subscriber’s IP):

# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    replication     replicator      192.168.1.100/32        md5  # Allows replication from subscriber

Reload PostgreSQL to apply pg_hba.conf changes without restarting:

sudo -u postgres psql -c "SELECT pg_reload_conf();"

Create a dedicated replication user on the publisher (replace replicator and secure_password with your credentials):

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

2.2 Create a Publication

A “publication” defines which tables to replicate and which operations (INSERT/UPDATE/DELETE/TRUNCATE) to include.

Example 1: Publish All Tables in a Database

To replicate all tables in the mydb database (excluding system tables):

-- Connect to the publisher database
sudo -u postgres psql -d mydb  

-- Create a publication for all tables
CREATE PUBLICATION my_publication FOR ALL TABLES;

Example 2: Publish Specific Tables

To replicate only users and orders tables:

CREATE PUBLICATION my_publication 
FOR TABLE users, orders;

Example 3: Filter Operations or Rows

  • Restrict operations: Replicate only INSERTs and UPDATEs (exclude DELETEs):
    CREATE PUBLICATION my_publication 
    FOR TABLE users, orders 
    WITH (publish = 'insert, update');
  • Row-level filtering (PostgreSQL 15+): Replicate only active users (e.g., status = 'active'):
    CREATE PUBLICATION my_publication 
    FOR TABLE users WHERE (status = 'active'), 
                orders WHERE (total > 100);  -- Replicate orders over $100

Verify the publication with:

-- List all publications
\dp+  -- In psql, or:
SELECT * FROM pg_publication;

-- Check which tables are in a publication
SELECT * FROM pg_publication_tables WHERE pubname = 'my_publication';

Setting Up the Subscriber

The “subscriber” is the target database that receives data from the publisher.

3.1 Prepare the Subscriber Schema

Logical replication does not replicate DDL changes (e.g., CREATE TABLE, ALTER TABLE). The subscriber must have an identical schema to the publisher.

To copy the schema from the publisher to the subscriber:

  1. Dump the publisher’s schema (replace mydb with your database name):
    pg_dump -s -d mydb -h publisher_ip -U replicator > schema_dump.sql
  2. Restore the schema on the subscriber:
    psql -d mydb_sub -U postgres -f schema_dump.sql

3.2 Create a Subscription

A “subscription” on the subscriber connects to the publisher’s publication and starts replicating data.

Connect to the subscriber database and run:

-- Replace placeholders with your publisher details
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip port=5432 dbname=mydb user=replicator password=secure_password'
PUBLICATION my_publication;

Key Subscription Options:

  • copy_data = true (default): Copies existing data from the publisher to the subscriber during setup. Use copy_data = false if the subscriber already has the data (e.g., after a restore).
  • enabled = true (default): Starts replication immediately. Set to false to pause initially.

Example with copy_data = false:

CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip port=5432 dbname=mydb user=replicator password=secure_password'
PUBLICATION my_publication
WITH (copy_data = false);

Verify the subscription:

-- List subscriptions
SELECT * FROM pg_subscription;

-- Check subscription status
SELECT * FROM pg_stat_subscription;

Testing Replication

To confirm replication works, perform write operations on the publisher and verify they appear on the subscriber.

Step 1: Insert Data on the Publisher

-- On publisher (mydb)
INSERT INTO users (id, name, status) VALUES (1, 'Alice', 'active');
INSERT INTO orders (id, user_id, total) VALUES (101, 1, 250);

Step 2: Verify on the Subscriber

-- On subscriber (mydb_sub)
SELECT * FROM users;  -- Should return Alice
SELECT * FROM orders; -- Should return order 101

Step 3: Test Updates and Deletes

-- On publisher: Update a row
UPDATE users SET name = 'Alice Smith' WHERE id = 1;

-- On publisher: Delete a row
DELETE FROM orders WHERE id = 101;

-- On subscriber: Verify changes
SELECT * FROM users WHERE id = 1;  -- Name should be 'Alice Smith'
SELECT * FROM orders WHERE id = 101;  -- Should return 0 rows

Monitoring Replication

Use these tools to monitor replication health and lag:

4.1 pg_stat_subscription (Subscriber-Side)

Shows subscription status, including last sync time and lag:

SELECT 
  subname AS subscription_name,
  status AS replication_status,
  last_sync,
  sync_state AS initial_sync_state,
  latest_end_lsn AS last_applied_lsn
FROM pg_stat_subscription;

4.2 pg_replication_slots (Publisher-Side)

Logical replication uses a “replication slot” on the publisher to track which WAL records the subscriber has processed. Check slot status with:

SELECT 
  slot_name,
  active,  -- Is the subscriber connected?
  wal_status AS wal_retention_status,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal_size
FROM pg_replication_slots WHERE slot_type = 'logical';

4.3 Logs

Check PostgreSQL logs (e.g., /var/log/postgresql/postgresql-<version>-main.log) for errors like:

  • Schema mismatches (e.g., “column does not exist”).
  • Connection issues (e.g., “could not connect to publisher”).

Troubleshooting Common Issues

5.1 Schema Mismatch

Problem: Subscriber is missing a column/table from the publisher.
Fix: Replicate the schema manually (e.g., using pg_dump -s from the publisher) and restart the subscription:

-- On subscriber: Drop and recreate the subscription
DROP SUBSCRIPTION my_subscription;
CREATE SUBSCRIPTION my_subscription ... ;  -- Use copy_data=false if data is already synced

5.2 Replication Slot Not Advancing

Problem: Publisher retains WAL (disk bloat) because the subscriber is offline.
Fix: If the subscriber is permanently offline, drop the slot on the publisher:

-- On publisher: Drop the slot (replace with your slot name)
SELECT pg_drop_replication_slot('my_subscription');

5.3 Permission Denied

Problem: “permission denied for relation users” during initial sync.
Fix: Ensure the replication user (replicator) has SELECT privileges on published tables:

-- On publisher: Grant SELECT to replicator
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;

5.4 Conflicts (Subscriber Writes)

Problem: Subscriber has conflicting data (e.g., a row updated on both publisher and subscriber).
Fix: Logical replication is asynchronous, so avoid writing to the subscriber. If a conflict occurs, skip the problematic transaction:

-- On subscriber: Skip the conflicting transaction
ALTER SUBSCRIPTION my_subscription SKIP;

Advanced Configurations

6.1 Multiple Subscribers

Add more subscribers by creating additional subscriptions on new databases:

-- On subscriber 2:
CREATE SUBSCRIPTION my_subscription2
CONNECTION 'host=publisher_ip port=5432 dbname=mydb user=replicator password=secure_password'
PUBLICATION my_publication;

6.2 Pause/Resume Replication

Pause replication for maintenance (e.g., schema updates on the subscriber):

-- Pause
ALTER SUBSCRIPTION my_subscription DISABLE;

-- Resume
ALTER SUBSCRIPTION my_subscription ENABLE;

6.3 Cascading Replication

Replicate from a subscriber to another database (requires the intermediate subscriber to enable wal_level = logical).

6.4 Column-Level Filtering (PostgreSQL 16+)

Replicate only specific columns (e.g., exclude sensitive data like password):

-- On publisher: Replicate only id and name from users
CREATE PUBLICATION my_publication FOR TABLE users (id, name);

Conclusion

Logical replication is a powerful tool for flexible data synchronization in PostgreSQL. By following this guide, you’ve learned to:

  • Configure publishers and subscribers.
  • Create publications and subscriptions.
  • Test, monitor, and troubleshoot replication.

For production use, ensure:

  • Regular schema synchronization (e.g., via CI/CD pipelines for DDL changes).
  • Monitoring of replication lag and disk usage (to avoid WAL bloat).
  • Backups of both publisher and subscriber databases.

References