cyberangles guide

Troubleshooting PostgreSQL: Common Issues and Solutions

PostgreSQL, often called “Postgres,” is a powerful, open-source relational database management system (RDBMS) renowned for its robustness, scalability, and compliance with SQL standards. It’s widely used in production environments, from small applications to enterprise-grade systems. However, like any complex software, PostgreSQL can encounter issues that disrupt performance, connectivity, or data integrity. Troubleshooting PostgreSQL requires a systematic approach: starting with logs, verifying configurations, and isolating symptoms to root causes. This blog aims to demystify common PostgreSQL problems, their underlying causes, and step-by-step solutions. Whether you’re a developer managing a small database or a DBA overseeing a large cluster, this guide will help you resolve issues efficiently.

Table of Contents

  1. Connection Issues
  2. Performance Bottlenecks
  3. Replication Failures
  4. Data Corruption
  5. Backup and Restore Problems
  6. Permission Errors
  7. Version-Specific or Upgrade Issues
  8. Conclusion
  9. References

1. Connection Issues

Unable to connect to a PostgreSQL database is one of the most common problems. Symptoms include errors like could not connect to server: Connection refused or password authentication failed.

Causes

  • PostgreSQL Service Not Running: The PostgreSQL daemon (postgresql) may not be active.
  • Incorrect Credentials: Invalid username, password, or database name.
  • Firewall Blocking: The default PostgreSQL port (5432) is blocked by the OS firewall or network rules.
  • Misconfigured pg_hba.conf: The host-based authentication file (pg_hba.conf) may restrict access for the user/IP.
  • PostgreSQL Not Listening on All Interfaces: By default, PostgreSQL may bind only to localhost (check listen_addresses in postgresql.conf).

Solutions

1. Verify PostgreSQL is Running

Check the status of the PostgreSQL service:

# For systemd-based systems (Debian/Ubuntu, RHEL 7+)  
sudo systemctl status postgresql  

# For SysVinit (older systems)  
sudo service postgresql status  

If inactive, start it with:

sudo systemctl start postgresql  

2. Test Credentials and Connection

Use psql to test connectivity directly:

psql -U <username> -d <database_name> -h <host> -p <port>  
  • Replace <username>, <database_name>, <host> (e.g., localhost), and <port> (default 5432).
  • If you get password authentication failed, reset the user’s password:
    ALTER USER <username> WITH PASSWORD '<new_password>';  

3. Check Firewall Rules

Ensure port 5432 is open:

# For UFW (Debian/Ubuntu)  
sudo ufw allow 5432/tcp  

# For firewalld (RHEL/CentOS)  
sudo firewall-cmd --add-port=5432/tcp --permanent  
sudo firewall-cmd --reload  

4. Validate pg_hba.conf and postgresql.conf

  • pg_hba.conf (location: /etc/postgresql/<version>/main/pg_hba.conf on Debian/Ubuntu):
    Ensure there’s a rule allowing your user/IP. Example for local connections:

    # TYPE  DATABASE        USER            ADDRESS                 METHOD  
    local   all             all                                     md5  # Password auth for local users  
    host    all             all             192.168.1.0/24          md5  # Allow 192.168.1.x subnet  

    Reload configurations after edits:

    sudo systemctl reload postgresql  
  • postgresql.conf: Ensure listen_addresses is set to '*' (listen on all interfaces) or specific IPs:

    listen_addresses = '*'  # Default is 'localhost'  

2. Performance Bottlenecks

Slow query execution, high CPU/memory usage, or lagging transactions are signs of performance issues.

Causes

  • Unoptimized Queries: Missing indexes, full table scans, or inefficient joins.
  • Resource Constraints: Insufficient CPU, memory (e.g., shared_buffers too small), or I/O bottlenecks.
  • Locking/Blocking: Long-running transactions blocking others.
  • Poor Configuration: Suboptimal PostgreSQL settings (e.g., work_mem, maintenance_work_mem).

Solutions

1. Identify Slow Queries

Use pg_stat_statements (enable in postgresql.conf by adding pg_stat_statements to shared_preload_libraries):

-- Top 10 slowest queries by total time  
SELECT queryid, query, total_time, calls  
FROM pg_stat_statements  
ORDER BY total_time DESC  
LIMIT 10;  

Analyze a specific query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;  
  • Look for Seq Scan (full table scan) instead of Index Scan—indicates missing indexes.

2. Add Indexes

Create indexes for frequently filtered/joined columns:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);  
  • Avoid over-indexing (slows down writes like INSERT/UPDATE).

3. Tune PostgreSQL Configuration

Use tools like pgTune to generate optimal settings based on your server resources. Key parameters:

  • shared_buffers: Typically 25% of system memory (e.g., 4GB on 16GB RAM).
  • work_mem: Memory per sort/join operation (increase for complex queries, but avoid OOM).
  • maintenance_work_mem: Memory for index creation/VACUUM (set to 10-20% of RAM).

Edit postgresql.conf, then reload:

sudo systemctl reload postgresql  

4. Resolve Locking/Blocking

Identify blocked queries with pg_locks:

SELECT blocked_locks.pid     AS blocked_pid,  
       blocking_locks.pid    AS blocking_pid,  
       blocked_activity.usename  AS blocked_user,  
       blocking_activity.usename AS blocking_user,  
       blocked_activity.query    AS blocked_query,  
       blocking_activity.query   AS blocking_query  
FROM  pg_catalog.pg_locks         blocked_locks  
JOIN pg_catalog.pg_stat_activity blocked_activity  ON blocked_activity.pid = blocked_locks.pid  
JOIN pg_catalog.pg_locks         blocking_locks  
    ON blocking_locks.locktype = blocked_locks.locktype  
    AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE  
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation  
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page  
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple  
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid  
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid  
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid  
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid  
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid  
    AND blocking_locks.pid != blocked_locks.pid  
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid  
WHERE NOT blocked_locks.granted;  
  • Kill the blocking query with SELECT pg_terminate_backend(<blocking_pid>);.

5. Monitor Resource Usage

Check CPU, memory, and I/O with system tools:

top  # CPU/memory  
iostat -x 5  # I/O stats (run for 5 seconds)  
  • High I/O wait (%iowait) may indicate slow disks—consider faster storage (SSD) or partitioning.

3. Replication Failures

PostgreSQL replication (primary-standby) ensures high availability, but lag or sync issues are common.

Causes

  • Network Latency: Slow/unstable network between primary and standby.
  • WAL Backlog: Standby can’t keep up with Write-Ahead Log (WAL) generation on the primary.
  • Configuration Mismatch: Incorrect recovery.conf (or postgresql.conf for PostgreSQL 12+) settings.
  • Insufficient Resources on Standby: Low I/O or memory causing slow WAL replay.

Solutions

1. Check Replication Status

On the primary, query pg_stat_replication:

SELECT usename, application_name, client_addr, state, sync_state, replay_lag  
FROM pg_stat_replication;  
  • state = 'streaming' indicates healthy replication.
  • replay_lag shows delay (e.g., 00:00:02 is 2 seconds).

2. Resolve WAL Backlog

  • Increase WAL Sender/Receiver Buffers: In postgresql.conf on the primary:
    wal_sender_timeout = 60s  # Restart unresponsive senders  
    max_wal_senders = 10      # Allow more standby connections  
  • On the standby, ensure wal_receiver_status_interval is set to monitor progress:
    wal_receiver_status_interval = 10s  

3. Verify Network Connectivity

Test latency between primary and standby:

ping <standby_ip>  # Check round-trip time  
traceroute <standby_ip>  # Identify network hops  
  • Use pg_basebackup to reinitialize replication if sync is broken:
    # On standby: Stop PostgreSQL, delete data directory, and rebase  
    sudo systemctl stop postgresql  
    sudo rm -rf /var/lib/postgresql/<version>/main/*  
    pg_basebackup -h <primary_ip> -U replicator -D /var/lib/postgresql/<version>/main -P -Xs -R  
    sudo systemctl start postgresql  

4. Data Corruption

Rare but critical—symptoms include invalid page in block errors or crashes during queries.

Causes

  • Hardware Failure: Bad disks, memory errors, or power outages.
  • Software Bugs: PostgreSQL bugs (rare in stable versions) or faulty extensions.
  • Filesystem Corruption: Damaged OS filesystem affecting PostgreSQL data files.

Solutions

1. Check for Corruption with pg_checksums

Enable checksums at database creation (via initdb --data-checksums). To verify:

pg_checksums -c -D /var/lib/postgresql/<version>/main  
  • Note: Checksums must be enabled during initdb; they can’t be added later.

2. Restore from Backup

If corruption is confirmed, restore from the latest backup:

pg_restore -U <user> -d <database> /path/to/backup.dump  

3. Use pg_resetwal (Last Resort)

For WAL corruption, reset the WAL log (risky—may lose data):

pg_resetwal -f /var/lib/postgresql/<version>/main  
  • Only use if backups are unavailable and corruption is isolated to WAL.

5. Backup and Restore Problems

Backups failing or restores producing errors are common pain points.

Causes

  • Insufficient Disk Space: Backup/restore target has no free space.
  • Permission Errors: pg_dump/pg_restore lacks read/write access.
  • Incompatible Versions: Backing up from PostgreSQL 14 and restoring to 12 (use pg_dump from the target version).

Solutions

1. Check Disk Space

df -h /path/to/backup/directory  

2. Verify Backup Integrity

Use pg_dump with --check to validate:

pg_dump -U <user> -d <database> --check -f /dev/null  

3. Ensure Version Compatibility

Always use pg_dump from the target PostgreSQL version when restoring to an older version:

# Example: Restore from 14 to 12  
/path/to/postgresql-12/bin/pg_dump -U user -d db > backup.sql  
psql -U user -d db -f backup.sql  # On PostgreSQL 12  

6. Permission Errors

Users unable to access tables/functions, e.g., permission denied for table.

Causes

  • Missing Privileges: User lacks SELECT, INSERT, or EXECUTE permissions.
  • Incorrect Object Ownership: Table owned by a different role than the user.

Solutions

1. Check Current Permissions

-- List permissions for a table  
\dp <table_name>  

-- List all roles and privileges  
\du  

2. Grant Privileges

-- Grant SELECT on a table to a user  
GRANT SELECT ON <table_name> TO <user>;  

-- Grant all privileges on a schema  
GRANT ALL PRIVILEGES ON SCHEMA public TO <user>;  

-- Make a user the owner of a table  
ALTER TABLE <table_name> OWNER TO <user>;  

7. Version-Specific Issues

Upgrading PostgreSQL or using deprecated features can cause unexpected errors.

Causes

  • Deprecated Features: Using syntax/functions removed in newer versions (e.g., pg_catalog.pg_stat_user_tables renamed in 12+).
  • Incompatible Extensions: Extensions (e.g., pg_stat_statements) not updated for the new version.

Solutions

  • Read Release Notes: Check PostgreSQL Release Notes for breaking changes.
  • Test Upgrades in Staging: Use pg_upgrade with --check first:
    pg_upgrade --old-datadir /path/to/old/data --new-datadir /path/to/new/data --old-bindir /path/to/old/bin --new-bindir /path/to/new/bin --check  
  • Update Extensions: After upgrade, run ALTER EXTENSION <extension> UPDATE;.

Conclusion

Troubleshooting PostgreSQL requires a mix of log analysis, systematic testing, and familiarity with common pitfalls. Start by checking logs (/var/log/postgresql/), validate configurations, and use built-in tools like pg_stat_statements or pg_stat_replication. Regular maintenance—index optimization, backups, and updates—will prevent many issues.

Remember: Logs are your best friend. Most problems leave traces there, so always start by reviewing them!

References