Table of Contents
- Connection Issues
- Performance Bottlenecks
- Replication Failures
- Data Corruption
- Backup and Restore Problems
- Permission Errors
- Version-Specific or Upgrade Issues
- Conclusion
- References
1. Connection Issues
Being unable to connect to a PostgreSQL database is one of the most common problems. Symptoms include errors such as could not connect to server: Connection refused or password authentication failed.
Causes
- PostgreSQL Service Not Running: The PostgreSQL daemon (postgresql) may not be active.
- Incorrect Credentials: Invalid username, password, or database name.
- Firewall Blocking: The default PostgreSQL port (5432) is blocked by the OS firewall or network rules.
- Misconfigured pg_hba.conf: The host-based authentication file (pg_hba.conf) may restrict access for the user/IP.
- PostgreSQL Not Listening on All Interfaces: By default, PostgreSQL may bind only to localhost (check listen_addresses in postgresql.conf).
Solutions
1. Verify PostgreSQL is Running
Check the status of the PostgreSQL service:
# For systemd-based systems (Debian/Ubuntu, RHEL 7+)
sudo systemctl status postgresql
# For SysVinit (older systems)
sudo service postgresql status
If inactive, start it with:
sudo systemctl start postgresql
2. Test Credentials and Connection
Use psql to test connectivity directly:
psql -U <username> -d <database_name> -h <host> -p <port>
- Replace <username>, <database_name>, <host> (e.g., localhost), and <port> (default 5432).
- If you get password authentication failed, reset the user's password:
ALTER USER <username> WITH PASSWORD '<new_password>';
3. Check Firewall Rules
Ensure port 5432 is open:
# For UFW (Debian/Ubuntu)
sudo ufw allow 5432/tcp
# For firewalld (RHEL/CentOS)
sudo firewall-cmd --add-port=5432/tcp --permanent
sudo firewall-cmd --reload
4. Validate pg_hba.conf and postgresql.conf
- pg_hba.conf (location: /etc/postgresql/<version>/main/pg_hba.conf on Debian/Ubuntu): Ensure there is a rule allowing your user/IP. Example for local connections:
# TYPE  DATABASE  USER  ADDRESS          METHOD
local   all       all                    md5    # Password auth for local users
host    all       all   192.168.1.0/24   md5    # Allow the 192.168.1.x subnet
Reload configurations after edits:
sudo systemctl reload postgresql
- postgresql.conf: Ensure listen_addresses is set to '*' (listen on all interfaces) or specific IPs:
listen_addresses = '*'  # Default is 'localhost'
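Before digging into pg_hba.conf, it can help to confirm whether anything is accepting TCP connections on the PostgreSQL port at all: a refused connection points at the service or firewall, while a successful connection followed by an authentication error points at credentials or pg_hba.conf. A minimal sketch using only the Python standard library (the host and port below are placeholder values):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    False suggests the service is down or a firewall is blocking
    traffic; True plus a later 'password authentication failed'
    points at credentials or pg_hba.conf instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Placeholder values; substitute your server's address and port.
    print(port_open("localhost", 5432))
```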
2. Performance Bottlenecks
Slow query execution, high CPU/memory usage, or lagging transactions are signs of performance issues.
Causes
- Unoptimized Queries: Missing indexes, full table scans, or inefficient joins.
- Resource Constraints: Insufficient CPU, memory (e.g., shared_buffers too small), or I/O bottlenecks.
- Locking/Blocking: Long-running transactions blocking others.
- Poor Configuration: Suboptimal PostgreSQL settings (e.g., work_mem, maintenance_work_mem).
Solutions
1. Identify Slow Queries
Use pg_stat_statements (add pg_stat_statements to shared_preload_libraries in postgresql.conf, restart the server, then run CREATE EXTENSION pg_stat_statements; in the target database):
-- Top 10 slowest queries by total time
-- (the column is named total_exec_time on PostgreSQL 13 and later)
SELECT queryid, query, total_time, calls
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
Analyze a specific query with EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
- Look for Seq Scan (full table scan) instead of Index Scan; this usually indicates a missing index.
2. Add Indexes
Create indexes for frequently filtered/joined columns:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
- Avoid over-indexing, since extra indexes slow down writes (INSERT/UPDATE).
3. Tune PostgreSQL Configuration
Use tools like pgTune to generate optimal settings based on your server resources. Key parameters:
- shared_buffers: Typically 25% of system memory (e.g., 4GB on 16GB RAM).
- work_mem: Memory per sort/join operation (increase for complex queries, but avoid OOM).
- maintenance_work_mem: Memory for index creation/VACUUM (set to 10-20% of RAM).
Edit postgresql.conf, then reload:
sudo systemctl reload postgresql
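The rules of thumb above can be scripted as a rough starting point. This is an illustrative sketch of those heuristics only, not pgTune's actual formulas; always validate the results against your workload:

```python
def suggest_memory_settings(total_ram_gb: float) -> dict:
    """Rough starting points based on the rules of thumb above.

    Illustrative heuristics, not pgTune's real formulas.
    """
    return {
        # ~25% of system memory for shared_buffers
        "shared_buffers_gb": round(total_ram_gb * 0.25, 2),
        # ~10-20% of RAM for maintenance_work_mem; use the low end
        "maintenance_work_mem_gb": round(total_ram_gb * 0.10, 2),
    }

# A 16 GB server yields the 4 GB shared_buffers figure from the text.
print(suggest_memory_settings(16))
```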
4. Resolve Locking/Blocking
Identify blocked queries with pg_locks:
SELECT blocked_locks.pid AS blocked_pid,
blocking_locks.pid AS blocking_pid,
blocked_activity.usename AS blocked_user,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_query,
blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
- Kill the blocking query with SELECT pg_terminate_backend(<blocking_pid>);.
5. Monitor Resource Usage
Check CPU, memory, and I/O with system tools:
top # CPU/memory
iostat -x 5 # I/O stats (run for 5 seconds)
- High I/O wait (%iowait) may indicate slow disks—consider faster storage (SSD) or partitioning.
3. Replication Failures
PostgreSQL replication (primary-standby) ensures high availability, but lag or sync issues are common.
Causes
- Network Latency: Slow/unstable network between primary and standby.
- WAL Backlog: Standby can’t keep up with Write-Ahead Log (WAL) generation on the primary.
- Configuration Mismatch: Incorrect recovery.conf settings (replaced by postgresql.conf settings plus a standby.signal file in PostgreSQL 12+).
- Insufficient Resources on Standby: Low I/O or memory causing slow WAL replay.
Solutions
1. Check Replication Status
On the primary, query pg_stat_replication:
SELECT usename, application_name, client_addr, state, sync_state, replay_lag
FROM pg_stat_replication;
- state = 'streaming' indicates healthy replication.
- replay_lag shows the delay (e.g., 00:00:02 is 2 seconds).
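For alerting, the replay_lag value can be converted to a plain number of seconds. A small sketch that parses psql's HH:MM:SS[.fraction] text form; this is a simplification, since real interval output can also carry a day component:

```python
def lag_seconds(lag: str) -> float:
    """Convert an HH:MM:SS[.ffffff] string (as psql prints replay_lag)
    into seconds, e.g. to compare against a monitoring threshold.

    Simplified sketch: assumes no day component in the interval.
    """
    hours, minutes, seconds = lag.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# '00:00:02' from the example above is 2 seconds of lag
print(lag_seconds("00:00:02"))
```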
2. Resolve WAL Backlog
- Tune WAL Sender Settings: In postgresql.conf on the primary (changing max_wal_senders requires a restart):
wal_sender_timeout = 60s   # Terminate unresponsive sender connections after 60s
max_wal_senders = 10       # Allow more standby connections
- On the standby, ensure wal_receiver_status_interval is set so progress is reported regularly:
wal_receiver_status_interval = 10s
3. Verify Network Connectivity
Test latency between primary and standby:
ping <standby_ip> # Check round-trip time
traceroute <standby_ip> # Identify network hops
- Use pg_basebackup to reinitialize replication if sync is broken:
# On standby: stop PostgreSQL, delete the data directory, and rebase from the primary
sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/<version>/main/*
pg_basebackup -h <primary_ip> -U replicator -D /var/lib/postgresql/<version>/main -P -Xs -R
sudo systemctl start postgresql
4. Data Corruption
Rare but critical—symptoms include invalid page in block errors or crashes during queries.
Causes
- Hardware Failure: Bad disks, memory errors, or power outages.
- Software Bugs: PostgreSQL bugs (rare in stable versions) or faulty extensions.
- Filesystem Corruption: Damaged OS filesystem affecting PostgreSQL data files.
Solutions
1. Check for Corruption with pg_checksums
Enable checksums at cluster creation (via initdb --data-checksums). To verify them, stop the server and run:
pg_checksums -c -D /var/lib/postgresql/<version>/main
- Note: On PostgreSQL 11 and earlier, checksums can only be enabled at initdb time; from PostgreSQL 12 onward, pg_checksums --enable can add them to a stopped cluster.
2. Restore from Backup
If corruption is confirmed, restore from the latest backup:
pg_restore -U <user> -d <database> /path/to/backup.dump
3. Use pg_resetwal (Last Resort)
For WAL corruption, reset the write-ahead log (risky; recent transactions may be lost):
pg_resetwal -f /var/lib/postgresql/<version>/main
- Only use if backups are unavailable and corruption is isolated to WAL.
5. Backup and Restore Problems
Backups failing or restores producing errors are common pain points.
Causes
- Insufficient Disk Space: The backup/restore target has no free space.
- Permission Errors: pg_dump/pg_restore lacks read/write access to the source database or target files.
- Incompatible Versions: Backing up from PostgreSQL 14 and restoring to 12 is a downgrade; downgrades are not officially supported, so if you must, use pg_dump from the target (older) version.
Solutions
1. Check Disk Space
df -h /path/to/backup/directory
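In a backup script, the same check can be automated before pg_dump starts. A sketch using Python's standard library; the 10 GiB requirement below is an arbitrary placeholder, not a recommendation:

```python
import shutil

def enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least
    `required_bytes` free, mirroring the df -h check above."""
    return shutil.disk_usage(path).free >= required_bytes

if __name__ == "__main__":
    # Hypothetical threshold: require 10 GiB free before starting a backup.
    print(enough_space("/", 10 * 1024**3))
```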
2. Verify Backup Integrity
For custom-format dumps, list the archive's table of contents with pg_restore; a damaged or truncated archive will fail to read:
pg_restore -l /path/to/backup.dump > /dev/null
3. Ensure Version Compatibility
When restoring to an older PostgreSQL version, run pg_dump from the target (older) version against the newer server; note that downgrades are not officially supported and may fail if the dump uses newer features:
# Example: Restore from 14 to 12
/path/to/postgresql-12/bin/pg_dump -U user -d db > backup.sql
psql -U user -d db -f backup.sql # On PostgreSQL 12
6. Permission Errors
Users unable to access tables/functions, e.g., permission denied for table.
Causes
- Missing Privileges: The user lacks SELECT, INSERT, or EXECUTE permissions.
- Incorrect Object Ownership: The table is owned by a different role than the user expects.
Solutions
1. Check Current Permissions
-- List permissions for a table
\dp <table_name>
-- List all roles and privileges
\du
2. Grant Privileges
-- Grant SELECT on a table to a user
GRANT SELECT ON <table_name> TO <user>;
-- Grant all privileges on a schema
GRANT ALL PRIVILEGES ON SCHEMA public TO <user>;
-- Make a user the owner of a table
ALTER TABLE <table_name> OWNER TO <user>;
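When GRANT statements are generated from scripts (for example, to provision many roles at once), identifiers should be quoted. This sketch applies PostgreSQL's identifier-quoting rule (wrap in double quotes, doubling any embedded double quotes); the helper names are hypothetical, and in real applications your driver's quoting utilities are preferable:

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes and
    double any embedded double quotes (the server's quoting rule)."""
    return '"' + name.replace('"', '""') + '"'

def grant_select(table: str, role: str) -> str:
    """Build a GRANT SELECT statement with safely quoted identifiers.
    Hypothetical helper, for illustration only."""
    return f"GRANT SELECT ON {quote_ident(table)} TO {quote_ident(role)};"

print(grant_select("orders", "report_user"))
# GRANT SELECT ON "orders" TO "report_user";
```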
7. Version-Specific or Upgrade Issues
Upgrading PostgreSQL or using deprecated features can cause unexpected errors.
Causes
- Deprecated Features: Using syntax or features removed in newer versions (e.g., WITH OIDS tables and the recovery.conf file were both removed in PostgreSQL 12).
- Incompatible Extensions: Extensions (e.g., pg_stat_statements) not yet updated for the new server version.
Solutions
- Read Release Notes: Check PostgreSQL Release Notes for breaking changes.
- Test Upgrades in Staging: Run pg_upgrade with --check first:
pg_upgrade --old-datadir /path/to/old/data --new-datadir /path/to/new/data --old-bindir /path/to/old/bin --new-bindir /path/to/new/bin --check
- Update Extensions: After the upgrade, run ALTER EXTENSION <extension> UPDATE; for each installed extension.
Conclusion
Troubleshooting PostgreSQL requires a mix of log analysis, systematic testing, and familiarity with common pitfalls. Start by checking logs (/var/log/postgresql/), validate configurations, and use built-in tools like pg_stat_statements or pg_stat_replication. Regular maintenance—index optimization, backups, and updates—will prevent many issues.
Remember: Logs are your best friend. Most problems leave traces there, so always start by reviewing them!
References
- PostgreSQL Official Documentation
- pgTune – Configuration Tuning Tool
- pg_stat_statements Documentation
- PostgreSQL Replication Guide
- Stack Overflow PostgreSQL Tag – Community Support