Table of Contents
- What Are Indexes in PostgreSQL?
- Common Index Types in PostgreSQL
- How Indexes Work Internally
- Best Practices for Effective Indexing
- Common Indexing Mistakes to Avoid
- Index Maintenance: Keeping Indexes Healthy
- Advanced Indexing Techniques
- Conclusion
- References
What Are Indexes in PostgreSQL?
An index is a data structure that stores a subset of a table’s data in a way that accelerates data retrieval. Think of it like the index of a book: instead of flipping through every page to find a topic, you look up the page number in the index.
In PostgreSQL, indexes are stored separately from the main table (the “heap”) and contain pointers to the actual rows (heap tuples) that match the indexed values. When you run a query with a WHERE clause, PostgreSQL can use an index to quickly locate the relevant rows instead of scanning the entire table (a “sequential scan”).
Key Trade-Offs
- Pros: Faster read queries (SELECT, JOIN, WHERE clauses).
- Cons: Slower write operations (INSERT, UPDATE, DELETE) because PostgreSQL must update all related indexes. Indexes also consume additional storage and memory.
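You can measure the storage cost directly: pg_relation_size reports the size of a table's heap, and pg_indexes_size the combined size of all its indexes (the orders table here is a placeholder for any table in your schema):
-- Compare heap size with total index size for one table
SELECT pg_size_pretty(pg_relation_size('orders')) AS table_size,
       pg_size_pretty(pg_indexes_size('orders')) AS index_size;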
Common Index Types in PostgreSQL
PostgreSQL supports multiple index types, each optimized for specific query patterns. Choosing the right type is critical for performance.
1. B-Tree Indexes (Default)
The most common index type, optimized for equality (=) and range queries (<, >, BETWEEN, IN). B-trees work best with ordered data (e.g., numbers, dates, strings) and high-cardinality columns (many distinct values).
Use case: Indexing user_id, email, or order_date columns.
Example:
CREATE INDEX idx_orders_order_date ON orders (order_date);
2. Hash Indexes
Designed for equality-only queries (=), but not range queries. Hash indexes are rarely used in practice because:
- They don’t support ordering or range scans.
- B-trees are often faster even for equality queries (due to better caching and optimization).
Example:
CREATE INDEX idx_users_email_hash ON users USING hash (email); -- Use only for simple = queries
3. GIN (Generalized Inverted Indexes)
Optimized for multivalued data (arrays, JSONB, full-text search) and columns with multiple values per row. GIN indexes invert the mapping: instead of mapping keys to rows, they map row elements to keys.
Use case: Indexing tags[] arrays, jsonb fields, or full-text search vectors.
Example:
-- Index an array of tags
CREATE INDEX idx_articles_tags_gin ON articles USING gin (tags);
-- Full-text search index
CREATE INDEX idx_articles_body_fts ON articles USING gin (to_tsvector('english', body));
4. GiST (Generalized Search Tree)
Used for geospatial data (PostGIS), text search, and custom data types requiring proximity or ordering. GiST is more flexible than GIN but slower to build and query for large datasets.
Use case: Indexing geometry columns (e.g., POINT, POLYGON).
Example:
CREATE INDEX idx_locations_geom_gist ON locations USING gist (geom);
5. BRIN (Block Range Indexes)
Designed for very large tables with naturally ordered data (e.g., time-series data, logs). BRIN indexes store summary information for “block ranges” (groups of table pages) instead of individual rows, making them tiny and fast to maintain.
Use case: Indexing timestamp columns in tables with billions of rows (e.g., sensor data).
Example:
CREATE INDEX idx_sensor_data_timestamp_brin ON sensor_data USING brin (reading_time);
Other Types
- SP-GiST: For non-balanced tree structures (e.g., quadtrees for geospatial data).
- Covering Indexes: Use INCLUDE to add non-key columns and avoid table lookups (see Composite Indexes).
How Indexes Work Internally
To use indexes effectively, it helps to understand their structure. Let’s focus on B-trees, the most widely used type.
B-Tree Structure
A B-tree is a balanced tree where:
- Leaf nodes contain index keys and pointers (CTIDs) to the actual table rows (heap tuples).
- Internal nodes guide the search: PostgreSQL traverses from the root to the leaf nodes to find the relevant keys.

Simplified B-tree: Internal nodes direct the search; leaf nodes point to table rows.
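You can peek at the pointers a B-tree leaf stores: every row exposes its physical heap location as ctid, a (block number, tuple offset) pair, which is exactly what an index entry records (the orders table is a placeholder):
-- ctid is the heap location an index leaf entry points to
SELECT ctid, order_date FROM orders LIMIT 3;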
Index Scans vs. Sequential Scans
PostgreSQL’s query planner chooses between:
- Sequential Scan: Reads the entire table (fast for small tables or unselective queries).
- Index Scan: Uses the index to find rows (faster for large tables with selective filters).
The planner decides based on:
- Table size and index size.
- Selectivity (fraction of rows matching the query).
- Statistics (from ANALYZE) about data distribution.
Use EXPLAIN ANALYZE to see which scan type PostgreSQL uses:
EXPLAIN ANALYZE SELECT * FROM orders WHERE order_date > '2023-01-01';
-- Look for "Index Scan using idx_orders_order_date" (good) vs. "Seq Scan" (bad for large tables)
Best Practices for Effective Indexing
1. Identify Queries That Need Indexes
Not all queries benefit from indexes. Focus on frequently run, slow queries (e.g., from application logs or pg_stat_statements).
Tools to Identify Slow Queries
- EXPLAIN ANALYZE: Shows the query execution plan and actual runtime. Look for Seq Scan on large tables.
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';
- Slow Query Log: Enable log_min_duration_statement in postgresql.conf to log queries slower than a threshold (e.g., 100ms).
- pg_stat_statements: Tracks execution statistics for all SQL statements.
-- Top 10 slowest queries by total execution time (the column is named total_time before PostgreSQL 13)
SELECT queryid, query, total_exec_time FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;
2. Choose the Right Index Type
Match the index type to your query pattern:
| Query Type | Index Type | Example Query |
|---|---|---|
| Equality/range queries | B-tree | WHERE order_date > '2023-01-01' |
| Arrays/JSONB/full-text | GIN | WHERE tags @> ARRAY['postgres'] |
| Geospatial data | GiST (PostGIS) | WHERE ST_Contains(geom, ST_MakePoint(...)) |
| Large ordered tables | BRIN | WHERE timestamp > '2023-01-01' (100M+ rows) |
3. Index Selective Columns
Index columns with high selectivity (many distinct values). A column like user_id (cardinality = number of users) is highly selective, while is_active (cardinality = 2) is not.
Why? Indexes on low-selectivity columns rarely speed up queries. For example, an index on is_active (which is 90% true) won’t help PostgreSQL find the 10% false rows faster than a sequential scan.
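Before creating an index, you can check a column's selectivity using the planner's own statistics in pg_stats (the table and column names below are placeholders; run ANALYZE first so the estimates are fresh):
-- n_distinct > 0 is an estimated distinct count; a negative value means a fraction of the row count
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'orders' AND attname IN ('user_id', 'is_active');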
4. Optimize Composite Indexes
A composite index includes multiple columns (e.g., (user_id, order_date)). To maximize utility:
a. Order Matters: Leftmost Prefix Rule
PostgreSQL can only use a composite index if the query filters on the leftmost columns. For example, an index (a, b, c) works for:
- WHERE a = 5
- WHERE a = 5 AND b = 10
- WHERE a = 5 AND b = 10 AND c = 15
But not for WHERE b = 10 or WHERE b = 10 AND c = 15 (missing a).
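A quick way to verify the rule is to compare plans on a hypothetical table t with an index on (a, b, c):
CREATE INDEX idx_t_abc ON t (a, b, c);
EXPLAIN SELECT * FROM t WHERE a = 5 AND b = 10; -- Can use idx_t_abc
EXPLAIN SELECT * FROM t WHERE b = 10;           -- No leading a: expect a Seq Scan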
b. Put Selective Columns First
Order columns by selectivity (most selective first). For example, if user_id is more selective than order_date, create (user_id, order_date) instead of (order_date, user_id).
c. Use Covering Indexes with INCLUDE
A covering index includes non-key columns to avoid “bookmark lookups” (fetching data from the table heap). Use the INCLUDE clause to add these columns without affecting index ordering.
Example:
-- Index (user_id, order_date) and include total_amount to avoid table lookups
CREATE INDEX idx_orders_user_date_covering ON orders (user_id, order_date) INCLUDE (total_amount);
-- Query can now use the index directly (no need to access the table)
SELECT total_amount FROM orders WHERE user_id = 123 AND order_date > '2023-01-01';
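To confirm the covering index is working, look for an Index Only Scan in the plan, which means PostgreSQL never touched the heap (this assumes the index above exists and the table has been vacuumed recently, since index-only scans depend on the visibility map):
EXPLAIN ANALYZE SELECT total_amount FROM orders WHERE user_id = 123 AND order_date > '2023-01-01';
-- Look for "Index Only Scan using idx_orders_user_date_covering" and "Heap Fetches: 0"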
5. Avoid Over-Indexing
Every index adds overhead to writes: each INSERT, UPDATE, or DELETE must also update every index on the table, so a table with 10 indexes pays that maintenance cost 10 times on every write.
How to Find Unused Indexes
Use pg_stat_user_indexes to identify indexes that are never scanned:
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0; -- Indexes with 0 scans are candidates for deletion
Action: Drop unused indexes to speed up writes (but keep indexes that enforce unique or primary key constraints, even if idx_scan is 0).
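The drop itself can avoid blocking concurrent queries (the index name below is a placeholder; note that CONCURRENTLY cannot run inside a transaction block):
DROP INDEX CONCURRENTLY idx_orders_unused;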
6. Leverage Partial and Expression Indexes
Partial Indexes
Index only a subset of rows (e.g., active orders, recent data). Partial indexes are smaller, faster to maintain, and avoid indexing irrelevant data.
Example: Index only active orders (90% of queries filter for status = 'active'):
CREATE INDEX idx_orders_active ON orders (order_date) WHERE status = 'active';
Expression Indexes
Index the result of a function or expression (e.g., lowercase emails, formatted dates). Use when queries filter on transformed values.
Example: Index lowercase emails to speed up case-insensitive searches:
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Query uses the expression index
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
7. Index Foreign Keys
Foreign key columns are often used in JOIN clauses (e.g., orders.user_id referencing users.id). PostgreSQL does not index foreign key columns automatically, so indexing them speeds up joins and referential integrity checks (e.g., a DELETE on users doesn't require a sequential scan of orders).
Example:
-- Always index foreign keys!
CREATE INDEX idx_orders_user_id ON orders (user_id); -- Foreign key to users.id
8. Tune Index Storage Parameters
Adjust storage parameters to optimize index performance for your workload:
Fillfactor
The fillfactor determines how much of each index page is filled during creation. For tables with frequent updates, set a lower fillfactor (e.g., 70) to leave space for new entries, reducing index bloat.
Example:
-- For tables with heavy updates (e.g., user profiles)
CREATE INDEX idx_users_last_login ON users (last_login) WITH (fillfactor = 70);
Common Indexing Mistakes to Avoid
- Indexing Every Column
“Just in case” indexing slows down writes and wastes storage. Only index columns used in WHERE, JOIN, or ORDER BY clauses.
- Ignoring Composite Index Order
Creating (b, a) instead of (a, b) can render the index useless if queries filter on a.
- Overlooking Partial/Expression Indexes
Using a full index when a partial/expression index would be smaller and faster (e.g., indexing all email instead of LOWER(email)).
- Forgetting to Analyze Tables
PostgreSQL relies on up-to-date statistics (from ANALYZE) to choose the best index. Outdated stats can lead to poor query plans (e.g., choosing a sequential scan when an index exists).
Fix: Run VACUUM ANALYZE regularly, or enable autovacuum (default in PostgreSQL).
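A minimal sketch, assuming the orders table from earlier examples: refresh statistics by hand and check when the table was last analyzed:
ANALYZE orders; -- Recompute planner statistics for this table
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';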
Index Maintenance: Keeping Indexes Healthy
Over time, indexes become bloated due to updates and deletes (PostgreSQL marks old index entries as dead but doesn’t immediately remove them). Bloated indexes slow down queries and waste space.
How to Detect Bloat
Use pgstattuple to check index bloat:
-- Install the extension (once per database)
CREATE EXTENSION pgstattuple;
-- Check bloat for an index
SELECT * FROM pgstattuple('idx_orders_order_date');
-- Look at free_percent and dead_tuple_percent (high values = bloated)
How to Fix Bloat
- REINDEX: Rebuilds the index from scratch. Use CONCURRENTLY (PostgreSQL 12+) to avoid locking the table:
REINDEX INDEX CONCURRENTLY idx_orders_order_date; -- No table lock; slower but safe
- pg_repack: A third-party tool that rebuilds indexes without downtime (better for large tables).
Advanced Indexing Techniques
- Indexing JSONB Fields
For JSONB columns, use GIN indexes to query nested values:
-- Index a JSONB field's "metadata.tags" array
CREATE INDEX idx_products_metadata_tags ON products USING gin ((metadata->'tags'));
-- Query using the index
SELECT * FROM products WHERE metadata->'tags' @> '["electronics"]';
- Full-Text Search with GIN
Optimize text search with GIN indexes on tsvector columns:
-- Precompute and index the full-text vector (generated columns require PostgreSQL 12+)
ALTER TABLE articles ADD COLUMN body_fts tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED;
CREATE INDEX idx_articles_body_fts_gin ON articles USING gin (body_fts);
-- Fast search
SELECT * FROM articles WHERE body_fts @@ to_tsquery('english', 'postgres & indexing');
Conclusion
Indexing is both an art and a science. By understanding PostgreSQL’s index types, analyzing query patterns, and following best practices like avoiding over-indexing and using partial/expression indexes, you can build a database that scales efficiently.
Remember: indexes are not a one-time setup. Regularly monitor index usage, keep bloat in check, and adjust as query patterns evolve. With careful tuning, your PostgreSQL database will stay fast as your data and query volume grow.