Table of Contents
- Arrays
- Hstore
- JSON/JSONB
- Range Types
- Composite Types
- Enums
- Choosing the Right Data Structure
- Conclusion
- References
Arrays
What are Arrays?
PostgreSQL supports arrays of any built-in or user-defined data type, allowing a single column to store multiple values of the same type. Arrays can be one-dimensional (e.g., [1, 2, 3]) or multi-dimensional (e.g., [[1, 2], [3, 4]]), though one-dimensional arrays are most commonly used.
Use Cases
- Storing lists of related items (e.g., tags for a blog post:
["sql", "postgresql", "database"]). - Managing multiple values per entity (e.g., phone numbers for a user).
- Representing ordered data (e.g., steps in a workflow).
Syntax & Examples
Creating a Table with Arrays
Define an array column by appending [] to the data type:
CREATE TABLE blog_posts (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
tags TEXT[] NOT NULL, -- Array of text
ratings INT[] -- Array of integers (nullable)
);
Inserting Data into Arrays
Use ARRAY[] syntax or string literals with curly braces ({}):
-- Insert using ARRAY[]
INSERT INTO blog_posts (title, tags, ratings)
VALUES (
'PostgreSQL Arrays 101',
ARRAY['sql', 'postgresql', 'arrays'], -- Text array
ARRAY[4, 5, 5] -- Integer array
);
-- Insert using curly brace syntax (alternative)
INSERT INTO blog_posts (title, tags)
VALUES (
'Advanced PostgreSQL Tips',
'{database, tips, performance}' -- Equivalent to ARRAY[...]
);
Querying Arrays
Access elements using 1-based indexing (e.g., tags[1] for the first tag). Use array functions like unnest to expand arrays into rows:
-- Get the first tag of each post
SELECT title, tags[1] AS first_tag FROM blog_posts;
-- Expand tags into individual rows (useful for joins/aggregations)
SELECT title, unnest(tags) AS tag FROM blog_posts;
-- Filter posts with a specific tag (e.g., "postgresql")
SELECT title FROM blog_posts WHERE 'postgresql' = ANY(tags);
Indexing Arrays
For large datasets, index arrays with a GIN (Generalized Inverted Index) to speed up membership checks (e.g., = ANY):
-- GIN index for fast tag lookups
CREATE INDEX idx_blog_posts_tags ON blog_posts USING GIN (tags);
Hstore: Key-Value Pairs
What is Hstore?
hstore is a built-in data type for storing key-value pairs within a single column. It’s ideal for semi-structured data with arbitrary keys but simple (non-nested) values (all keys and values are text).
Use Cases
- Storing dynamic attributes (e.g., user preferences:
theme => 'dark', notifications => 'on'). - Adding flexible metadata to records without altering the schema.
- Migrating from key-value stores (e.g., Redis) to PostgreSQL.
Syntax & Examples
Enable Hstore (if not already enabled)
hstore is an extension; enable it first:
CREATE EXTENSION IF NOT EXISTS hstore;
Creating a Table with Hstore
CREATE TABLE user_profiles (
id SERIAL PRIMARY KEY,
user_id INT UNIQUE NOT NULL,
preferences HSTORE NOT NULL DEFAULT ''::HSTORE -- Default empty hstore
);
Inserting Hstore Data
Use hstore(key => value, ...) syntax or string literals ("key1"=>"val1", "key2"=>"val2"):
INSERT INTO user_profiles (user_id, preferences)
VALUES (
1,
hstore('theme', 'dark', 'notifications', 'on', 'language', 'en')
);
-- Alternative: String literal syntax
INSERT INTO user_profiles (user_id, preferences)
VALUES (
2,
'"theme"=>"light", "notifications"=>"off"'
);
Querying Hstore
Use -> to get values as hstore (text) or ->> to get values as TEXT:
-- Get a user's theme preference
SELECT user_id, preferences -> 'theme' AS theme FROM user_profiles;
-- Filter users with dark theme
SELECT user_id FROM user_profiles WHERE preferences -> 'theme' = 'dark';
-- Add/Update a key (using hstore || operator for merging)
UPDATE user_profiles
SET preferences = preferences || hstore('font_size', '16px')
WHERE user_id = 1;
Indexing Hstore
Index hstore columns with GIN (for key/value existence) or BTREE (for specific keys):
-- GIN index for existence/contains queries
CREATE INDEX idx_user_prefs_gin ON user_profiles USING GIN (preferences);
-- BTREE index for a specific key (e.g., "theme")
CREATE INDEX idx_user_prefs_theme ON user_profiles ((preferences -> 'theme'));
Limitations
- Values are always text (no numbers/dates—you must cast them).
- No nested structures (use JSONB instead for nesting).
JSON/JSONB: Flexible Semi-Structured Data
PostgreSQL supports two JSON types: JSON (text-based, stored as-is) and JSONB (binary-optimized, parsed and stored in a decomposed format). JSONB is preferred for most use cases due to faster querying and indexing.
What is JSON/JSONB?
- JSON: Stores JSON as plain text. Use for storing/retrieving entire JSON blobs (slow for querying nested fields).
- JSONB: Parses JSON into a binary format, enabling efficient indexing and fast lookups. Supports nested structures, arrays, and complex queries.
Use Cases
- Storing nested semi-structured data (e.g., product catalogs with varying attributes:
{"name": "Laptop", "specs": {"RAM": "16GB", "storage": "512GB"}}). - Building APIs with flexible schemas (e.g., event logs with dynamic payloads).
- Replacing EAV (Entity-Attribute-Value) anti-patterns.
Syntax & Examples
Creating a Table with JSONB
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
details JSONB NOT NULL -- Use JSONB for query performance
);
Inserting JSONB Data
Insert JSON strings (PostgreSQL validates JSON syntax):
INSERT INTO products (name, details)
VALUES (
'Wireless Headphones',
'{
"brand": "SoundTech",
"price": 199.99,
"features": ["noise_canceling", "bluetooth_5.0"],
"specs": {
"battery_life": "30h",
"weight_g": 250
}
}'::JSONB -- Cast string to JSONB
);
Querying JSONB
Use operators to access nested fields:
->: Get JSON object/array (returnsJSONB).->>: Get text value (returnsTEXT).#>: Get nested JSON via path (e.g.,details#>'{specs, battery_life}').#>>: Get nested text value via path.
-- Get brand (text value)
SELECT name, details ->> 'brand' AS brand FROM products;
-- Get battery life (nested text value)
SELECT name, details #>> '{specs, battery_life}' AS battery_life FROM products;
-- Filter products with noise cancellation
SELECT name FROM products WHERE details -> 'features' ? 'noise_canceling';
Updating JSONB Data
Use jsonb_set(target, path, new_value) to modify nested fields:
-- Update price to 179.99
UPDATE products
SET details = jsonb_set(details, '{price}', '179.99'::JSONB)
WHERE name = 'Wireless Headphones';
Indexing JSONB
JSONB supports powerful indexing with GIN (for containment/existence) or BTREE (for specific fields):
-- GIN index for "contains" queries (e.g., features ? 'noise_canceling')
CREATE INDEX idx_products_details_gin ON products USING GIN (details);
-- BTREE index for a specific nested field (e.g., price)
CREATE INDEX idx_products_price ON products ((details ->> 'price')::NUMERIC);
JSON vs JSONB: When to Use Which?
| JSON | JSONB |
|---|---|
| Slower querying (text parsing). | Faster querying (binary format). |
| Stores whitespace/order (preserves input). | Ignores whitespace, sorts keys. |
| No indexing (except via expressions). | Supports GIN/BTREE indexing. |
| Use for storage-only, unmodified JSON. | Use for querying/updating JSON. |
Range Types: Intervals and Spans
What are Range Types?
Range types represent intervals (e.g., date ranges, number ranges) as a single value. PostgreSQL provides built-in range types (e.g., int4range for integers, daterange for dates) and supports custom ranges.
Use Cases
- Booking systems (e.g., hotel reservations:
[2024-01-01, 2024-01-05)). - Scheduling (e.g., meeting times:
[14:00, 15:00]). - Validating ranges (e.g., age brackets:
[18, 30)for 18–29 years).
Syntax & Examples
Built-in Range Types
Common built-in ranges:
int4range: 32-bit integer range (e.g.,[1, 10)).int8range: 64-bit integer range.daterange: Date range (e.g.,[2024-01-01, 2024-12-31]).tsrange: Timestamp range (with time zones).
Creating a Table with Range Types
CREATE TABLE hotel_bookings (
id SERIAL PRIMARY KEY,
room_id INT NOT NULL,
guest_name VARCHAR(255) NOT NULL,
stay_period DATERANGE NOT NULL, -- Date range (inclusive start, exclusive end by default)
CHECK (stay_period IS NOT NULL AND lower(stay_period) < upper(stay_period)) -- Ensure valid range
);
Inserting Range Data
Use range(lower, upper, bounds) syntax (bounds: [] inclusive, () exclusive):
INSERT INTO hotel_bookings (room_id, guest_name, stay_period)
VALUES (
101,
'Alice Smith',
daterange('2024-07-01', '2024-07-05') -- Default: [2024-07-01, 2024-07-05)
);
-- Explicit bounds: inclusive end
INSERT INTO hotel_bookings (room_id, guest_name, stay_period)
VALUES (
101,
'Bob Jones',
daterange('2024-07-06', '2024-07-10', '[]') -- [2024-07-06, 2024-07-10]
);
Querying Range Types
Use range operators to check overlaps, containment, or adjacency:
| Operator | Description | Example |
|---|---|---|
&& | Overlap | stay_period && daterange('2024-07-03', '2024-07-08') |
@> | Contains | stay_period @> '2024-07-02'::DATE |
<@ | Contained by | '2024-07-02'::DATE <@ stay_period |
| `- | -` | Adjacent (no gap) |
Example: Find overlapping bookings for room 101 in July 2024:
SELECT guest_name, stay_period
FROM hotel_bookings
WHERE room_id = 101
AND stay_period && daterange('2024-07-01', '2024-07-31');
Indexing Range Types
Use GIST (Generalized Search Tree) indexes for fast range queries:
CREATE INDEX idx_bookings_stay_period ON hotel_bookings USING GIST (stay_period);
Composite Types: Encapsulated Data Structures
What are Composite Types?
Composite types (user-defined) let you group multiple fields into a single type, similar to a “struct” in programming languages. They encapsulate related data (e.g., an address type with street, city, zip).
Use Cases
- Encapsulating multi-field data (e.g.,
address,coordinates). - Simplifying tables with repeated groups of columns (e.g., billing and shipping addresses).
Syntax & Examples
Creating a Composite Type
Define a composite type with CREATE TYPE:
CREATE TYPE address AS (
street VARCHAR(255),
city VARCHAR(100),
state CHAR(2),
zip VARCHAR(10)
);
Using Composite Types in Tables
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
billing_address address NOT NULL, -- Composite type column
shipping_address address -- Nullable composite type
);
Inserting Composite Data
Use ROW() syntax to construct composite values:
INSERT INTO customers (name, billing_address, shipping_address)
VALUES (
'Jane Doe',
ROW('123 Main St', 'Portland', 'OR', '97201'), -- Billing address
ROW('123 Main St', 'Portland', 'OR', '97201') -- Shipping address (same as billing)
);
Querying Composite Types
Access fields using dot notation (e.g., billing_address.city):
-- Get customer names and billing cities
SELECT name, (billing_address).city AS billing_city FROM customers;
-- Filter by state
SELECT name FROM customers WHERE (billing_address).state = 'OR';
Limitations
- Composite types can complicate ORM integration (many ORMs lack native support).
- Altering composite types (e.g., adding a field) requires
ALTER TYPE, which may lock tables.
Enums: Enumerated Values
What are Enums?
Enums (enumerated types) are user-defined types with a fixed set of allowed values. They enforce data integrity by restricting columns to predefined options.
Use Cases
- Storing statuses (e.g.,
order_statuswith valuespending,shipped,delivered). - Ensuring consistency in categorical data (e.g.,
user_role:admin,editor,viewer).
Syntax & Examples
Creating an Enum Type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'cancelled');
Using Enums in Tables
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
order_number VARCHAR(50) UNIQUE NOT NULL,
status order_status NOT NULL DEFAULT 'pending' -- Enum column with default
);
Inserting Enum Values
Insert values as unquoted identifiers (or quoted strings, but unquoted is preferred):
INSERT INTO orders (order_number, status)
VALUES
('ORD-001', 'pending'),
('ORD-002', 'processing'),
('ORD-003', 'shipped');
Benefits Over Check Constraints
Enums are more efficient than CHECK constraints for fixed values:
- Enums store values as integers (smaller storage footprint).
- They provide built-in ordering (e.g.,
pending < processing < shipped). - Error messages are clearer (e.g., “invalid input value for enum order_status: ‘invalid’”).
Choosing the Right Data Structure
With so many options, how do you decide which data structure to use? Here’s a quick guide:
| Data Structure | Best For | Avoid When |
|---|---|---|
| Arrays | Ordered lists of same-type values | Nested data or unordered key-value pairs |
| Hstore | Simple key-value pairs (non-nested) | Nested data or non-text values |
| JSONB | Nested semi-structured data, flexible schemas | Simple key-value pairs (use Hstore) or fixed schemas (use tables) |
| Range Types | Intervals (dates, numbers) | Discrete values or non-contiguous ranges |
| Composite Types | Encapsulated multi-field data | Frequent schema changes or ORM-heavy workflows |
| Enums | Fixed categorical values | Values that may change frequently (use a lookup table) |
Conclusion
PostgreSQL’s advanced data structures transform it from a traditional relational database into a versatile tool for handling diverse data needs. By leveraging arrays for lists, JSONB for flexible schemas, range types for intervals, and enums for fixed values, you can model data more naturally, reduce application complexity, and boost performance.
The key is to match the data structure to your use case: prioritize relational columns for structured, fixed data, and advanced types for flexibility or specialized needs. With these tools in hand, you’ll unlock PostgreSQL’s full potential.