Table of Contents
- What is Garbage Collection?
- How Ruby Manages Memory: The Heap and Object Lifecycle
- Evolution of Ruby’s Garbage Collection Algorithms
- Generational Garbage Collection: The Core of Modern Ruby GC
- Mark-Sweep-Compact: The Building Blocks
- Minor vs. Major Garbage Collection
- Tuning Ruby Garbage Collection
- Common Memory Issues and Solutions
- Conclusion
- References
What is Garbage Collection?
Garbage Collection (GC) is an automatic memory management mechanism that identifies and reclaims memory occupied by “dead” objects—objects that are no longer reachable or useful to the application. In low-level languages like C or C++, developers must manually allocate and free memory (e.g., with malloc and free), a process prone to errors like memory leaks (unfreed memory) or dangling pointers (accessing freed memory).
Ruby, being a high-level, dynamically typed language, eliminates this burden by handling memory automatically. The Ruby interpreter includes a GC subsystem that runs whenever allocation thresholds are crossed, briefly pausing the program so that the slots of unused objects can be reused for new ones. This automation lets developers focus on business logic, but it’s not a silver bullet: poor object lifecycle management can still lead to bloated memory usage or excessive GC overhead.
How Ruby Manages Memory: The Heap and Object Lifecycle
Before diving into GC, let’s clarify how Ruby allocates memory.
The Heap: Where Objects Live
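Ruby stores heap-allocated objects (strings, arrays, hashes, etc.) in its own object heap, which the interpreter builds from fixed-size pages requested from the operating system. Unlike the stack (used for method calls and local variables), the heap holds dynamically created data whose lifetime is not tied to a single call. Immediate values such as small integers, true, false, and nil are encoded directly in the reference and take no heap slot.
When you create an object (e.g., str = "hello"), Ruby places it in a free slot on one of those pages. Each page is divided into slots and each object occupies exactly one slot. Data that does not fit in the slot (the bytes of a long string, the elements of a large array) lives in separately malloc-allocated memory that the slot points to.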
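To make the slot-versus-buffer distinction concrete, here is a minimal sketch that uses the standard objspace library. It assumes CRuby; the variable names are illustrative, and the exact byte counts vary by Ruby version and platform, so none are shown.

```ruby
require 'objspace'

# memsize_of reports the bytes Ruby attributes to an object: its heap slot
# plus any separately allocated buffer it owns.
short_string = "hi"
long_string  = "x" * 10_000

short_size = ObjectSpace.memsize_of(short_string)
long_size  = ObjectSpace.memsize_of(long_string)

puts "short string: #{short_size} bytes"
puts "long string:  #{long_size} bytes"

# count_objects summarizes slot usage: :TOTAL is every slot on the heap,
# :FREE is the number of slots currently unused.
counts = ObjectSpace.count_objects
puts "total slots: #{counts[:TOTAL]}, free slots: #{counts[:FREE]}"
```

The long string reports a much larger size because its characters live outside the slot, which is also why malloc-based thresholds exist alongside slot-based ones.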
Object Lifecycle
An object’s lifecycle in Ruby has three stages:
- Allocation: Ruby reserves heap memory for the object (e.g., Array.new).
- Usage: The object is referenced by variables, passed to methods, or stored in data structures (making it “live”).
- Death: The object becomes unreachable (no references point to it). It is then eligible for garbage collection.
Reachability is key: An object is “live” if it can be reached by traversing references starting from “root” objects. Roots include:
- Global variables (e.g., $LOAD_PATH).
- Local variables in the current call stack.
- Instance variables of active objects (e.g., a Rails controller instance).
- Registers and internal interpreter state.
Unreachable objects are “dead” and eligible for GC.
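Reachability can be inspected directly. The sketch below assumes CRuby and the standard objspace library; the variables (name, tags, user) are made up for illustration. ObjectSpace.reachable_objects_from returns the objects one reference hop away from its argument, which is the same edge the GC follows while marking.

```ruby
require 'objspace'

name = "Ada"
tags = ["admin", "staff"]
user = { name: name, tags: tags }   # user holds references to name and tags

# One hop: objects referenced directly by the hash.
direct = ObjectSpace.reachable_objects_from(user)
reaches_name = direct.any? { |obj| obj.equal?(name) }
reaches_tags = direct.any? { |obj| obj.equal?(tags) }

# Second hop: the array in turn references its string elements.
second_hop = ObjectSpace.reachable_objects_from(tags)
reaches_element = second_hop.any? { |obj| obj.equal?(tags.first) }

puts "user -> name: #{reaches_name}"
puts "user -> tags: #{reaches_tags}"
puts "tags -> first element: #{reaches_element}"
```

As long as user is reachable from a root (here, a top-level local variable), everything found by following these hops stays live.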
Evolution of Ruby’s Garbage Collection Algorithms
Ruby’s GC has evolved significantly to keep pace with performance demands. Here’s a brief timeline:
- Ruby 1.8: Used a basic mark-and-sweep algorithm. It paused the entire application during GC (a “stop-the-world” pause), making it slow for large heaps.
- Ruby 1.9.3 (2011): Added lazy sweeping, which spreads the sweep phase across later allocations to shorten individual pauses. There was still no generational support.
- Ruby 2.0 (2013): Moved mark bits into a separate bitmap, making the heap friendlier to copy-on-write after fork.
- Ruby 2.1 (2013): Introduced RGenGC (Restricted Generational GC), which classifies objects as “young” or “old” to target short-lived objects and uses write barriers to track cross-generation references.
- Ruby 2.2 (2014): Added incremental marking (RincGC) to break major GC pauses into smaller steps, and made symbols collectable.
- Ruby 2.7 (2019): Added manual compaction via GC.compact to reduce heap fragmentation.
- Ruby 3.0+: Added automatic compaction (GC.auto_compact = true) and, in later releases, variable-width allocation.
Modern Ruby uses a generational, incremental mark-and-sweep collector with optional compaction, which we’ll explore next.
Generational Garbage Collection: The Core of Modern Ruby GC
Generational GC is based on a key observation: most objects die young. For example, temporary strings created in a loop or short-lived method variables are often discarded quickly. Only a small percentage of objects survive long enough to be reused.
To leverage this, Ruby classifies objects into generations. The generations are logical: an object’s generation is recorded in its flags, and the object stays in the same heap slot.
- Young Generation: Newly allocated objects start here. They are examined on every collection.
- Old Generation: Objects that survive multiple collections are “promoted” here. They are examined only during major GC.
Since Ruby 2.2 an object must survive three collections before it is promoted (Ruby 2.1 promoted after one), but the young/old split remains core.
How Generations Work
- Allocation: New objects are allocated in the young generation.
- Minor GC: The young generation is collected frequently (minor GC). Most objects here are dead, so this is fast.
- Promotion: Objects that survive enough collections are flagged as old. They are not copied anywhere; only their generation changes.
- Major GC: When the old generation fills up, a major GC runs, collecting both young and old generations.
Write Barriers: To track references from old to young objects (which could keep young objects alive), Ruby uses “write barriers.” These track when an old object references a young object, ensuring the young object is marked as live during minor GC.
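Promotion can be observed from Ruby code. This is a minimal sketch assuming CRuby with the standard objspace and json libraries; survivor is an illustrative name. ObjectSpace.dump emits a JSON description of an object that includes its GC flags, so comparing the flags before and after several collections shows whether the object was promoted. The exact flag set depends on the Ruby version, so no output is shown.

```ruby
require 'objspace'
require 'json'

survivor = Object.new   # kept alive by this top-level local variable

before = JSON.parse(ObjectSpace.dump(survivor))

# Each GC.start is a full collection that the survivor lives through.
4.times { GC.start }

after = JSON.parse(ObjectSpace.dump(survivor))

puts "flags before: #{before['flags'].inspect}"
puts "flags after:  #{after['flags'].inspect}"
puts "old objects tracked by the GC: #{GC.stat(:old_objects)}"
```

On recent CRuby versions the second set of flags typically includes an "old" entry, which is the marker the collector uses to skip the object during minor GC.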
Mark-Sweep-Compact: The Building Blocks
Generational GC relies on three core phases: mark, sweep, and compact. Let’s break them down.
1. Mark Phase
The goal: Identify all live objects.
- Root Traversal: The GC starts from root objects (globals, stack, etc.) and traverses all reachable objects, marking them as “live.”
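- Mark Bits: Each object has a “mark bit” (a flag) that is set to true if the object is live and left false if it is dead.
Example: If a = ["x", "y", "z"] is reachable from a root, the array and its three string elements are marked live. Small integers such as 1, 2, and 3 are immediate values rather than heap objects, so they need no marking.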
2. Sweep Phase
The goal: Free memory occupied by dead objects.
- The GC iterates over the heap, checking each object’s mark bit.
- Unmarked (dead) objects have their memory freed, returning it to the “free list” for future allocations.
Result: Memory is reclaimed, but the heap may become fragmented (free slots scattered between live objects).
3. Compact Phase (Optional but Critical)
The goal: Reduce heap fragmentation by defragmenting memory.
- Live objects are moved to contiguous blocks of memory.
- References to these objects are updated to point to their new locations.
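Why compact? A page that holds even one live object cannot be released, so scattered survivors keep many mostly empty pages alive. Compaction packs live objects onto fewer pages, which lowers memory usage and improves cache locality and copy-on-write sharing after fork.
Compaction is off by default. Call GC.compact to run it manually (Ruby 2.7+), or set GC.auto_compact = true (Ruby 3.0+) to compact as part of major GC.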
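The three phases can be sketched as a toy model in plain Ruby. This is not how CRuby is implemented (the real collector works on C structures and bitmaps); ToyObject, mark, sweep, and compact are hypothetical names introduced only for illustration. The "heap" is an array of slots, and each object lists the slot indexes it references.

```ruby
# Toy model only: each slot holds a ToyObject or nil (a free slot).
ToyObject = Struct.new(:name, :refs, :marked)

# Mark: walk the reference graph from the roots and flag everything reached.
def mark(heap, roots)
  stack = roots.dup
  until stack.empty?
    obj = heap[stack.pop]
    next if obj.nil? || obj.marked
    obj.marked = true
    stack.concat(obj.refs)
  end
end

# Sweep: free every unmarked slot and clear mark bits for the next cycle.
def sweep(heap)
  freed = []
  heap.each_with_index do |obj, index|
    next if obj.nil?
    if obj.marked
      obj.marked = false
    else
      freed << obj.name
      heap[index] = nil
    end
  end
  freed
end

# Compact: slide live objects to the front and rewrite references to match.
def compact(heap, roots)
  forwarding = {}
  heap.each_with_index do |obj, index|
    forwarding[index] = forwarding.size unless obj.nil?
  end
  live = heap.compact
  live.each { |obj| obj.refs = obj.refs.map { |ref| forwarding.fetch(ref) } }
  [live + Array.new(heap.size - live.size), roots.map { |r| forwarding.fetch(r) }]
end

heap = [
  ToyObject.new("a", [1], false),  # slot 0: root, references b
  ToyObject.new("b", [],  false),  # slot 1
  ToyObject.new("c", [3], false),  # slot 2: unreachable, references d
  ToyObject.new("d", [2], false),  # slot 3: unreachable, references c
  ToyObject.new("e", [0], false)   # slot 4: root, references a
]
roots = [0, 4]

mark(heap, roots)
freed = sweep(heap)
heap, roots = compact(heap, roots)

puts "freed: #{freed.inspect}"
puts "heap after compaction: #{heap.map { |o| o && o.name }.inspect}"
puts "roots after compaction: #{roots.inspect}"
```

Note that c and d reference each other yet are still collected: a tracing collector only cares whether an object is reachable from a root, so reference cycles are not a problem.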
Minor vs. Major Garbage Collection
Ruby’s GC distinguishes between two types of collections:
Minor GC
- Target: Young generation only.
- Frequency: Triggered when Ruby runs out of free slots, or when malloc-allocated memory has grown past RUBY_GC_MALLOC_LIMIT since the last collection.
- Phases: Mark-sweep (no compaction, since young generation is small and fragmentation is minimal).
- Cost: Fast (milliseconds), as most young objects are dead.
Major GC
- Target: Young + old generations.
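- Trigger: When the number of old objects exceeds a threshold (by default twice the count left after the previous major GC, set by RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR), when memory malloc-allocated by old objects exceeds RUBY_GC_OLDMALLOC_LIMIT, or when a minor GC fails to free enough slots.
- Phases: Mark-sweep, plus compaction when GC.auto_compact is enabled (compaction reduces fragmentation in the old generation).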
- Cost: Slower (tens of milliseconds to seconds for large heaps), as the old generation has more live objects.
Stop-the-World Pauses: Both minor and major GC pause the application while they mark. Modern Ruby shortens the longest pauses by marking incrementally during major GC and by sweeping lazily, a little at a time as new objects are allocated. Long pauses can still harm latency-sensitive apps (e.g., web servers), which is why GC tuning matters for them.
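The two kinds of collection can be told apart through GC.stat counters. The following sketch assumes CRuby; gc_counts is a hypothetical helper. GC.start accepts full_mark: false to request a minor collection, although Ruby may upgrade the request to a major collection when it decides one is due, so the sketch prints counters rather than assuming which one moved.

```ruby
# Snapshot of the collector's own counters.
def gc_counts
  {
    minor: GC.stat(:minor_gc_count),
    major: GC.stat(:major_gc_count),
    total: GC.count
  }
end

before = gc_counts

# Request a minor collection (young objects only, if Ruby agrees).
GC.start(full_mark: false, immediate_sweep: true)
after_minor_request = gc_counts

# Request a major collection (the default for GC.start).
GC.start
after_major = gc_counts

# For a major GC, :major_by names the reason; it is nil after a minor GC.
major_reason = GC.latest_gc_info(:major_by)

puts "before:              #{before}"
puts "after minor request: #{after_minor_request}"
puts "after major:         #{after_major}"
puts "last major GC reason: #{major_reason.inspect}"
```

Watching how quickly the major counter grows relative to the minor counter in production is a cheap first signal of whether old-generation pressure is a problem.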
Tuning Ruby Garbage Collection
While Ruby’s GC works well out of the box, tuning can optimize performance for specific workloads. Here’s how to adjust and monitor it.
Key Configuration Options
Ruby exposes GC settings via environment variables or the GC module. Common options:
| Variable | Purpose |
|---|---|
| RUBY_GC_HEAP_INIT_SLOTS | Initial number of heap slots (default: 10,000). Increase for apps with many initial allocations. Ruby 3.3+ ignores it in favor of RUBY_GC_HEAP_0_INIT_SLOTS through RUBY_GC_HEAP_4_INIT_SLOTS. |
| RUBY_GC_HEAP_GROWTH_FACTOR | Factor by which the heap grows when full (default: 1.8). Lower for slower growth, higher for faster. |
| RUBY_GC_MALLOC_LIMIT | Bytes of malloc-allocated memory since the last GC that trigger a collection (default: 16 MB). Increase to reduce minor GCs. |
| RUBY_GC_OLDMALLOC_LIMIT | The same threshold for memory owned by old-generation objects (default: 16 MB). Increase to reduce major GCs. |
Compaction is not controlled by an environment variable. Enable it at runtime with GC.auto_compact = true (Ruby 3.0+) or run it on demand with GC.compact (Ruby 2.7+).
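These variables are read once, at interpreter startup, so changing ENV inside a running process has no effect. The sketch below assumes CRuby and launches a child interpreter with RUBY_GC_MALLOC_LIMIT set, then reads back the limit the child reports through GC.stat. The 24 MB value is an arbitrary illustration, not a recommendation.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical tuning value, chosen only for illustration: 24 MB.
configured_limit = 24 * 1024 * 1024
env = { "RUBY_GC_MALLOC_LIMIT" => configured_limit.to_s }

# Ask a fresh interpreter what malloc limit it is actually using.
script = "print GC.stat(:malloc_increase_bytes_limit)"
output, status = Open3.capture2(env, RbConfig.ruby, "-e", script)

child_limit = Integer(output)

puts "configured: #{configured_limit} bytes"
puts "child reports: #{child_limit} bytes"
puts "child exited cleanly: #{status.success?}"
```

In a real deployment the variable would normally be set in the process manager or container configuration rather than from Ruby itself.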
Monitoring GC Behavior
Use Ruby’s built-in tools to diagnose GC performance:
- GC.stat: Returns a hash of GC metrics (e.g., count = number of GCs, heap_live_slots = live objects).

      puts GC.stat
      # => {:count=>5, :heap_live_slots=>1234, :heap_free_slots=>456, ...}

- GC::Profiler: Measures GC duration.

      GC::Profiler.enable
      # Run your code...
      GC::Profiler.report # Prints GC times and counts

- Other Tools: the memory_profiler gem (tracks object allocations), ruby-prof (profiles GC overhead), or the standard library’s objspace (examines object references).
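For quick checks, GC.stat can be sampled before and after a block to see how much collector activity that block caused. This is a minimal sketch assuming CRuby; gc_activity is a hypothetical helper, not a built-in.

```ruby
# Runs the block and reports the GC activity it caused.
def gc_activity
  GC.start  # settle pending work so earlier garbage does not skew the numbers
  before  = GC.stat
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  after = GC.stat

  {
    gc_runs:           after[:count] - before[:count],
    minor_gc_runs:     after[:minor_gc_count] - before[:minor_gc_count],
    major_gc_runs:     after[:major_gc_count] - before[:major_gc_count],
    allocated_objects: after[:total_allocated_objects] - before[:total_allocated_objects],
    wall_seconds:      elapsed
  }
end

report = gc_activity do
  50_000.times.map { |i| "item #{i}" }
end

report.each { |key, value| puts "#{key}: #{value}" }
```

The allocated_objects figure is usually the most actionable number: fewer allocations mean fewer collections, regardless of how the GC is tuned.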
Best Practices for Tuning
- Avoid Premature Tuning: Only tune if GC pauses or memory usage are problematic.
- Reduce Allocations: Minimize short-lived objects (e.g., avoid string interpolation in loops: str = "#{x}" creates a new string each iteration; use String#<< instead).
- Use Frozen Strings: 'hello'.freeze reuses the same string object instead of allocating new ones.
- Limit Old Generation Growth: Avoid caching large objects unnecessarily (use LRU caches with eviction policies).
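The effect of these practices on allocation counts can be measured with GC.stat(:total_allocated_objects). The sketch below assumes CRuby; allocations and status_label are hypothetical helpers. Exact counts differ between Ruby versions, so the script prints them instead of assuming values.

```ruby
# Number of objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

words = Array.new(1_000) { |i| "w#{i}" }

with_plus = allocations do
  result = ""
  words.each { |w| result += w }   # += builds a brand-new string each time
end

with_append = allocations do
  result = +""                     # unary plus gives a mutable string
  words.each { |w| result << w }   # << grows the same string in place
end

# A frozen literal: every call returns the same object, so nothing new
# is allocated per call.
def status_label
  "active".freeze
end

same_object = status_label.equal?(status_label)

puts "allocations with +=: #{with_plus}"
puts "allocations with <<: #{with_append}"
puts "frozen literal reused: #{same_object}"
```

Adding the # frozen_string_literal: true magic comment at the top of a file applies the same reuse to every string literal in that file.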
Common Memory Issues and Solutions
Even with automatic GC, memory problems can arise. Here are two critical issues and fixes:
1. Memory Leaks
A memory leak occurs when objects are unintentionally retained, causing the heap to grow indefinitely.
Causes:
- Unremoved event listeners (e.g., in Rails Action Cable).
- Global caches that never evict old entries (e.g., $CACHE = {} with no cleanup).
- Accidental references in closures or procs.
Detection:
- Use memory_profiler to track object counts over time:

      require 'memory_profiler'
      report = MemoryProfiler.report { run_your_code }
      report.pretty_print

- Monitor GC.stat[:heap_live_slots]: a count that keeps rising after each major GC suggests a leak.
Fixes:
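- Use weak references (WeakRef from the standard library, or ObjectSpace::WeakMap) for non-critical caches.
- Implement LRU (Least Recently Used) caches (e.g., ActiveSupport::Cache::MemoryStore with a size limit, which prunes the least recently used entries when it grows too large).
- Audit global variables and ensure listeners are detached when no longer needed.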
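A bounded cache does not need a library. The following is a minimal sketch of an LRU cache built on the insertion order that Ruby hashes preserve; BoundedCache is a hypothetical class written for this article, not part of Ruby or Rails, and it is not thread-safe.

```ruby
# Hypothetical LRU cache for illustration; not thread-safe.
class BoundedCache
  def initialize(max_entries)
    raise ArgumentError, "max_entries must be positive" unless max_entries.positive?
    @max_entries = max_entries
    @store = {}   # Ruby hashes remember insertion order: oldest entry first
  end

  # Returns the cached value, or computes it with the block and stores it.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key)   # re-insert so the key becomes the newest
      @store[key] = value
    else
      value = yield(key)
      @store[key] = value
      @store.shift if @store.size > @max_entries   # evict least recently used
      value
    end
  end

  def keys
    @store.keys
  end

  def size
    @store.size
  end
end

cache = BoundedCache.new(2)
cache.fetch(:a) { 1 }
cache.fetch(:b) { 2 }
cache.fetch(:a) { raise "not called on a hit" }  # :a becomes most recent
cache.fetch(:c) { 3 }                            # evicts :b

puts "keys: #{cache.keys.inspect}"
puts "size: #{cache.size}"
```

Because evicted entries lose their last reference, they become unreachable and the GC can reclaim them, which is exactly what an unbounded $CACHE = {} prevents.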
2. Heap Fragmentation
Fragmentation occurs when live objects are scattered thinly across many heap pages. Pages that are mostly empty still cannot be released, so memory usage stays high even though few objects are live.
Causes:
- Frequent allocation/deallocation of objects of varying sizes.
- Disabled compaction.
Fixes:
- Enable compaction with GC.auto_compact = true (Ruby 3.0+), or call GC.compact at a quiet moment such as right after boot (Ruby 2.7+).
- Increase heap size to reduce the frequency of sweeps (fewer sweeps mean less fragmentation).
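Compaction is exposed at runtime through GC.compact (Ruby 2.7+) and GC.auto_compact= (Ruby 3.0+). The sketch below assumes CRuby; try_compaction and enable_auto_compaction are hypothetical helpers that guard against Ruby versions and platforms where compaction is unavailable. Whether the page count actually drops depends on the state of the heap, so no result is shown.

```ruby
# Runs one manual compaction if this Ruby supports it.
def try_compaction
  return { supported: false } unless GC.respond_to?(:compact)

  pages_before = GC.stat[:heap_allocated_pages]
  GC.compact
  {
    supported:    true,
    pages_before: pages_before,
    pages_after:  GC.stat[:heap_allocated_pages]
  }
rescue NotImplementedError
  # Some platforms define the method but cannot compact.
  { supported: false }
end

# Opts in to compaction during major GC where the setter exists.
def enable_auto_compaction
  return false unless GC.respond_to?(:auto_compact=)

  GC.auto_compact = true
  true
rescue NotImplementedError
  false
end

result = try_compaction
auto_enabled = enable_auto_compaction

puts "manual compaction: #{result.inspect}"
puts "auto compaction enabled: #{auto_enabled}"
```

A common pattern in forking servers is to call GC.compact once in the parent process after the application has booted, so that child processes share a tidier heap.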
Conclusion
Ruby’s garbage collection is a sophisticated system that balances automation and performance. By understanding its core concepts—generational hypothesis, mark-sweep-compact phases, and minor/major collections—you can write Ruby code that works with the GC, not against it.
Key takeaways:
- Most objects die young, so generational GC optimizes for short-lived objects.
- Minor GC is fast and frequent; major GC is slower but necessary for long-lived objects.
- Tune GC settings and monitor metrics to avoid leaks and fragmentation.
With this knowledge, you’ll be better equipped to build Ruby applications that are efficient, scalable, and free of memory-related headaches.
References
- Ruby GC Documentation
- Ruby 2.2 Garbage Collection: RGenGC
- Understanding Ruby’s Garbage Collection (SitePoint)
- Ruby Performance Optimization by Alexander Dymo (Pragmatic Bookshelf)
- Ruby 3.0 GC Improvements