Table of Contents
| Chapter / Section | Title | Page |
| --- | --- | --- |
| | List of Algorithms | xvii |
| | List of Figures | xxi |
| | List of Tables | xxv |
| | Preface | xxvii |
| | Acknowledgements | xxxiii |
| | Authors | xxxv |
| 1 | Introduction | 1 |
| 1.1 | Explicit deallocation | 2 |
| 1.2 | Automatic dynamic memory management | 4 |
| 1.3 | Comparing garbage collection algorithms | 6 |
| | Safety | 6 |
| | Throughput and cycles consumed | 6 |
| | Completeness and promptness | 7 |
| | Pause time and latency | 7 |
| | Space overhead | 8 |
| | Energy use | 8 |
| | Optimisations for specific languages | 9 |
| | Scalability and portability | 10 |
| 1.4 | A performance disadvantage? | 10 |
| 1.5 | Experimental methodology | 11 |
| 1.6 | Terminology and notation | 12 |
| | The heap | 13 |
| | The mutator and the collector | 13 |
| | The mutator roots | 14 |
| | References, fields and addresses | 14 |
| | Liveness, correctness and reachability | 15 |
| | Pseudocode | 15 |
| | The allocator | 15 |
| | Mutator read and write operations | 16 |
| | Atomic operations | 16 |
| | Sets, multisets, sequences and tuples | 17 |
| 2 | Mark-sweep garbage collection | 19 |
| 2.1 | The mark-sweep algorithm | 20 |
| 2.2 | The tricolour abstraction | 23 |
| 2.3 | Improving mark-sweep | 24 |
| 2.4 | Bitmap marking | 24 |
| 2.5 | Lazy sweeping | 27 |
| 2.6 | Cache misses in the marking loop | 29 |
| 2.7 | Issues to consider | 31 |
| | Mutator overhead | 31 |
| | Throughput | 32 |
| | Space usage | 32 |
| | To move or not to move? | 32 |
| 3 | Mark-compact garbage collection | 35 |
| 3.1 | Two-finger compaction | 36 |
| 3.2 | The Lisp 2 algorithm | 38 |
| 3.3 | Threaded compaction | 40 |
| 3.4 | One-pass algorithms | 42 |
| 3.5 | Issues to consider | 45 |
| | Is compaction necessary? | 45 |
| | Throughput costs of compaction | 45 |
| | Long-lived data | 45 |
| | Locality | 46 |
| | Limitations of mark-compact algorithms | 46 |
| 4 | Copying garbage collection | 47 |
| 4.1 | Semispace copying collection | 47 |
| | Work-list implementations | 48 |
| | An example | 50 |
| 4.2 | Traversal order and locality | 50 |
| 4.3 | Issues to consider | 57 |
| | Allocation | 57 |
| | Space and locality | 58 |
| | Moving objects | 59 |
| 5 | Reference counting | 61 |
| 5.1 | Advantages and disadvantages of reference counting | 62 |
| 5.2 | Improving efficiency | 64 |
| 5.3 | Deferred reference counting | 65 |
| 5.4 | Coalesced reference counting | 68 |
| 5.5 | Cyclic reference counting | 71 |
| 5.6 | Limited-field reference counting | 77 |
| 5.7 | Issues to consider | 78 |
| | The environment | 78 |
| | Advanced solutions | 78 |
| 6 | Comparing garbage collectors | 81 |
| 6.1 | Throughput | 81 |
| 6.2 | Pause time and latency | 82 |
| 6.3 | Space | 82 |
| 6.4 | Implementation | 83 |
| 6.5 | Adaptive systems | 84 |
| 6.6 | A unified theory of garbage collection | 85 |
| | Abstract garbage collection | 85 |
| | Tracing garbage collection | 85 |
| | Reference counting garbage collection | 87 |
| 7 | Allocation | 91 |
| 7.1 | Sequential allocation | 91 |
| 7.2 | Free-list allocation | 93 |
| | First-fit allocation | 93 |
| | Next-fit allocation | 93 |
| | Best-fit allocation | 94 |
| | Speeding free-list allocation | 95 |
| 7.3 | Fragmentation | 97 |
| 7.4 | Segregated-fits allocation | 98 |
| | Fragmentation | 99 |
| | Populating size classes | 100 |
| 7.5 | Combining segregated-fits with first-, best- and next-fit | 101 |
| 7.6 | Additional considerations | 101 |
| | Alignment | 101 |
| | Size constraints | 102 |
| | Boundary tags | 102 |
| | Heap parsability | 103 |
| | Locality | 104 |
| | Wilderness preservation | 105 |
| | Crossing maps | 105 |
| 7.7 | Allocation in concurrent systems | 105 |
| 7.8 | Issues to consider | 106 |
| 8 | Partitioning the heap | 109 |
| 8.1 | Terminology | 109 |
| 8.2 | Why to partition | 109 |
| | Partitioning by mobility | 110 |
| | Partitioning by size | 110 |
| | Partitioning for space | 110 |
| | Partitioning by kind | 111 |
| | Partitioning for yield | 111 |
| | Partitioning for responsiveness | 112 |
| | Partitioning for locality | 112 |
| | Partitioning by thread | 113 |
| | Partitioning by availability | 113 |
| | Partitioning by mutability | 114 |
| 8.3 | How to partition | 114 |
| 8.4 | When to partition | 116 |
| 9 | Generational garbage collection | 117 |
| 9.1 | Example | 118 |
| 9.2 | Measuring time | 119 |
| 9.3 | Generational hypotheses | 119 |
| 9.4 | Generations and heap layout | 120 |
| 9.5 | Multiple generations | 121 |
| 9.6 | Age recording | 122 |
| | En masse promotion | 122 |
| | Aging semispaces | 124 |
| | Survivor spaces and flexibility | 126 |
| 9.7 | Adapting to program behaviour | 127 |
| | Appel-style garbage collection | 127 |
| | Feedback-controlled promotion | 129 |
| 9.8 | Inter-generational pointers | 130 |
| | Remembered sets | 130 |
| | Pointer direction | 131 |
| 9.9 | Space management | 132 |
| 9.10 | Older-first garbage collection | 133 |
| 9.11 | Beltway | 136 |
| 9.12 | Analytic support for generational collection | 138 |
| 9.13 | Issues to consider | 139 |
| 9.14 | Abstract generational garbage collection | 141 |
| 10 | Other partitioned schemes | 143 |
| 10.1 | Large object spaces | 143 |
| | The Treadmill garbage collector | 144 |
| | Moving objects with operating system support | 145 |
| | Pointer-free objects | 146 |
| 10.2 | Topological collectors | 146 |
| | Mature Object Space garbage collection | 146 |
| | Connectivity-based garbage collection | 149 |
| | Thread-local garbage collection | 150 |
| | Stack allocation | 155 |
| | Region inferencing | 156 |
| 10.3 | Hybrid mark-sweep, copying collectors | 157 |
| 10.4 | Garbage-First: collecting young regions | 158 |
| 10.5 | Trading space and time | 161 |
| 10.6 | Mark-region collection: immix | 162 |
| 10.7 | Copying collection in a constrained memory space | 164 |
| 10.8 | Bookmarking garbage collection | 166 |
| 10.9 | Ulterior reference counting | 167 |
| 10.10 | Issues to consider | 168 |
| 11 | Run-time interface | 171 |
| 11.1 | Interface to allocation | 171 |
| | Speeding allocation | 174 |
| | Zeroing | 175 |
| 11.2 | Finding pointers | 176 |
| | Conservative pointer finding | 176 |
| | Accurate pointer finding using tagged values | 179 |
| | Accurate pointer finding in objects | 180 |
| | Accurate pointer finding in global roots | 181 |
| | Accurate pointer finding in stacks and registers | 182 |
| | Accurate pointer finding in code | 192 |
| | Handling interior pointers | 193 |
| | Handling derived pointers | 194 |
| 11.3 | Object tables | 195 |
| 11.4 | References from external code | 196 |
| 11.5 | Stack barriers | 197 |
| 11.6 | GC safe-points and mutator suspension | 198 |
| 11.7 | Garbage collecting code | 201 |
| 11.8 | Read and write barriers | 202 |
| | Engineering | 202 |
| | Precision of write barriers | 203 |
| | Hash tables | 205 |
| | Sequential store buffers | 206 |
| | Overflow action | 208 |
| | Cards and card tables | 208 |
| | Crossing maps | 210 |
| | Summarising cards | 212 |
| | Hardware and virtual memory techniques | 213 |
| | Write barrier mechanisms: in summary | 214 |
| | Chunked lists | 214 |
| 11.9 | Managing address space | 215 |
| 11.10 | Applications of virtual memory page protection | 217 |
| | Double mapping | 217 |
| | Applications of no-access pages | 218 |
| 11.11 | Choosing heap size | 220 |
| 11.12 | Issues to consider | 222 |
| 12 | Language-specific concerns | 225 |
| 12.1 | Finalisation | 225 |
| | When do finalisers run? | 226 |
| | Which thread runs a finaliser? | 227 |
| | Can finalisers run concurrently with each other? | 228 |
| | Can finalisers access the object that became unreachable? | 228 |
| | When are finalised objects reclaimed? | 229 |
| | What happens if there is an error in a finaliser? | 229 |
| | Is there any guaranteed order to finalisation? | 229 |
| | The finalisation race problem | 230 |
| | Finalisers and locks | 231 |
| | Finalisation and weak references | 231 |
| | Finalisation in particular languages | 232 |
| | For further study | 233 |
| 12.2 | Weak references | 233 |
| | Additional motivations | 235 |
| | Weak references in Java: the short story | 235 |
| | Weak references in Java: the full story | 236 |
| | Race in weak pointer clearing | 239 |
| | Weak pointers in other languages | 240 |
| 12.3 | Changing object layout | 241 |
| 12.4 | Issues to consider | 243 |
| 13 | Concurrency preliminaries | 245 |
| 13.1 | Hardware | 245 |
| | Processors and threads | 245 |
| | Interconnect | 246 |
| | Memory | 247 |
| | Caches | 247 |
| | Coherence | 247 |
| | Cache coherence performance example: spin locks | 248 |
| 13.2 | Hardware memory consistency | 250 |
| | Fences and happens-before | 251 |
| | Consistency models | 252 |
| 13.3 | Hardware primitives | 253 |
| | Compare-and-swap | 253 |
| | Load-linked/store-conditionally | 254 |
| | Atomic arithmetic primitives | 255 |
| | Test then test-and-set | 257 |
| | More powerful primitives | 257 |
| | Overheads of atomic primitives | 258 |
| 13.4 | Progress guarantees | 259 |
| | Progress guarantees and concurrent collection | 260 |
| 13.5 | Notation used for concurrent algorithms | 261 |
| 13.6 | Mutual exclusion | 262 |
| 13.7 | Work sharing and termination detection | 264 |
| | Rendezvous barriers | 269 |
| 13.8 | Concurrent data structures | 269 |
| | Concurrent stacks | 272 |
| | Concurrent queue implemented with singly linked list | 272 |
| | Concurrent queue implemented with array | 276 |
| | Concurrent deque for work stealing | 281 |
| 13.9 | Transactional memory | 282 |
| | Using transactional memory to help implement collection | 286 |
| | Supporting transactional memory in the presence of garbage collection | 288 |
| 13.10 | Issues to consider | 289 |
| 14 | Parallel garbage collection | 291 |
| 14.1 | Is there sufficient work to parallelise? | 292 |
| 14.2 | Load balancing | 293 |
| 14.3 | Synchronisation | 294 |
| 14.4 | Taxonomy | 295 |
| 14.5 | Parallel marking | 295 |
| 14.6 | Parallel copying | 304 |
| | Processor-centric techniques | 305 |
| | Memory-centric techniques | 310 |
| 14.7 | Parallel sweeping | 315 |
| 14.8 | Parallel compaction | 316 |
| 14.9 | Garbage collection on the GPU? | 319 |
| | GPU background | 319 |
| | Heap reference graphs | 321 |
| | Marking on the GPU | 322 |
| | A problem not yet solved | 324 |
| 14.10 | Issues to consider | 324 |
| | Terminology | 324 |
| | Is parallel collection worthwhile? | 324 |
| | Strategies for balancing loads | 325 |
| | Managing tracing | 325 |
| | Low-level synchronisation | 327 |
| | Sweeping and compaction | 327 |
| | Termination | 328 |
| 15 | Concurrent garbage collection | 329 |
| 15.1 | Correctness of concurrent collection | 331 |
| | The tricolour abstraction, revisited | 331 |
| | The lost object problem | 332 |
| | The strong and weak tricolour invariants | 334 |
| | Precision | 335 |
| | Mutator colour | 335 |
| | Allocation colour | 336 |
| | Pointer tagging | 336 |
| | Incremental update solutions | 336 |
| | Snapshot-at-the-beginning solutions | 337 |
| 15.2 | Barrier techniques for concurrent collection | 337 |
| | Grey mutator techniques | 337 |
| | Black mutator techniques | 339 |
| | Completeness of barrier techniques | 340 |
| | Load barriers | 341 |
| | Concurrent write barrier mechanisms | 342 |
| | One-level card tables | 343 |
| | Two-level card tables | 343 |
| | Reducing work | 343 |
| | Eliminating read barriers | 344 |
| 15.3 | Ragged phase changes | 344 |
| 15.4 | Issues to consider | 346 |
| 16 | Concurrent mark-sweep | 349 |
| 16.1 | Initialisation | 349 |
| 16.2 | Termination | 351 |
| 16.3 | Allocation | 352 |
| 16.4 | Concurrent marking and sweeping | 353 |
| 16.5 | Garbage-First: collecting young and old regions | 354 |
| 16.6 | On-the-fly marking | 357 |
| | Write barriers for on-the-fly collection | 357 |
| | Doligez-Leroy-Gonthier | 358 |
| | Doligez-Leroy-Gonthier for Java | 359 |
| | Sliding views | 360 |
| 16.7 | Abstract concurrent collection | 360 |
| | The collector wavefront | 361 |
| | Adding origins | 363 |
| | Mutator barriers | 363 |
| | Precision | 363 |
| | Instantiating collectors | 364 |
| 16.8 | Issues to consider | 364 |
| 17 | Concurrent copying and compaction | 367 |
| 17.1 | Mostly-concurrent copying | 367 |
| | Baker's algorithm | 368 |
| | Mostly-concurrent, mostly-copying collection | 370 |
| 17.2 | Brooks's indirection barrier | 370 |
| 17.3 | Self-erasing read barriers | 371 |
| 17.4 | Replication copying | 372 |
| | Multi-version copying | 372 |
| | Extensions to avoid copy-on-write | 374 |
| | Sapphire | 376 |
| | Transactional Sapphire | 376 |
| | Platinum | 383 |
| 17.5 | Concurrent compaction | 384 |
| | Compressor | 384 |
| | Pauseless and C4 | 386 |
| | Collie | 395 |
| | ZGC | 396 |
| | Shenandoah | 401 |
| | Reducing stop-the-world time in Shenandoah and ZGC | 403 |
| 17.6 | Issues to consider | 404 |
| 18 | Concurrent reference counting | 405 |
| 18.1 | LXR: Latency-critical ImmiX with Reference counting | 405 |
| 18.2 | Simple reference counting revisited | 409 |
| 18.3 | Biased reference counting | 411 |
| 18.4 | Buffered reference counting | 412 |
| 18.5 | Concurrent, cyclic reference counting | 413 |
| 18.6 | Taking a snapshot of the heap | 414 |
| 18.7 | Sliding views reference counting | 416 |
| | Age-oriented collection | 416 |
| | Sliding views cycle reclamation | 419 |
| | Memory consistency | 420 |
| 18.8 | Issues to consider | 420 |
| | Safe memory reclamation | 421 |
| 19 | Real-time garbage collection | 423 |
| 19.1 | Real-time systems | 423 |
| 19.2 | Scheduling real-time collection | 424 |
| 19.3 | Work-based real-time collection | 425 |
| | Parallel, concurrent replication | 425 |
| | Uneven work and its impact on work-based scheduling | 432 |
| 19.4 | Slack-based real-time collection | 434 |
| | Scheduling the collector work | 436 |
| | Execution overheads | 438 |
| | Programmer input | 439 |
| 19.5 | Time-based real-time collection: Metronome | 439 |
| | Mutator utilisation | 439 |
| | Supporting predictability | 441 |
| | Analysis | 443 |
| | Robustness | 447 |
| 19.6 | Combining scheduling approaches: Tax-and-Spend | 448 |
| | Tax-and-Spend scheduling | 448 |
| | Tax-and-Spend prerequisites | 450 |
| 19.7 | Controlling fragmentation | 452 |
| | Incremental compaction in Metronome | 452 |
| | Incremental replication on uniprocessors | 453 |
| | Stopless: lock-free garbage collection | 454 |
| | Staccato: best-effort compaction with mutator wait-freedom | 455 |
| | Chicken: best-effort compaction with mutator wait-freedom for x86 | 458 |
| | Clover: guaranteed compaction with probabilistic mutator lock-freedom | 458 |
| | Stopless versus Chicken versus Clover | 459 |
| | Fragmented allocation | 461 |
| 19.8 | Issues to consider | 463 |
| 20 | Energy-aware garbage collection | 465 |
| 20.1 | Issues to consider | 469 |
| 21 | Persistence and garbage collection | 471 |
| 21.1 | Persistence: concepts, issues and implementation | 471 |
| | Providing durability | 472 |
| | Issues raised by persistence | 473 |
| 21.2 | Impacts of persistence on garbage collection | 475 |
| 21.3 | Barriers in support of persistence | 476 |
| | Software barriers | 476 |
| | Hardware barriers | 477 |
| | Software versus hardware barriers | 478 |
| 21.4 | Implemented collectors for persistent heaps | 478 |
| | Persistent reference counting | 478 |
| | Persistent mark-sweep and mark-compact | 479 |
| | Persistent copying | 480 |
| 21.5 | Issues to consider | 481 |
| | Glossary | 485 |
| | Bibliography | 503 |
| | Index | 551 |