Spark Memory Architecture: The Complete Guide to the Unified Memory Model
A Spark job fails with OutOfMemoryError. The team doubles spark.executor.memory from 8 GB to 16 GB. The job passes. Nobody investigates why. Three months later, the same job fails again on a larger dataset. They double it again to 32 GB. The cluster bill doubles with it.
This is the most common pattern in Spark operations — and the most expensive. Teams treat memory as a single knob to turn up. They do not know that Spark splits memory into distinct regions with different purposes, different eviction rules, and different failure modes. They do not know that their 16 GB executor only gives Spark 9.65 GB of usable unified memory. They do not know that their driver OOM has nothing to do with executor memory. They do not know that a chunk of their container memory is overhead they never configured.
This post explains exactly how Spark manages memory. We cover the unified memory model with exact formulas, the difference between execution and storage memory, driver vs executor memory architecture, off-heap memory with Tungsten, container memory for YARN and Kubernetes, PySpark-specific memory, and how to calculate every region from your configuration. The companion post covers OOM debugging, GC tuning, and observability.
The Unified Memory Model
Before Spark 1.6, memory management used a static model with fixed boundaries between execution and storage. If execution memory was full while storage sat idle, execution could not borrow the unused storage memory; it simply spilled to disk. This was wasteful.
Spark 1.6 introduced the Unified Memory Manager (SPARK-10000), which replaced the static model with a dynamic one. Execution and storage share a single pool and can borrow from each other.
The Four Memory Regions
Every Spark executor's JVM heap is divided into four regions:
┌─────────────────────────────────────────────────────┐
│ JVM Heap (spark.executor.memory) │
│ │
│ ┌───────────────┐ │
│ │ Reserved │ 300 MB (hardcoded) │
│ │ Memory │ Internal Spark structures │
│ └───────────────┘ │
│ │
│ ┌───────────────┐ │
│ │ User │ (heap - 300MB) × (1 - 0.6) │
│ │ Memory │ Your objects, UDFs, metadata │
│ └───────────────┘ │
│ │
│ ┌─────────────────────────────────────────────┐ │
│ │ Unified Memory (heap - 300MB) × 0.6 │ │
│ │ │ │
│ │ ┌───────────────────┐ ┌───────────────────┐ │ │
│ │ │ Storage Memory │ │ Execution Memory │ │ │
│ │ │ (initial 50%) │ │ (initial 50%) │ │ │
│ │ │ │ │ │ │ │
│ │ │ Cached DFs/RDDs │ │ Shuffle buffers │ │ │
│ │ │ Broadcast vars │ │ Sort buffers │ │ │
│ │ │ Task results │ │ Hash tables │ │ │
│ │ │ │ │ Aggregation maps │ │ │
│ │ └───────────────────┘ └───────────────────┘ │ │
│ │ ← dynamic boundary → │ │
│ └─────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
Region 1: Reserved Memory (300 MB)
A hardcoded 300 MB set aside for Spark's internal structures — logging, metrics, internal bookkeeping. This is not configurable in production. There is a spark.testing.reservedMemory parameter but it is explicitly for testing only and should never be changed.
Implication: Your executor must have at least 300 MB + some usable space. In practice, never set spark.executor.memory below 1 GB.
Region 2: User Memory
User Memory = (spark.executor.memory - 300 MB) × (1 - spark.memory.fraction)
With default spark.memory.fraction=0.6:
User Memory = (spark.executor.memory - 300 MB) × 0.4
User memory holds:
- Your custom data structures (HashMap, ArrayList, etc.)
- Internal metadata for data processing
- UDF state and closures
- RDD dependency information
- Spark internal objects not managed by the memory manager
This region is not managed by Spark's unified memory manager. If you allocate too many objects here, you get a standard java.lang.OutOfMemoryError: Java heap space with no help from Spark's eviction or spill mechanisms.
Region 3: Unified Memory — Storage
Protected Storage = (spark.executor.memory - 300 MB) × spark.memory.fraction × spark.memory.storageFraction
With defaults:
Protected Storage = (spark.executor.memory - 300 MB) × 0.6 × 0.5
= (spark.executor.memory - 300 MB) × 0.3
Storage memory holds:
- Cached DataFrames and RDDs — data stored via .cache() or .persist()
- Broadcast variables — hash tables and data sent to all executors
- Unroll memory — space used when deserializing blocks for caching
The protected storage portion (defined by spark.memory.storageFraction) is immune to eviction by execution. Storage above this protected floor can be evicted when execution needs space.
Region 4: Unified Memory — Execution
Initial Execution = (spark.executor.memory - 300 MB) × spark.memory.fraction × (1 - spark.memory.storageFraction)
With defaults:
Initial Execution = (spark.executor.memory - 300 MB) × 0.6 × 0.5
= (spark.executor.memory - 300 MB) × 0.3
Execution memory holds:
- Shuffle buffers — data being serialized and written during shuffle
- Sort buffers — UnsafeExternalSorter pages for SortMergeJoin and sort operations
- Hash tables — HashedRelation for BroadcastHashJoin and ShuffledHashJoin
- Aggregation maps — hash maps for GROUP BY operations
Execution memory has priority over storage. If execution needs more space and storage is occupying it, storage blocks are evicted (using LRU). The reverse is not true — execution memory borrowed by storage is never forcibly reclaimed.
How Dynamic Memory Sharing Works
The unified memory model's key feature is that the boundary between execution and storage is soft. Both regions can expand into the other's space when it is idle.
Borrowing Rules
Storage borrows from execution: When storage needs more memory and execution has unused capacity, storage expands into execution's region. However, if execution later needs that memory back, storage must give it up — cached blocks are evicted using LRU (Least Recently Used).
Execution borrows from storage: When execution needs more memory and storage has unused capacity, execution expands into storage's region. If storage later needs memory, execution does not give it back. Storage must wait for execution to release naturally or spill cached data to disk.
Why Execution Has Priority
Execution memory holds intermediate computation state — shuffle buffers, sort arrays, hash tables. If execution runs out of memory mid-operation, the task fails with SparkOutOfMemoryError. Recovering from this requires restarting the entire task.
Storage memory holds cached data. If cached data is evicted, it can be recomputed from the lineage (RDD) or re-read from disk (MEMORY_AND_DISK). Eviction is inconvenient but not fatal.
This asymmetry is intentional: losing execution state is worse than losing cached data.
The Protected Storage Floor
The spark.memory.storageFraction (default 0.5) defines the protected floor below which execution cannot evict storage. Even under extreme execution pressure, Spark guarantees that storage retains at least this fraction of the unified region.
Protected Floor = Unified Memory × spark.memory.storageFraction
= (spark.executor.memory - 300 MB) × 0.6 × 0.5
Anything cached above this floor is fair game for eviction. Anything below it is safe.
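The formulas above are easy to script. Here is a minimal sketch (plain Python, no Spark required) that computes every on-heap region from a configuration; the 300 MB constant and the default fractions are the values discussed above:

```python
RESERVED_MB = 300  # hardcoded reserved memory

def heap_regions(executor_memory_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Compute Spark's on-heap memory regions, in MB."""
    usable = executor_memory_mb - RESERVED_MB
    unified = usable * memory_fraction
    return {
        "reserved": RESERVED_MB,
        "user": usable * (1 - memory_fraction),
        "unified": unified,
        "protected_storage": unified * storage_fraction,  # eviction-proof floor
        "initial_execution": unified * (1 - storage_fraction),
    }

regions = heap_regions(16 * 1024)  # a 16 GB executor
```

For a 16 GB executor this yields about 9,650 MB of unified memory and 6,434 MB of user memory, matching the worked examples that follow.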
Exact Memory Calculations
Example: 16 GB Executor
spark.executor.memory = 16 GB (16,384 MB)
spark.memory.fraction = 0.6
spark.memory.storageFraction = 0.5
| Region | Formula | Size |
|---|---|---|
| Reserved | Hardcoded | 300 MB |
| Usable Heap | 16,384 - 300 | 16,084 MB |
| User Memory | 16,084 × 0.4 | 6,434 MB |
| Unified Memory | 16,084 × 0.6 | 9,650 MB |
| → Storage (initial/protected) | 9,650 × 0.5 | 4,825 MB |
| → Execution (initial) | 9,650 × 0.5 | 4,825 MB |
Key insight: Of your 16 GB executor, Spark only has 9.65 GB of managed unified memory. The rest is reserved (300 MB) and user space (6.4 GB). When you see "executor memory" in configurations, the usable Spark-managed portion is only 60% of the heap minus 300 MB.
Example: 4 GB Executor
spark.executor.memory = 4 GB (4,096 MB)
| Region | Size |
|---|---|
| Reserved | 300 MB |
| User Memory | (4,096 - 300) × 0.4 = 1,518 MB |
| Unified Memory | (4,096 - 300) × 0.6 = 2,278 MB |
| → Storage (protected) | 1,139 MB |
| → Execution (initial) | 1,139 MB |
Only 2.28 GB of managed memory from a 4 GB executor. This is why small executors struggle with large shuffles and caching.
Example: 32 GB Executor
spark.executor.memory = 32 GB (32,768 MB)
| Region | Size |
|---|---|
| Reserved | 300 MB |
| User Memory | (32,768 - 300) × 0.4 = 12,987 MB |
| Unified Memory | (32,768 - 300) × 0.6 = 19,481 MB |
| → Storage (protected) | 9,740 MB |
| → Execution (initial) | 9,740 MB |
19.5 GB of managed memory. This is where broadcast joins and large caches start to become comfortable.
Driver Memory Architecture
The driver is fundamentally different from executors. It coordinates the application but does not process data partitions. Its memory layout serves a different purpose.
What the Driver Holds
- SparkContext — the entry point object for all Spark operations
- DAGScheduler and TaskScheduler — the scheduling infrastructure for all stages and tasks
- Broadcast variables — the driver builds and holds the full copy before broadcasting
- Collect results — any data pulled back via .collect(), .toPandas(), .take(), or .show()
- Accumulator values — aggregated from all executors
- Query plans — for complex queries, the catalyst plan tree itself can be large
Driver Memory Configuration
spark.driver.memory = 1g (default)
The driver's JVM heap. Controls how much data the driver can hold in memory.
spark.driver.memoryOverhead = max(driverMemory × 0.10, 384 MB)
Off-heap memory for the driver process — NIO direct buffers, native memory, JVM internal overhead.
spark.driver.maxResultSize = 1g (default)
Maximum total size of serialized results from all partitions for a single action (like collect()). If a result exceeds this, the job is aborted. This is a safety net — not the memory allocation. Setting this to 0 disables the limit, which is dangerous.
Common Driver OOM Scenarios
1. Large collect():
# This pulls ALL data to the driver — OOM if df is large
all_data = df.collect()
# This too — the entire DataFrame becomes a Pandas DataFrame in driver memory
pandas_df = df.toPandas()
2. Large broadcast:
The driver must hold the entire broadcast table in memory while building the HashedRelation. A 500 MB Parquet table can expand to 2+ GB in memory.
3. Too many accumulator values:
Each task sends accumulator updates back to the driver. With millions of tasks and custom accumulators, the driver accumulates significant state.
4. Complex query plans:
Queries with hundreds of columns, deep subquery nesting, or many unions can create large catalyst plan trees that consume driver heap.
Right-Sizing the Driver
For most workloads:
spark.driver.memory = 2g to 8g
If your workflow includes collect(), toPandas(), or broadcast of tables > 100 MB:
spark.driver.memory = 8g to 16g
If you do not use any of these and the driver is purely a coordinator:
spark.driver.memory = 2g to 4g
Executor Memory Architecture
Each executor runs tasks that process data partitions. Executor memory is where the actual computation happens.
The Full Executor Memory Layout
┌─────────────────────────────────────────────────────────┐
│ Container Memory │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ JVM Heap (spark.executor.memory) │ │
│ │ │ │
│ │ Reserved (300 MB) │ │
│ │ User Memory (heap - 300) × 0.4 │ │
│ │ Unified Memory (heap - 300) × 0.6 │ │
│ │ - Storage (cached data, broadcasts) │ │
│ │ - Execution (shuffles, sorts, joins) │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Memory Overhead (spark.executor.memoryOverhead)│ │
│ │ │ │
│ │ NIO Direct Buffers (Netty) │ │
│ │ Thread Stacks │ │
│ │ JVM Internal Structures │ │
│ │ Native Memory (JNI) │ │
│ │ Compressed Class Space / Metaspace │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ Off-Heap Memory (spark.memory.offHeap.size) │ │
│ │ (only if spark.memory.offHeap.enabled=true) │ │
│ │ │ │
│ │ Tungsten managed memory │ │
│ │ Execution: shuffles, sorts, hash tables │ │
│ │ Storage: off-heap columnar cache │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────┐ │
│ │ PySpark Memory (spark.executor.pyspark.memory)│ │
│ │ (only for PySpark workloads) │ │
│ │ │ │
│ │ Python worker process memory │ │
│ │ Pandas DataFrames, NumPy arrays │ │
│ │ Arrow serialization buffers │ │
│ └──────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
Container Memory Formula
Total Container Memory = spark.executor.memory
+ spark.executor.memoryOverhead
+ spark.memory.offHeap.size (if enabled)
+ spark.executor.pyspark.memory (if set)
This total is what YARN or Kubernetes allocates for the container/pod. If the process exceeds this, the container is killed.
Memory Overhead Defaults
The overhead default depends on the deployment platform:
| Platform | Default Formula | Minimum |
|---|---|---|
| YARN | executorMemory × 0.10 (0.07 in older releases) | 384 MB |
| Kubernetes (JVM) | executorMemory × 0.10 | 384 MB |
| Kubernetes (non-JVM) | executorMemory × 0.40 | 384 MB |
What needs overhead memory:
- Netty NIO buffers — shuffle data transfer uses direct byte buffers outside JVM heap
- Thread stacks — each thread needs 512 KB–1 MB of native stack space
- JVM internal structures — class metadata (Metaspace), JIT compiled code, internal tables
- JNI native memory — any native libraries loaded by Spark or your code
- Compressed oops — JVM pointer compression overhead
Per-Task Memory
Each executor runs up to spark.executor.cores concurrent tasks (default 1 on YARN and Kubernetes; all available cores in standalone mode). Each task shares the executor's unified memory pool, managed by a per-task TaskMemoryManager.
Available memory per task ≈ Unified Memory / spark.executor.cores
With a 16 GB executor and 4 cores:
Per-task unified memory ≈ 9,650 MB / 4 = 2,412 MB
This is why increasing cores without increasing memory can cause OOM — each task gets less memory.
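The division above is approximate. Inside Spark, the ExecutionMemoryPool enforces tighter bounds: with N active tasks, each task can claim at most 1/N of the pool and is guaranteed at least 1/(2N) before it is forced to block or spill. A sketch of those bounds:

```python
def per_task_execution_bounds(pool_mb, active_tasks):
    """Per-task share of Spark's execution pool: each of N active tasks
    is guaranteed at least pool/(2N) and capped at pool/N."""
    return pool_mb / (2 * active_tasks), pool_mb / active_tasks

# 16 GB executor (9,650 MB unified) running 4 concurrent tasks
lo, hi = per_task_execution_bounds(9650, 4)
```

So with 4 cores, a single task sees between roughly 1.2 GB and 2.4 GB of the pool depending on what its neighbors are doing.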
Off-Heap Memory
Off-heap memory stores data outside the JVM heap, in native memory managed by Spark's Tungsten engine. The primary benefit: zero GC overhead. Data in off-heap is invisible to the garbage collector.
Configuration
spark.memory.offHeap.enabled = false (default)
spark.memory.offHeap.size = 0 (default, in bytes)
When enabled, Spark creates a second unified memory pool in off-heap space, with the same execution/storage split. The on-heap unified pool still exists — both pools are used.
How Tungsten Uses Off-Heap
Tungsten is Spark's memory management engine that operates on raw binary data:
- Uses sun.misc.Unsafe (or java.lang.foreign.MemorySegment in newer JDKs) for direct memory access
- Allocates memory in pages (the page size is derived from available memory and core count, typically in the 1–64 MB range)
- Stores data in compact binary format — no Java object headers, no boxing
- Enables cache-friendly data layouts for better CPU performance
Operations that benefit from off-heap:
- Shuffle sort and merge
- Hash table construction for joins and aggregations
- Columnar cache storage (spark.sql.columnVector.offheap.enabled)
When to Use Off-Heap
Use off-heap when:
- GC pauses are causing performance problems (> 10% of task time)
- You have large shuffles or hash tables that create GC pressure
- You need predictable latency without GC spikes
Avoid off-heap when:
- Your workload is not GC-bound
- You cannot increase container memory limits (off-heap adds to total memory)
- You are debugging memory issues (off-heap is harder to inspect)
Container Memory Impact
Off-heap memory is not inside the JVM heap. It must be accounted for in the container's total memory:
Container Memory = spark.executor.memory + memoryOverhead + spark.memory.offHeap.size
If you enable 4 GB of off-heap but do not increase the container limit, the process will be killed for exceeding memory.
Example configuration:
spark.executor.memory = 12g
spark.executor.memoryOverhead = 2g
spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = 4g
Total container memory: 12 + 2 + 4 = 18 GB.
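The same arithmetic as a small sketch, to make the sizing rule explicit — every enabled region adds to what the resource manager must grant:

```python
def container_memory_mb(heap_mb, overhead_mb, offheap_mb=0, pyspark_mb=0):
    """Total container memory: JVM heap + overhead + off-heap (if enabled)
    + PySpark worker memory (if set)."""
    return heap_mb + overhead_mb + offheap_mb + pyspark_mb

# The example above: 12 GB heap + 2 GB overhead + 4 GB off-heap = 18 GB
total = container_memory_mb(12 * 1024, 2 * 1024, offheap_mb=4 * 1024)
```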
Container Memory: YARN and Kubernetes
Understanding container memory is critical because the most confusing OOM errors come from the container being killed — not from the JVM running out of heap.
YARN Container Memory
YARN Container Memory = spark.executor.memory
                      + spark.executor.memoryOverhead
                      + spark.memory.offHeap.size
                      + spark.executor.pyspark.memory
YARN overhead default (0.10 in current Spark; older releases used a 0.07 factor under the legacy key spark.yarn.executor.memoryOverhead):
spark.executor.memoryOverhead = max(spark.executor.memory × 0.10, 384 MB)
Example: 20 GB executor on YARN:
JVM Heap: 20 GB
Overhead: max(20 × 0.10, 0.384) = 2.0 GB
Container: 22.0 GB
The YARN NodeManager checks physical RSS memory of the container. If it exceeds the container limit, YARN kills the container immediately with no Java stack trace — just a log message: "Container killed by YARN for exceeding memory limits." The exit code is 137.
Kubernetes Pod Memory
K8s Pod Memory = spark.executor.memory
+ spark.kubernetes.executor.memoryOverhead
+ spark.memory.offHeap.size
+ spark.executor.pyspark.memory
Kubernetes overhead default:
spark.kubernetes.memoryOverheadFactor = 0.10 (JVM) or 0.40 (non-JVM like PySpark)
spark.kubernetes.executor.memoryOverhead = max(spark.executor.memory × factor, 384 MB)
Example: 16 GB executor on Kubernetes (PySpark):
JVM Heap: 16 GB
Overhead: max(16 × 0.10, 0.384) = 1.6 GB
PySpark: 4 GB
Pod Memory: 21.6 GB
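The overhead default itself follows the same max(factor × memory, 384 MB) pattern on every platform; only the factor changes. A sketch:

```python
def default_overhead_mb(heap_mb, factor=0.10, minimum_mb=384):
    """Overhead default: max(executorMemory × factor, 384 MB).
    factor is 0.10 for JVM workloads, 0.40 for non-JVM (e.g. PySpark) on K8s."""
    return max(heap_mb * factor, minimum_mb)

default_overhead_mb(16 * 1024)               # 16 GB JVM executor: ~1.6 GB
default_overhead_mb(16 * 1024, factor=0.40)  # non-JVM workload: ~6.4 GB
default_overhead_mb(1024)                    # small executor hits the 384 MB floor
```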
When a pod exceeds its memory limit, Kubernetes sends OOMKilled and the pod is terminated. You see this in kubectl describe pod output.
Why Container Kills Are Confusing
Container kills produce no Java stack trace. The process is killed by the operating system or container runtime, not by the JVM. This makes them the hardest OOM errors to diagnose because:
- No OutOfMemoryError in application logs
- The Spark UI may not capture the failure
- The error message only says "exceeded memory limits" without telling you what consumed the memory
- The root cause is usually off-heap memory (NIO buffers, native memory) that is not visible in JVM heap metrics
Right-Sizing Overhead
| Workload Type | Recommended Overhead |
|---|---|
| Scala/Java, light shuffle | Default (7-10%) |
| Heavy shuffle, large broadcast | 15-20% |
| PySpark with UDFs | 20-30% |
| PySpark with pandas_udf + Arrow | 25-40% |
| Off-heap enabled | Add full offHeap.size separately |
Rule of thumb: Start with default overhead. If you see container kills but no JVM OOM in logs, increase overhead by 50% and monitor again.
PySpark Memory Architecture
PySpark adds a separate Python process alongside the JVM executor, creating a dual-memory architecture.
How PySpark Memory Works
┌─────────────────────────────────────┐
│ Container │
│ │
│ ┌─────────────┐ ┌──────────────┐ │
│ │ JVM Executor │ │ Python Worker│ │
│ │ │ │ │ │
│ │ Spark Core │←→│ PySpark API │ │
│ │ Catalyst │ │ Pandas UDFs │ │
│ │ Tungsten │ │ NumPy/SciPy │ │
│ │ │ │ │ │
│ │ executor. │ │ pyspark. │ │
│ │ memory │ │ memory │ │
│ └─────────────┘ └──────────────┘ │
│ │
│ + memoryOverhead (shared) │
└─────────────────────────────────────┘
Key PySpark Memory Configurations
spark.executor.pyspark.memory -- not set by default
Limits the Python worker process memory. When set, Spark configures RLIMIT_AS to restrict the Python process. Without this, the Python worker can consume unbounded memory.
spark.python.worker.memory = 512m (default)
Spill threshold for Python-side aggregation in RDD operations: when an aggregation buffer in the Python worker exceeds this amount, data is spilled to disk. It is not a hard cap on the worker process — that is what spark.executor.pyspark.memory provides.
spark.sql.execution.arrow.pyspark.enabled = false (default)
Enables Apache Arrow for efficient data transfer between JVM and Python. With Arrow, data is transferred as columnar batches instead of row-by-row serialization — 10-100x faster.
spark.sql.execution.arrow.maxRecordsPerBatch = 10000
Maximum rows per Arrow record batch. Larger batches are more efficient but use more memory per batch.
PySpark Memory Traps
1. toPandas() pulls everything to driver:
# This collects ALL data to the driver, then converts to Pandas
# Both the Spark collect AND the Pandas DataFrame must fit in driver memory
pandas_df = spark_df.toPandas()
# For a 10 GB DataFrame, you need ~20+ GB of driver memory
2. pandas_udf and grouped operations materialize large chunks:
@pandas_udf("double")
def my_func(v: pd.Series) -> pd.Series:
    return v * 2  # receives one Arrow batch at a time as a Pandas Series
A scalar pandas_udf like this processes Arrow batches capped at spark.sql.execution.arrow.maxRecordsPerBatch, so its footprint is bounded. The real trap is grouped operations: groupBy().applyInPandas() materializes each entire group as a single Pandas DataFrame in the Python worker. If one group holds 50 million rows, all of them land in Python memory at once. Keep groups small, and increase partition count to reduce per-partition size.
3. Arrow serialization overhead:
Arrow transfers between JVM and Python create temporary copies. With large batches, this can double the effective memory usage temporarily.
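To keep per-partition (or per-group) materializations bounded, you can work backwards from a rows-per-partition budget. A hypothetical helper — the target budget is an assumption you would tune for your row width:

```python
import math

def partitions_for_target(total_rows, target_rows_per_partition):
    """Partition count so that each partition holds at most
    roughly target_rows_per_partition rows (assumes even distribution)."""
    return math.ceil(total_rows / target_rows_per_partition)

partitions_for_target(50_000_000, 2_000_000)  # 25 partitions for 50M rows
```

You would then call df.repartition(n) before the UDF; smaller partitions trade some scheduling overhead for a lower peak in the Python worker.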
Recommended PySpark Memory Configuration
spark.executor.memory = 8g
spark.executor.memoryOverhead = 2g
spark.executor.pyspark.memory = 4g
spark.sql.execution.arrow.pyspark.enabled = true
spark.sql.execution.arrow.maxRecordsPerBatch = 10000
Total container: 8 + 2 + 4 = 14 GB.
Memory Configuration Reference
Core Memory
| Configuration | Default | Description |
|---|---|---|
spark.executor.memory | 1g | JVM heap per executor |
spark.driver.memory | 1g | JVM heap for driver |
spark.driver.maxResultSize | 1g | Max serialized result size per action |
spark.executor.cores | 1 | Concurrent tasks per executor (affects per-task memory) |
Unified Memory Model
| Configuration | Default | Description |
|---|---|---|
spark.memory.fraction | 0.6 | Fraction of (heap - 300 MB) for unified pool |
spark.memory.storageFraction | 0.5 | Protected storage floor within unified pool |
Off-Heap
| Configuration | Default | Description |
|---|---|---|
spark.memory.offHeap.enabled | false | Enable off-heap memory |
spark.memory.offHeap.size | 0 | Off-heap allocation in bytes |
Memory Overhead
| Configuration | Default | Description |
|---|---|---|
spark.executor.memoryOverhead | max(memory × 0.10, 384 MB) | Executor off-heap overhead |
spark.driver.memoryOverhead | max(memory × 0.10, 384 MB) | Driver off-heap overhead |
spark.yarn.executor.memoryOverhead | deprecated | Legacy YARN alias for spark.executor.memoryOverhead |
spark.kubernetes.memoryOverheadFactor | 0.10 (JVM), 0.40 (non-JVM) | K8s overhead factor |
PySpark
| Configuration | Default | Description |
|---|---|---|
spark.executor.pyspark.memory | not set | Python worker memory limit |
spark.python.worker.memory | 512m | Python-side aggregation spill threshold |
spark.sql.execution.arrow.pyspark.enabled | false | Enable Arrow for PySpark |
spark.sql.execution.arrow.maxRecordsPerBatch | 10000 | Rows per Arrow batch |
Serialization (Affects Memory Usage)
| Configuration | Default | Description |
|---|---|---|
spark.serializer | JavaSerializer | Serializer class (Kryo is 2-10x more compact) |
spark.kryoserializer.buffer | 64k | Initial Kryo buffer |
spark.kryoserializer.buffer.max | 64m | Max Kryo buffer |
Shuffle and Execution (Affects Memory Usage)
| Configuration | Default | Description |
|---|---|---|
spark.sql.shuffle.partitions | 200 | Partitions per shuffle (more = less data per task) |
spark.shuffle.file.buffer | 32k | Shuffle file output buffer |
spark.reducer.maxSizeInFlight | 48m | Max shuffle data in flight per reduce task |
spark.sql.files.maxPartitionBytes | 128 MB | Max bytes per partition when reading files |
How Cazpian Uses Memory Architecture
Cazpian's compute pools are configured with production-tested memory defaults for Iceberg workloads on S3:
- Unified memory sizing is tuned for the mix of execution (shuffle-heavy joins) and storage (cached dimension tables) typical in lakehouse queries
- Container memory accounts for off-heap overhead from Netty shuffle and Arrow serialization
- PySpark memory is pre-configured when Python runtimes are selected
- Per-query memory tracking in the SQL editor shows exactly how much memory each query consumed — unified, execution, storage, spill — so teams can right-size without guesswork
The companion post — Spark OOM Debugging: The Complete Guide — covers how to diagnose and fix every type of OOM error using Spark UI, GC analysis, and observability tools.
Summary
Spark's unified memory model is elegant but complex. The key takeaways:
- Only 60% of your heap is managed by Spark. The rest is reserved (300 MB) and user memory. A 16 GB executor has ~9.65 GB of unified memory
- Execution has priority over storage. Execution can evict cached data, but storage cannot evict execution buffers. This is by design — computation failures are more expensive than cache misses
- The boundary between execution and storage is dynamic. They borrow from each other, but only storage can be forcibly evicted
- Container memory = heap + overhead + off-heap + PySpark. Container kills (exit 137, OOMKilled) happen when the total exceeds the container limit — not when the JVM heap is full
- Driver and executor OOM are completely different problems. Driver OOM is usually caused by collect(), broadcast, or accumulators. Executor OOM is caused by large shuffles, skewed partitions, or insufficient memory per task
- PySpark doubles the memory challenge. A separate Python process runs alongside the JVM, and both need memory within the same container
The next time you see an OOM error, do not double the memory. Calculate which region overflowed, understand what filled it, and fix the root cause.