Iceberg Metrics Reporting: How to Monitor Scan and Commit Health with Spark
You designed the partitions correctly. You set up compaction. You even configured bloom filters. But your Iceberg tables are still slow — and you have no idea why. Is it the scan planning? Too many manifests? Delete files accumulating silently? Commit retries from writer contention? You cannot fix what you cannot see.
Apache Iceberg actually gives you everything you need to diagnose table health. The problem is that the metrics are scattered across six different layers — a Java API, virtual SQL tables, snapshot properties, file-level statistics, Puffin blobs, and engine-level instrumentation — and no one has assembled them into a single picture. This post does exactly that.
We will walk through every layer of Iceberg metrics, show you how to collect them, explain what each metric means for your read and write performance, and give you concrete thresholds and SQL queries that tell you when something is wrong and what to do about it.
Layer 1: The MetricsReporter API — ScanReport and CommitReport
Since version 1.1.0, Iceberg provides a pluggable MetricsReporter interface that captures real-time metrics for every scan and every commit. This is the most direct source of operational health data.
ScanReport: What Happens When You Read
Every time Spark (or any Iceberg client) plans a table scan, Iceberg emits a ScanReport containing these metrics:
| Metric | Type | What it tells you |
|---|---|---|
totalPlanningDuration | Timer | Total wall-clock time spent planning the scan |
resultDataFiles | Counter | Number of data files selected after all pruning |
resultDeleteFiles | Counter | Number of delete files that must be applied at read time |
totalDataManifests | Counter | Total data manifests in the snapshot |
scannedDataManifests | Counter | Data manifests actually opened and read |
skippedDataManifests | Counter | Data manifests skipped via partition summary pruning |
totalDeleteManifests | Counter | Total delete manifests in the snapshot |
scannedDeleteManifests | Counter | Delete manifests actually opened |
skippedDeleteManifests | Counter | Delete manifests skipped |
totalFileSizeInBytes | Counter | Total size of data files in the scan result |
totalDeleteFileSizeInBytes | Counter | Total size of delete files in the scan result |
skippedDataFiles | Counter | Data files eliminated via column-level min/max statistics |
skippedDeleteFiles | Counter | Delete files eliminated |
indexedDeleteFiles | Counter | Delete files matched via index |
equalityDeleteFiles | Counter | Equality delete files in the scan |
positionalDeleteFiles | Counter | Positional delete files in the scan |
The ScanReport also includes context: tableName, snapshotId, schemaId, filter (the expression pushed down), projectedFieldIds, and projectedFieldNames.
CommitReport: What Happens When You Write
Every time a commit produces a new snapshot, Iceberg emits a CommitReport:
| Metric | Type | What it tells you |
|---|---|---|
totalDuration | Timer | Total time to complete the commit |
attempts | Counter | Number of retry attempts (> 1 means conflict retries) |
addedDataFiles | Counter | New data files written |
removedDataFiles | Counter | Data files replaced or deleted |
totalDataFiles | Counter | Total data files after commit |
addedDeleteFiles | Counter | New delete files created |
removedDeleteFiles | Counter | Delete files removed (e.g., by compaction) |
totalDeleteFiles | Counter | Total delete files after commit |
addedEqualityDeleteFiles | Counter | New equality delete files |
removedEqualityDeleteFiles | Counter | Equality deletes removed |
addedPositionalDeleteFiles | Counter | New positional delete files |
removedPositionalDeleteFiles | Counter | Positional deletes removed |
addedRecords | Counter | Rows added |
removedRecords | Counter | Rows removed |
totalRecords | Counter | Total rows after commit |
addedFilesSizeInBytes | Counter | Bytes written |
removedFilesSizeInBytes | Counter | Bytes removed |
totalFilesSizeInBytes | Counter | Total storage after commit |
addedPositionalDeletes | Counter | Positional delete row count added |
removedPositionalDeletes | Counter | Positional delete rows removed |
totalPositionalDeletes | Counter | Total positional delete rows |
addedEqualityDeletes | Counter | Equality delete row count added |
removedEqualityDeletes | Counter | Equality delete rows removed |
totalEqualityDeletes | Counter | Total equality delete rows |
The CommitReport also includes tableName, snapshotId, sequenceNumber, and operation (append, overwrite, replace, delete).
How to Enable Metrics Reporting
Option 1: Catalog configuration. Register a reporter via the metrics-reporter-impl catalog property:
spark.sql.catalog.my_catalog.metrics-reporter-impl=org.apache.iceberg.metrics.LoggingMetricsReporter
The LoggingMetricsReporter is built in and writes structured metrics to the application log. This is the simplest way to get started — no custom code required.
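To make the catalog wiring concrete, here is a minimal PySpark-style sketch of Option 1. The catalog name (`my_catalog`), catalog type, and warehouse path are illustrative assumptions; only the `metrics-reporter-impl` key and the `LoggingMetricsReporter` class name come from Iceberg itself.

```python
# Illustrative Spark conf for enabling the built-in LoggingMetricsReporter.
# Catalog name, type, and warehouse location are assumptions -- adapt to
# your deployment.
spark_confs = {
    "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.my_catalog.type": "hadoop",
    "spark.sql.catalog.my_catalog.warehouse": "s3://my-bucket/warehouse",
    # The one line that enables metrics reporting:
    "spark.sql.catalog.my_catalog.metrics-reporter-impl":
        "org.apache.iceberg.metrics.LoggingMetricsReporter",
}

# Applied to a SparkSession builder (requires pyspark):
# builder = SparkSession.builder.appName("iceberg-metrics")
# for key, value in spark_confs.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

The same keys work in `spark-defaults.conf` or as `--conf` flags on `spark-submit`.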
Option 2: Custom MetricsReporter via Java/Scala. Implement the MetricsReporter interface to send metrics anywhere — Prometheus, CloudWatch, Datadog, a Kafka topic:
import org.apache.iceberg.metrics.MetricsReport;
import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.metrics.ScanReport;
import org.apache.iceberg.metrics.CommitReport;
public class PrometheusMetricsReporter implements MetricsReporter {
  // scanPlanningDuration, manifestSkipRatio, deleteFilesInScan,
  // commitDuration, and commitAttempts are assumed to be Prometheus
  // collectors (Histogram / Gauge) declared as fields on this class.

  @Override
  public void report(MetricsReport report) {
    if (report instanceof ScanReport) {
      ScanReport scan = (ScanReport) report;
      // Push to Prometheus
      scanPlanningDuration.observe(
          scan.scanMetrics().totalPlanningDuration().totalDuration().toMillis());
      manifestSkipRatio.set(
          (double) scan.scanMetrics().skippedDataManifests().value()
              // Guard against empty tables with zero manifests
              / Math.max(1, scan.scanMetrics().totalDataManifests().value()));
      deleteFilesInScan.set(
          scan.scanMetrics().resultDeleteFiles().value());
    } else if (report instanceof CommitReport) {
      CommitReport commit = (CommitReport) report;
      commitDuration.observe(
          commit.commitMetrics().totalDuration().totalDuration().toMillis());
      commitAttempts.set(
          commit.commitMetrics().attempts().value());
    }
  }
}
Register your custom reporter:
spark.sql.catalog.my_catalog.metrics-reporter-impl=com.mycompany.PrometheusMetricsReporter
Option 3: Per-scan reporter via Java API. You can attach a reporter to an individual table scan for ad-hoc diagnostics:
TableScan scan = table.newScan()
.filter(Expressions.equal("user_id", "abc-123"))
.metricsReporter(report -> {
ScanReport scanReport = (ScanReport) report;
System.out.println("Planning took: " +
scanReport.scanMetrics().totalPlanningDuration().totalDuration());
System.out.println("Files in result: " +
scanReport.scanMetrics().resultDataFiles().value());
});
What the Output Looks Like
With LoggingMetricsReporter, a ScanReport produces log output like this:
{
"tableName": "analytics.events",
"snapshotId": 4892301847201938475,
"filter": "(user_id = 'abc-123')",
"scanMetrics": {
"totalPlanningDuration": {"totalDuration": "PT2.341S", "count": 1},
"resultDataFiles": {"value": 3, "unit": "COUNT"},
"resultDeleteFiles": {"value": 0, "unit": "COUNT"},
"totalDataManifests": {"value": 120, "unit": "COUNT"},
"scannedDataManifests": {"value": 8, "unit": "COUNT"},
"skippedDataManifests": {"value": 112, "unit": "COUNT"},
"totalFileSizeInBytes": {"value": 425829120, "unit": "BYTES"},
"skippedDataFiles": {"value": 847, "unit": "COUNT"},
"positionalDeleteFiles": {"value": 0, "unit": "COUNT"},
"equalityDeleteFiles": {"value": 0, "unit": "COUNT"}
}
}
This single report tells you: planning took 2.3 seconds, 112 of 120 manifests were skipped (93% — good), 847 data files were pruned by column stats, only 3 files remained, and no delete files were involved.
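Because the LoggingMetricsReporter output is structured JSON, it is easy to post-process into health indicators. Here is a small sketch that reduces a ScanReport log line to the numbers discussed above; it assumes only the JSON shape shown in the example (a real pipeline would also handle missing fields and the Timer entries).

```python
import json

def summarize_scan_report(report_json: str) -> dict:
    """Reduce a LoggingMetricsReporter ScanReport line to key health numbers.

    Sketch based on the JSON shape shown above; production code should
    tolerate absent counters.
    """
    report = json.loads(report_json)
    metrics = report["scanMetrics"]
    total_manifests = metrics["totalDataManifests"]["value"]
    skipped_manifests = metrics["skippedDataManifests"]["value"]
    return {
        "table": report["tableName"],
        # Percentage of manifests pruned via partition summaries
        "manifest_skip_pct": round(
            100.0 * skipped_manifests / max(1, total_manifests), 1
        ),
        "result_data_files": metrics["resultDataFiles"]["value"],
        "delete_files_in_scan": metrics["resultDeleteFiles"]["value"],
    }

line = (
    '{"tableName": "analytics.events", "scanMetrics": {'
    '"totalDataManifests": {"value": 120}, '
    '"skippedDataManifests": {"value": 112}, '
    '"resultDataFiles": {"value": 3}, '
    '"resultDeleteFiles": {"value": 0}}}'
)
summary = summarize_scan_report(line)
# 112 of 120 manifests skipped -> manifest_skip_pct of 93.3
```

Feeding these numbers into a dashboard or alert rule is then a matter of plumbing, not parsing.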
Known Limitations
There is a known issue (GitHub #11664) where CommitReport is not always emitted through custom MetricsReporters in certain catalog configurations. The RESTMetricsReporter (used by REST catalogs) handles this correctly, but catalog-level custom reporters may miss commit events in some Iceberg versions. Check your Iceberg version's release notes for fixes.
Layer 2: Metadata Tables — SQL-Queryable Table Health
Iceberg exposes virtual metadata tables that you can query with standard SQL in Spark, Trino, or Flink. These are the most practical way to assess table health because they require no code changes — just run a query.
The files Table: Data File Health
The files metadata table lists every active data file in the current snapshot:
| Column | Type | Description |
|---|---|---|
content | int | 0 = data file, 1 = position deletes, 2 = equality deletes |
file_path | string | Full path to the file |
file_format | string | PARQUET, ORC, AVRO |
spec_id | int | Partition spec that wrote this file |
record_count | long | Number of rows in the file |
file_size_in_bytes | long | File size |
column_sizes | map | Per-column size in bytes |
value_counts | map | Per-column non-null value count |
null_value_counts | map | Per-column null count |
nan_value_counts | map | Per-column NaN count |
lower_bounds | map | Per-column minimum value |
upper_bounds | map | Per-column maximum value |
sort_order_id | int | Sort order used when writing |
readable_metrics | struct | Human-readable column metrics |
Detect small files (the most common Iceberg performance problem):
SELECT
partition,
COUNT(*) AS file_count,
AVG(file_size_in_bytes) / 1048576 AS avg_file_size_mb,
MIN(file_size_in_bytes) / 1048576 AS min_file_size_mb,
MAX(file_size_in_bytes) / 1048576 AS max_file_size_mb,
SUM(record_count) AS total_records
FROM my_catalog.my_db.my_table.files
GROUP BY partition
HAVING AVG(file_size_in_bytes) < 134217728 -- avg < 128 MB
ORDER BY file_count DESC;
If you see partitions with hundreds of files averaging 5-20 MB, you need compaction.
Detect file size distribution problems:
SELECT
CASE
WHEN file_size_in_bytes < 8388608 THEN 'tiny (< 8 MB)'
WHEN file_size_in_bytes < 67108864 THEN 'small (8-64 MB)'
WHEN file_size_in_bytes < 134217728 THEN 'medium (64-128 MB)'
WHEN file_size_in_bytes < 268435456 THEN 'target (128-256 MB)'
WHEN file_size_in_bytes < 536870912 THEN 'large (256-512 MB)'
ELSE 'oversized (> 512 MB)'
END AS size_bucket,
COUNT(*) AS file_count,
SUM(file_size_in_bytes) / 1073741824 AS total_gb
FROM my_catalog.my_db.my_table.files
GROUP BY 1
ORDER BY 1;
A healthy table should have most files at or near the configured target file size. Iceberg's default write.target-file-size-bytes is 512 MB, though many deployments tune it to 128-256 MB (the "target" bucket above). If the majority of files fall in the "tiny" or "small" buckets, compaction is overdue.
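If you pull file sizes through any Iceberg client instead of SQL, the same bucketing logic can be applied client-side. This sketch mirrors the CASE expression above, using the same byte thresholds:

```python
from collections import Counter

def size_bucket(file_size_in_bytes: int) -> str:
    """Mirror of the SQL CASE above, with identical byte thresholds."""
    mb = 1024 * 1024
    if file_size_in_bytes < 8 * mb:
        return "tiny (< 8 MB)"
    if file_size_in_bytes < 64 * mb:
        return "small (8-64 MB)"
    if file_size_in_bytes < 128 * mb:
        return "medium (64-128 MB)"
    if file_size_in_bytes < 256 * mb:
        return "target (128-256 MB)"
    if file_size_in_bytes < 512 * mb:
        return "large (256-512 MB)"
    return "oversized (> 512 MB)"

def size_histogram(file_sizes: list[int]) -> Counter:
    """Bucketed file-size distribution for a list of file sizes in bytes."""
    return Counter(size_bucket(size) for size in file_sizes)
```

For example, `size_histogram([s["file_size_in_bytes"] for s in files])` over rows from the `files` metadata table (fetched via PyIceberg or a JDBC query) reproduces the distribution report.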
The manifests Table: Manifest Health
| Column | Type | Description |
|---|---|---|
path | string | Manifest file path |
length | long | Manifest file size in bytes |
partition_spec_id | int | Partition spec ID |
added_snapshot_id | long | Snapshot that created this manifest |
added_data_files_count | int | Data files added in this manifest |
existing_data_files_count | int | Existing (carried-over) data files |
deleted_data_files_count | int | Data files marked as deleted |
added_rows_count | long | Rows added |
existing_rows_count | long | Existing rows |
deleted_rows_count | long | Rows deleted |
partition_summaries | list | Min/max bounds per partition field |
Detect manifest fragmentation:
SELECT
COUNT(*) AS total_manifests,
AVG(added_data_files_count + existing_data_files_count) AS avg_files_per_manifest,
AVG(length) / 1024 AS avg_manifest_size_kb
FROM my_catalog.my_db.my_table.manifests;
If you have hundreds of manifests with only a few files each, scan planning has to open too many manifest files. Run rewrite_manifests to consolidate them.
Detect manifests with only deleted entries (dead weight):
SELECT COUNT(*) AS dead_manifests
FROM my_catalog.my_db.my_table.manifests
WHERE added_data_files_count = 0
AND existing_data_files_count = 0
AND deleted_data_files_count > 0;
The snapshots Table: Commit History and Trends
| Column | Type | Description |
|---|---|---|
committed_at | timestamp | When the snapshot was committed |
snapshot_id | long | Snapshot ID |
parent_id | long | Parent snapshot ID |
operation | string | append, overwrite, replace, delete |
manifest_list | string | Path to the manifest list file |
summary | map | Key-value pairs with commit statistics |
Detect snapshot accumulation:
SELECT
COUNT(*) AS total_snapshots,
MIN(committed_at) AS oldest_snapshot,
MAX(committed_at) AS newest_snapshot,
COUNT(*) / GREATEST(1,
DATEDIFF(MAX(committed_at), MIN(committed_at))
) AS snapshots_per_day
FROM my_catalog.my_db.my_table.snapshots;
If you have more than 100-200 unexpired snapshots, metadata bloat is likely slowing down scan planning. Run expire_snapshots with an appropriate retention period.
Detect high-frequency commit patterns:
SELECT
DATE(committed_at) AS commit_date,
COUNT(*) AS commits,
SUM(CASE WHEN operation = 'append' THEN 1 ELSE 0 END) AS appends,
SUM(CASE WHEN operation = 'overwrite' THEN 1 ELSE 0 END) AS overwrites,
SUM(CASE WHEN operation = 'delete' THEN 1 ELSE 0 END) AS deletes
FROM my_catalog.my_db.my_table.snapshots
GROUP BY 1
ORDER BY 1 DESC
LIMIT 30;
Tables with hundreds of appends per day (e.g., streaming micro-batches) will accumulate small files rapidly and need aggressive compaction schedules.
The partitions Table: Partition Skew
| Column | Type | Description |
|---|---|---|
partition | struct | Partition values |
record_count | long | Total records in partition |
file_count | int | Number of data files |
spec_id | int | Partition spec ID |
Detect partition skew:
SELECT
partition,
record_count,
file_count,
record_count / GREATEST(1, file_count) AS records_per_file
FROM my_catalog.my_db.my_table.partitions
ORDER BY file_count DESC
LIMIT 20;
If one partition has 500 files while most have 5, that hot partition disproportionately slows every query that touches it. Consider a finer-grained partition strategy (e.g., adding a bucket() transform on a high-cardinality column).
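A simple way to turn the query output into an alert is to compare each partition's file count against the median. This sketch flags hot partitions; the 10x threshold matches the diagnosis tables later in this post, and the input is assumed to be the `file_count` column from the partitions metadata table.

```python
def partition_skew(file_counts: list[int], threshold: float = 10.0) -> dict:
    """Flag partitions whose file_count is >= threshold x the median.

    Sketch: `file_counts` is assumed to come from the partitions
    metadata table; the 10x default is an illustrative threshold.
    """
    counts = sorted(file_counts)
    median = counts[len(counts) // 2]
    hot = [c for c in file_counts if c >= threshold * max(1, median)]
    return {"median_files": median, "hot_partitions": len(hot)}
```

Calling `partition_skew([5, 5, 6, 4, 500])` reports one hot partition against a median of five files, the pattern described above.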
The all_delete_files Table: Delete File Burden
This is often overlooked, but it is critical for tables with UPDATE, DELETE, or MERGE INTO operations.
SELECT
content,
COUNT(*) AS delete_file_count,
SUM(record_count) AS total_delete_records,
SUM(file_size_in_bytes) / 1048576 AS total_size_mb
FROM my_catalog.my_db.my_table.all_delete_files
GROUP BY content;
Content 1 is positional deletes and 2 is equality deletes. If you see hundreds of delete files or millions of delete records, your reads are paying a heavy merge-on-read tax. Compaction with rewrite_data_files merges these deletes into the base data files.
Layer 3: Snapshot Summary Properties — Time-Series Table Health
Every snapshot's summary map contains cumulative statistics. By querying the summary across snapshots, you can build a time-series of table health:
SELECT
committed_at,
operation,
CAST(summary['added-data-files'] AS INT) AS added_files,
CAST(summary['removed-data-files'] AS INT) AS removed_files,
CAST(summary['total-data-files'] AS INT) AS total_files,
CAST(summary['total-delete-files'] AS INT) AS total_deletes,
CAST(summary['added-records'] AS BIGINT) AS added_records,
CAST(summary['total-records'] AS BIGINT) AS total_records,
CAST(summary['total-equality-deletes'] AS BIGINT) AS total_eq_deletes,
CAST(summary['total-position-deletes'] AS BIGINT) AS total_pos_deletes
FROM my_catalog.my_db.my_table.snapshots
ORDER BY committed_at DESC
LIMIT 50;
What to look for in the time-series
Growing total-delete-files without cleanup. If total-delete-files increases commit over commit but never decreases, delete files are accumulating. Compaction has not run or is not configured.
total-data-files growing linearly despite stable data volume. This is the classic small file accumulation pattern. Each streaming micro-batch adds a few files, but nothing consolidates them.
Frequent overwrite operations with high removed-data-files. This indicates either aggressive compaction (good) or poorly optimized MERGE INTO statements that rewrite too many files (bad). Compare removed-data-files with added-data-files — if they are roughly equal on non-compaction commits, your merge key might not be aligned with the partition key, forcing full-partition rewrites.
added-records trending down to zero while total-records stays flat. The table is receiving empty or near-empty commits — possibly from a streaming job with no new data. These empty commits still produce snapshots and manifests, adding metadata overhead for no benefit.
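The trend checks above can be automated over the snapshot summary series. This sketch scans a chronologically ordered list of `summary` maps for three of the patterns; the summary keys are the real Iceberg keys, but the thresholds (monotonic growth, file count growing more than twice as fast as row count) are illustrative assumptions.

```python
def summary_trend_flags(summaries: list[dict]) -> list[str]:
    """Flag unhealthy trends in a chronological series of snapshot summaries.

    Iceberg summary values are strings, so everything is parsed with int().
    Thresholds are illustrative, not Iceberg defaults.
    """
    flags = []
    delete_files = [int(s.get("total-delete-files", 0)) for s in summaries]
    data_files = [int(s.get("total-data-files", 0)) for s in summaries]
    records = [int(s.get("total-records", 0)) for s in summaries]

    # Delete files growing commit over commit with no compaction shrinking them
    if delete_files[-1] > delete_files[0] and all(
        later >= earlier for earlier, later in zip(delete_files, delete_files[1:])
    ):
        flags.append("delete files accumulating")

    # File count growing much faster than row count: small-file accumulation
    if records[0] > 0 and data_files[0] > 0:
        file_growth = data_files[-1] / data_files[0]
        record_growth = records[-1] / records[0]
        if file_growth > 2 * record_growth:
            flags.append("small file accumulation")

    # Commits that added no rows at all
    if any(int(s.get("added-records", 0)) == 0 for s in summaries):
        flags.append("empty commits present")
    return flags

recent = [
    {"total-delete-files": "1", "total-data-files": "100",
     "total-records": "1000", "added-records": "10"},
    {"total-delete-files": "3", "total-data-files": "300",
     "total-records": "1010", "added-records": "0"},
]
flags = summary_trend_flags(recent)
```

Run against the output of the time-series query above, this turns eyeballing the trend into a scheduled check.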
Layer 4: Column-Level Statistics — Clustering and Sort Effectiveness
This is the most underrated diagnostic layer. The files metadata table stores per-column lower_bounds and upper_bounds for every data file. By analyzing how these bounds overlap, you can directly measure whether your sort order is actually helping query performance.
How it works
When data is well-sorted on a column, each file contains a narrow, non-overlapping range of values for that column. Iceberg's scan planner uses these bounds to skip files that cannot contain the queried value. The tighter and more non-overlapping the bounds, the more files get skipped.
When data is unsorted, every file contains the full range of values (e.g., lower=A, upper=Z for every file). No file can be skipped based on column stats — the scan reads everything.
Measure clustering quality
SELECT
readable_metrics.user_id.lower_bound AS lower,
readable_metrics.user_id.upper_bound AS upper,
file_size_in_bytes / 1048576 AS file_size_mb,
record_count
FROM my_catalog.my_db.my_table.files
ORDER BY readable_metrics.user_id.lower_bound
LIMIT 20;
Well-clustered output (each file covers a narrow range):
lower upper file_size_mb record_count
user_0001 user_1000 128 500000
user_1001 user_2000 135 520000
user_2001 user_3000 131 510000
Poorly-clustered output (every file spans the full range):
lower upper file_size_mb record_count
user_0001 user_9999 128 500000
user_0002 user_9998 135 520000
user_0001 user_9999 131 510000
If your output looks like the second example, data skipping via column stats is effectively useless for that column. Run sort compaction:
CALL my_catalog.system.rewrite_data_files(
table => 'my_db.my_table',
strategy => 'sort',
sort_order => 'user_id ASC'
);
Measure overlap ratio
You can quantify the overlap by computing what fraction of files a single value would match:
SELECT
COUNT(*) AS total_files,
COUNT_IF(
readable_metrics.user_id.lower_bound <= 'user_5000'
AND readable_metrics.user_id.upper_bound >= 'user_5000'
) AS files_matching_single_value,
ROUND(
COUNT_IF(
readable_metrics.user_id.lower_bound <= 'user_5000'
AND readable_metrics.user_id.upper_bound >= 'user_5000'
) * 100.0 / COUNT(*), 1
) AS overlap_pct
FROM my_catalog.my_db.my_table.files;
For a well-sorted column, overlap_pct should be close to 0% (the value exists in 1 file out of many). For an unsorted column, it will be close to 100% (every file's range covers the value).
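The overlap computation is simple enough to run client-side as well. This sketch is a pure-Python version of the COUNT_IF query above, taking a list of per-file (lower_bound, upper_bound) pairs and a probe value; the sample bounds reuse the well-clustered and poorly-clustered examples from earlier.

```python
def overlap_pct(bounds: list[tuple[str, str]], probe: str) -> float:
    """Percentage of files whose [lower, upper] range contains `probe`.

    Pure-Python equivalent of the COUNT_IF query above; bounds are
    assumed to come from readable_metrics in the files metadata table.
    """
    matching = sum(1 for lower, upper in bounds if lower <= probe <= upper)
    return round(100.0 * matching / max(1, len(bounds)), 1)

# Well-clustered: each file covers a narrow, non-overlapping range
well_sorted = [("user_0001", "user_1000"),
               ("user_1001", "user_2000"),
               ("user_2001", "user_3000")]
# Poorly clustered: every file spans the full range
unsorted = [("user_0001", "user_9999")] * 3
```

With these toy inputs, a point lookup matches one of three well-sorted files but all three unsorted ones; on a real table with thousands of files the contrast between a few percent and ~100% is far starker.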
Layer 5: Puffin File Statistics — Advanced Query Optimization
Puffin is a file format that stores table-level statistics as binary "blobs" alongside the Iceberg metadata. These statistics go beyond file-level min/max bounds.
What Puffin stores
| Statistic | Algorithm | Use case |
|---|---|---|
| NDV (Number of Distinct Values) | Apache DataSketches Theta Sketch | Join order optimization, GROUP BY cost estimates |
| Column histograms | KLL Sketch | Data distribution analysis, predicate selectivity estimation |
Why it matters
Without NDV statistics, query engines must guess at cardinalities when planning joins and aggregations. A bad guess can lead to a broadcast join when a shuffle join was needed (or vice versa), causing out-of-memory errors or orders-of-magnitude slowdowns.
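To see why a cardinality estimate changes the plan, consider a toy model of the broadcast-vs-shuffle decision. The 10 MB default below mirrors Spark's spark.sql.autoBroadcastJoinThreshold; the function itself is an illustration of the planner's reasoning, not Spark's actual cost model.

```python
def choose_join_strategy(estimated_rows: int, avg_row_bytes: int,
                         broadcast_threshold: int = 10 * 1024 * 1024) -> str:
    """Toy model of a planner's broadcast-vs-shuffle choice.

    The 10 MB default mirrors Spark's autoBroadcastJoinThreshold;
    everything else is an illustrative simplification.
    """
    estimated_bytes = estimated_rows * avg_row_bytes
    return "broadcast" if estimated_bytes <= broadcast_threshold else "shuffle"
```

With an accurate NDV-based estimate, a dimension side of 50,000 distinct keys at roughly 100 bytes per row (~5 MB) is safely broadcast; a blind guess of 50 million rows forces a shuffle, and a blind guess in the other direction broadcasts something enormous and blows up the driver.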
How to generate Puffin statistics
In Spark with Iceberg, you can compute column statistics with the ANALYZE TABLE statement; newer Iceberg releases also provide a compute_table_stats Spark procedure that writes NDV sketches to Puffin files (availability varies by version and catalog):
-- AWS Glue Data Catalog generates Puffin stats automatically
-- For other catalogs, check your catalog's documentation
-- You can also generate NDV stats via Spark:
ANALYZE TABLE my_catalog.my_db.my_table
COMPUTE STATISTICS FOR COLUMNS user_id, event_type;
Note that Puffin file support varies across catalogs. AWS Glue Data Catalog generates Theta Sketch-based NDV estimates automatically. REST catalogs store Puffin references in the metadata. Check your catalog's documentation for specifics.
How to check if Puffin stats exist
Look for .stats or .puffin files in your table's metadata directory:
from pyiceberg.catalog import load_catalog
catalog = load_catalog("my_catalog")
table = catalog.load_table("my_db.my_table")
# Check for statistics files in metadata
for stat_file in table.metadata.statistics:
print(f"Stats file: {stat_file.statistics_path}")
for blob in stat_file.blob_metadata:
print(f" Blob type: {blob.type}, columns: {blob.fields}")
If no statistics files exist, your query engine is flying blind on cardinality estimates.
Layer 6: Spark Engine Metrics — Execution-Level Observability
The previous layers focus on Iceberg-native metrics. But the query engine itself provides another dimension of observability.
Spark UI metrics for Iceberg operations
When you run an Iceberg query in Spark, the SQL tab in Spark UI shows:
- Scan node: number of files read, bytes read, rows output
- Planning time: how long the optimizer spent (includes Iceberg scan planning)
- Task metrics: per-task bytes read, shuffle bytes, GC time
Using EXPLAIN to see Iceberg pruning
EXPLAIN FORMATTED
SELECT * FROM my_catalog.my_db.my_table
WHERE event_date = '2026-02-10' AND user_id = 'abc-123';
The output shows the Iceberg BatchScan node with details about which partition filters were pushed down and how many files remain after pruning. If the file count is unexpectedly high, your filters are not being pushed down effectively.
Using df.explain() in PySpark
df = spark.table("my_catalog.my_db.my_table") \
.filter("event_date = '2026-02-10'") \
.filter("user_id = 'abc-123'")
df.explain(True) # Shows parsed, analyzed, optimized, and physical plans
Look for the BatchScan node for your Iceberg table in the physical plan. It shows the pushed-down filters and the number of files that survived pruning.
Putting It All Together: The Diagnostic Framework
Now that you know what metrics are available, here is how to use them systematically.
Read Performance Diagnosis
| Symptom | Metric to check | Threshold | Action |
|---|---|---|---|
| Slow scan planning | ScanReport totalPlanningDuration | > 5 seconds | rewrite_manifests, check snapshot count |
| Too many files scanned | ScanReport resultDataFiles | Varies by table | Improve partitioning, sort compaction |
| Poor manifest pruning | skippedDataManifests / totalDataManifests | < 50% skipped | Partition strategy does not match query patterns |
| Poor file pruning | skippedDataFiles is low | Most files pass through | Sort compaction on the filtered columns |
| Delete file overhead | ScanReport resultDeleteFiles > 0 | Any non-zero on reads | rewrite_data_files to merge deletes into base files |
| Small files | files table: avg size < 128 MB | < 128 MB average | rewrite_data_files with bin-pack or sort strategy |
| Partition skew | partitions table: 10x file count variance | 10x+ variance | Re-evaluate partition key or add bucket transform |
| Manifest bloat | manifests table: low files per manifest | < 10 files/manifest avg | rewrite_manifests |
| Too many snapshots | snapshots table: count > 100 | > 100 unexpired | expire_snapshots with retention period |
| Poor data clustering | Column lower_bounds/upper_bounds overlap | > 50% overlap for point queries | Sort compaction or Z-order compaction |
Write Performance Diagnosis
| Symptom | Metric to check | Threshold | Action |
|---|---|---|---|
| Slow commits | CommitReport totalDuration | > 10 seconds | Check metadata file sizes, reduce snapshot count |
| Commit contention | CommitReport attempts | > 1 regularly | Reduce writer concurrency, use WAP (Write-Audit-Publish) |
| Write amplification | removedDataFiles ~ addedDataFiles on non-compaction commits | Ratio > 0.5 | MERGE INTO is rewriting too many files — align merge key with partition key |
| Delete file accumulation | Snapshot summary total-delete-files trend | Growing without decrease | Schedule compaction after heavy UPDATE/DELETE/MERGE workloads |
| Empty commits | Snapshot summary added-records = 0 | Frequent zero-record commits | Filter empty micro-batches before commit |
| Storage growth anomaly | Snapshot summary total-files-size trend | Growing faster than data volume | Check for orphan files, run remove_orphan_files |
The Maintenance Decision Tree
Start: Is the table slow?
│
├── Slow READS?
│ ├── Check scan planning time (ScanReport or EXPLAIN)
│ │ ├── Planning > 5s → Too many manifests → rewrite_manifests
│ │ └── Planning OK → Pruning is the issue
│ │
│ ├── Check manifest skip ratio
│ │ ├── < 50% skipped → Wrong partition strategy
│ │ └── > 50% skipped → Check file-level pruning
│ │
│ ├── Check file skip ratio (column stats)
│ │ ├── Low → Data not clustered → sort compaction
│ │ └── High → Check for delete files
│ │
│ └── Check delete files in scan
│ ├── > 0 → rewrite_data_files to merge deletes
│ └── 0 → Check file sizes → small files → compaction
│
└── Slow WRITES?
├── Check commit attempts
│ ├── > 1 → Writer contention → reduce concurrency
│ └── = 1 → Check commit duration
│
├── Check commit duration
│ ├── > 10s → Metadata bloat → expire_snapshots
│ └── < 10s → Check write amplification
│
└── Check removed vs added files
├── High churn → MERGE rewriting too much → optimize merge key
└── Normal → Check target file size config
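The read branch of the decision tree can be encoded directly as a function, which is handy for an automated triage job. The inputs map to the ScanReport and metadata-table metrics described earlier; the 50% skip-ratio cutoffs are illustrative thresholds, not Iceberg constants.

```python
def diagnose_reads(planning_secs: float, manifest_skip_pct: float,
                   file_skip_pct: float, delete_files_in_scan: int,
                   avg_file_size_mb: float) -> str:
    """Walk the read branch of the decision tree above.

    Thresholds follow the diagnosis tables (5s planning, 50% skip
    ratios, 128 MB average file size); the skip cutoffs are
    illustrative defaults.
    """
    if planning_secs > 5:
        return "rewrite_manifests (too many manifests / snapshots)"
    if manifest_skip_pct < 50:
        return "revisit partition strategy (poor manifest pruning)"
    if file_skip_pct < 50:
        return "sort compaction (data not clustered)"
    if delete_files_in_scan > 0:
        return "rewrite_data_files (merge delete files)"
    if avg_file_size_mb < 128:
        return "rewrite_data_files (compact small files)"
    return "reads look healthy"
```

For example, a scan with fast planning and good pruning but three delete files in the result resolves to the merge-deletes branch, matching the tree.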
Automated Health Check Query
Here is a single comprehensive query that produces a table health scorecard:
WITH file_stats AS (
SELECT
COUNT(*) AS total_files,
AVG(file_size_in_bytes) AS avg_file_size,
MIN(file_size_in_bytes) AS min_file_size,
MAX(file_size_in_bytes) AS max_file_size,
SUM(file_size_in_bytes) AS total_size,
SUM(record_count) AS total_records,
COUNT_IF(file_size_in_bytes < 67108864) AS small_file_count,
COUNT_IF(file_size_in_bytes < 8388608) AS tiny_file_count
FROM my_catalog.my_db.my_table.files
),
manifest_stats AS (
SELECT
COUNT(*) AS total_manifests,
AVG(added_data_files_count + existing_data_files_count) AS avg_files_per_manifest
FROM my_catalog.my_db.my_table.manifests
),
snapshot_stats AS (
SELECT
COUNT(*) AS total_snapshots,
-- MAX_BY takes the value from the most recent snapshot, not the historical peak
MAX_BY(CAST(summary['total-delete-files'] AS INT), committed_at) AS current_delete_files,
MAX_BY(CAST(summary['total-position-deletes'] AS BIGINT), committed_at) AS current_pos_deletes,
MAX_BY(CAST(summary['total-equality-deletes'] AS BIGINT), committed_at) AS current_eq_deletes
FROM my_catalog.my_db.my_table.snapshots
),
partition_stats AS (
SELECT
COUNT(*) AS total_partitions,
MAX(file_count) AS max_files_in_partition,
MIN(file_count) AS min_files_in_partition,
AVG(file_count) AS avg_files_per_partition
FROM my_catalog.my_db.my_table.partitions
)
SELECT
-- File health
f.total_files,
ROUND(f.avg_file_size / 1048576, 1) AS avg_file_size_mb,
f.small_file_count,
f.tiny_file_count,
ROUND(f.small_file_count * 100.0 / f.total_files, 1) AS small_file_pct,
ROUND(f.total_size / 1073741824, 2) AS total_size_gb,
-- Manifest health
m.total_manifests,
ROUND(m.avg_files_per_manifest, 1) AS avg_files_per_manifest,
-- Snapshot health
s.total_snapshots,
s.current_delete_files,
s.current_pos_deletes,
s.current_eq_deletes,
-- Partition health
p.total_partitions,
p.max_files_in_partition,
p.min_files_in_partition,
ROUND(p.max_files_in_partition * 1.0 / GREATEST(1, p.min_files_in_partition), 1) AS partition_skew_ratio,
-- Health flags
CASE WHEN f.small_file_count * 100.0 / f.total_files > 30 THEN 'NEEDS COMPACTION' ELSE 'OK' END AS file_size_health,
CASE WHEN m.total_manifests > 200 THEN 'NEEDS REWRITE' ELSE 'OK' END AS manifest_health,
CASE WHEN s.total_snapshots > 100 THEN 'NEEDS EXPIRY' ELSE 'OK' END AS snapshot_health,
CASE WHEN s.current_delete_files > 50 THEN 'NEEDS COMPACTION' ELSE 'OK' END AS delete_file_health,
CASE WHEN p.max_files_in_partition * 1.0 / GREATEST(1, p.min_files_in_partition) > 10 THEN 'SKEWED' ELSE 'OK' END AS partition_health
FROM file_stats f
CROSS JOIN manifest_stats m
CROSS JOIN snapshot_stats s
CROSS JOIN partition_stats p;
This gives you a single-row output with all the key health indicators and flags for which maintenance actions are needed.
The Recommended Maintenance Order
When the health check reveals multiple problems, run maintenance in this order:
-- Step 1: Expire old snapshots (reduces metadata before other operations)
CALL my_catalog.system.expire_snapshots(
table => 'my_db.my_table',
older_than => TIMESTAMP '2026-02-01 00:00:00',
retain_last => 10
);
-- Step 2: Remove orphan files (clean up unreferenced data)
CALL my_catalog.system.remove_orphan_files(
table => 'my_db.my_table',
older_than => TIMESTAMP '2026-02-01 00:00:00'
);
-- Step 3: Rewrite data files (compaction — merges small files and deletes)
CALL my_catalog.system.rewrite_data_files(
table => 'my_db.my_table',
strategy => 'sort',
sort_order => 'user_id ASC'
);
-- Step 4: Rewrite manifests (consolidate manifest files)
CALL my_catalog.system.rewrite_manifests(
table => 'my_db.my_table'
);
The order matters: expiring snapshots first removes dead metadata, making the subsequent operations faster and preventing them from processing already-expired data.
How Cazpian Handles This
On Cazpian, table health monitoring is built into the platform. Cazpian continuously collects ScanReport and CommitReport metrics from every query, surfaces them in the table health dashboard, and triggers automated maintenance when thresholds are breached. You get the diagnostic framework described in this post without writing any of the SQL or custom reporters — the platform handles the collection, analysis, and remediation automatically.
What's Next
This post covered the complete picture of Iceberg metrics — from the MetricsReporter API to metadata tables to column-level clustering analysis. For related deep dives, see our other posts:
- Iceberg Query Performance Tuning — partition pruning, min/max statistics, and Spark read configs.
- Iceberg Bloom Filters — row-group-level filtering for high-cardinality point lookups.
- Iceberg Table Design — choosing partition transforms and write properties.
- Writing Efficient MERGE INTO — optimizing upsert queries that create delete files.
- Iceberg CDC Patterns — change data capture pipelines and maintenance strategies.