
Iceberg Metrics Reporting: How to Monitor Scan and Commit Health with Spark

20 min read
Cazpian Engineering
Platform Engineering Team


You designed the partitions correctly. You set up compaction. You even configured bloom filters. But your Iceberg tables are still slow — and you have no idea why. Is it the scan planning? Too many manifests? Delete files accumulating silently? Commit retries from writer contention? You cannot fix what you cannot see.

Apache Iceberg actually gives you everything you need to diagnose table health. The problem is that the metrics are scattered across six different layers — a Java API, virtual SQL tables, snapshot properties, file-level statistics, Puffin blobs, and engine-level instrumentation — and no one has assembled them into a single picture. This post does exactly that.

We will walk through every layer of Iceberg metrics, show you how to collect them, explain what each metric means for your read and write performance, and give you concrete thresholds and SQL queries that tell you when something is wrong and what to do about it.

Layer 1: The MetricsReporter API — ScanReport and CommitReport

Since version 1.1.0, Iceberg provides a pluggable MetricsReporter interface that captures real-time metrics for every scan and every commit. This is the most direct source of operational health data.

[Diagram: the Iceberg metrics reporting layers — ScanReport and CommitReport metrics, metadata table queries, snapshot summary trends, and health diagnostic thresholds]

ScanReport: What Happens When You Read

Every time Spark (or any Iceberg client) plans a table scan, Iceberg emits a ScanReport containing these metrics:

| Metric | Type | What it tells you |
|---|---|---|
| totalPlanningDuration | Timer | Total wall-clock time spent planning the scan |
| resultDataFiles | Counter | Number of data files selected after all pruning |
| resultDeleteFiles | Counter | Number of delete files that must be applied at read time |
| totalDataManifests | Counter | Total data manifests in the snapshot |
| scannedDataManifests | Counter | Data manifests actually opened and read |
| skippedDataManifests | Counter | Data manifests skipped via partition summary pruning |
| totalDeleteManifests | Counter | Total delete manifests in the snapshot |
| scannedDeleteManifests | Counter | Delete manifests actually opened |
| skippedDeleteManifests | Counter | Delete manifests skipped |
| totalFileSizeInBytes | Counter | Total size of data files in the scan result |
| totalDeleteFileSizeInBytes | Counter | Total size of delete files in the scan result |
| skippedDataFiles | Counter | Data files eliminated via column-level min/max statistics |
| skippedDeleteFiles | Counter | Delete files eliminated |
| indexedDeleteFiles | Counter | Delete files matched via index |
| equalityDeleteFiles | Counter | Equality delete files in the scan |
| positionalDeleteFiles | Counter | Positional delete files in the scan |

The ScanReport also includes context: tableName, snapshotId, schemaId, filter (the expression pushed down), projectedFieldIds, and projectedFieldNames.

CommitReport: What Happens When You Write

Every time a commit produces a new snapshot, Iceberg emits a CommitReport:

| Metric | Type | What it tells you |
|---|---|---|
| totalDuration | Timer | Total time to complete the commit |
| attempts | Counter | Number of retry attempts (> 1 means conflict retries) |
| addedDataFiles | Counter | New data files written |
| removedDataFiles | Counter | Data files replaced or deleted |
| totalDataFiles | Counter | Total data files after commit |
| addedDeleteFiles | Counter | New delete files created |
| removedDeleteFiles | Counter | Delete files removed (e.g., by compaction) |
| totalDeleteFiles | Counter | Total delete files after commit |
| addedEqualityDeleteFiles | Counter | New equality delete files |
| removedEqualityDeleteFiles | Counter | Equality deletes removed |
| addedPositionalDeleteFiles | Counter | New positional delete files |
| removedPositionalDeleteFiles | Counter | Positional deletes removed |
| addedRecords | Counter | Rows added |
| removedRecords | Counter | Rows removed |
| totalRecords | Counter | Total rows after commit |
| addedFilesSizeInBytes | Counter | Bytes written |
| removedFilesSizeInBytes | Counter | Bytes removed |
| totalFilesSizeInBytes | Counter | Total storage after commit |
| addedPositionalDeletes | Counter | Positional delete row count added |
| removedPositionalDeletes | Counter | Positional delete rows removed |
| totalPositionalDeletes | Counter | Total positional delete rows |
| addedEqualityDeletes | Counter | Equality delete row count added |
| removedEqualityDeletes | Counter | Equality delete rows removed |
| totalEqualityDeletes | Counter | Total equality delete rows |

The CommitReport also includes tableName, snapshotId, sequenceNumber, and operation (append, overwrite, replace, delete).
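
These counters map directly onto the write-health thresholds used later in this post. As a minimal sketch, here is that interpretation in plain Python; the dict shape and key names (`attempts`, `totalDurationMs`, and so on) are our own stand-in for the CommitReport fields, not an Iceberg API:

```python
# Sketch: interpreting a single commit's metrics for write health.
# Thresholds are the illustrative ones used in this post, not Iceberg defaults.

def commit_health(metrics: dict) -> list[str]:
    """Return warning strings for one commit's metrics (stand-in dict shape)."""
    warnings = []
    # attempts > 1 means the commit hit optimistic-concurrency conflicts
    if metrics.get("attempts", 1) > 1:
        warnings.append("conflict retries: reduce writer concurrency")
    # long commits usually point at metadata bloat
    if metrics.get("totalDurationMs", 0) > 10_000:
        warnings.append("slow commit: check metadata/snapshot bloat")
    added = metrics.get("addedDataFiles", 0)
    removed = metrics.get("removedDataFiles", 0)
    # removed ~ added on a non-compaction commit suggests MERGE INTO is
    # rewriting whole partitions (write amplification)
    if added and removed / added > 0.5 and not metrics.get("isCompaction", False):
        warnings.append("write amplification: align merge key with partition key")
    return warnings

print(commit_health({"attempts": 3, "totalDurationMs": 2_000,
                     "addedDataFiles": 10, "removedDataFiles": 9}))
```

The same logic belongs in whatever sink your custom reporter feeds, so alerts fire on the commit stream rather than on after-the-fact table scans.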

How to Enable Metrics Reporting

Option 1: Catalog configuration. Register a reporter via the metrics-reporter-impl catalog property:

spark.sql.catalog.my_catalog.metrics-reporter-impl=org.apache.iceberg.metrics.LoggingMetricsReporter

The LoggingMetricsReporter is built in and writes structured metrics to the application log. This is the simplest way to get started — no custom code required.

Option 2: Custom MetricsReporter via Java/Scala. Implement the MetricsReporter interface to send metrics anywhere — Prometheus, CloudWatch, Datadog, a Kafka topic:

import org.apache.iceberg.metrics.MetricsReport;
import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.metrics.ScanReport;
import org.apache.iceberg.metrics.CommitReport;

public class PrometheusMetricsReporter implements MetricsReporter {
  // scanPlanningDuration, manifestSkipRatio, deleteFilesInScan, commitDuration,
  // and commitAttempts are Prometheus client metrics (Histogram/Gauge) assumed
  // to be declared elsewhere in this class.
  @Override
  public void report(MetricsReport report) {
    if (report instanceof ScanReport) {
      ScanReport scan = (ScanReport) report;
      // Push to Prometheus
      scanPlanningDuration.observe(
          scan.scanMetrics().totalPlanningDuration().totalDuration().toMillis());
      manifestSkipRatio.set(
          (double) scan.scanMetrics().skippedDataManifests().value()
              / scan.scanMetrics().totalDataManifests().value());
      deleteFilesInScan.set(
          scan.scanMetrics().resultDeleteFiles().value());
    } else if (report instanceof CommitReport) {
      CommitReport commit = (CommitReport) report;
      commitDuration.observe(
          commit.commitMetrics().totalDuration().totalDuration().toMillis());
      commitAttempts.set(
          commit.commitMetrics().attempts().value());
    }
  }
}

Register your custom reporter:

spark.sql.catalog.my_catalog.metrics-reporter-impl=com.mycompany.PrometheusMetricsReporter

Option 3: Per-scan reporter via Java API. You can attach a reporter to an individual table scan for ad-hoc diagnostics:

TableScan scan = table.newScan()
    .filter(Expressions.equal("user_id", "abc-123"))
    .metricsReporter(report -> {
      ScanReport scanReport = (ScanReport) report;
      System.out.println("Planning took: "
          + scanReport.scanMetrics().totalPlanningDuration().totalDuration());
      System.out.println("Files in result: "
          + scanReport.scanMetrics().resultDataFiles().value());
    });

What the Output Looks Like

With LoggingMetricsReporter, a ScanReport produces log output like this:

{
  "tableName": "analytics.events",
  "snapshotId": 4892301847201938475,
  "filter": "(user_id = 'abc-123')",
  "scanMetrics": {
    "totalPlanningDuration": {"totalDuration": "PT2.341S", "count": 1},
    "resultDataFiles": {"value": 3, "unit": "COUNT"},
    "resultDeleteFiles": {"value": 0, "unit": "COUNT"},
    "totalDataManifests": {"value": 120, "unit": "COUNT"},
    "scannedDataManifests": {"value": 8, "unit": "COUNT"},
    "skippedDataManifests": {"value": 112, "unit": "COUNT"},
    "totalFileSizeInBytes": {"value": 425829120, "unit": "BYTES"},
    "skippedDataFiles": {"value": 847, "unit": "COUNT"},
    "positionalDeleteFiles": {"value": 0, "unit": "COUNT"},
    "equalityDeleteFiles": {"value": 0, "unit": "COUNT"}
  }
}

This single report tells you: planning took 2.3 seconds, 112 of 120 manifests were skipped (93% — good), 847 data files were pruned by column stats, only 3 files remained, and no delete files were involved.
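
If you are shipping these log lines to a collector, the health ratios are straightforward to derive downstream. A minimal sketch, using a trimmed version of the JSON above (the log shape follows the example; treat the parsing as illustrative, not a stable contract):

```python
import json

# Sketch: deriving scan-health ratios from a LoggingMetricsReporter-style line.
report = json.loads("""
{"scanMetrics": {
  "totalDataManifests": {"value": 120},
  "skippedDataManifests": {"value": 112},
  "resultDataFiles": {"value": 3},
  "skippedDataFiles": {"value": 847},
  "resultDeleteFiles": {"value": 0}}}
""")

m = report["scanMetrics"]
# Fraction of manifests eliminated by partition-summary pruning
manifest_skip_pct = (100.0 * m["skippedDataManifests"]["value"]
                     / m["totalDataManifests"]["value"])
# Fraction of candidate files eliminated by column min/max stats
file_prune_pct = (100.0 * m["skippedDataFiles"]["value"]
                  / (m["skippedDataFiles"]["value"] + m["resultDataFiles"]["value"]))

print(f"manifest skip: {manifest_skip_pct:.1f}%")  # 93.3% (healthy pruning)
print(f"file prune:    {file_prune_pct:.1f}%")     # 99.6%
```

Alert when the manifest skip ratio drops below about 50% for queries that should be partition-aligned; that is the "wrong partition strategy" signal discussed later.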

Known Limitations

There is a known issue (GitHub #11664) where CommitReport is not always emitted through custom MetricsReporters in certain catalog configurations. The RESTMetricsReporter (used by REST catalogs) handles this correctly, but catalog-level custom reporters may miss commit events in some Iceberg versions. Check your Iceberg version's release notes for fixes.

Layer 2: Metadata Tables — SQL-Queryable Table Health

Iceberg exposes virtual metadata tables that you can query with standard SQL in Spark, Trino, or Flink. These are the most practical way to assess table health because they require no code changes — just run a query.

The files Table: Data File Health

The files metadata table lists every active data file in the current snapshot:

| Column | Type | Description |
|---|---|---|
| content | int | 0 = data file, 1 = position deletes, 2 = equality deletes |
| file_path | string | Full path to the file |
| file_format | string | PARQUET, ORC, AVRO |
| spec_id | int | Partition spec that wrote this file |
| record_count | long | Number of rows in the file |
| file_size_in_bytes | long | File size |
| column_sizes | map | Per-column size in bytes |
| value_counts | map | Per-column value count (includes nulls and NaNs) |
| null_value_counts | map | Per-column null count |
| nan_value_counts | map | Per-column NaN count |
| lower_bounds | map | Per-column minimum value |
| upper_bounds | map | Per-column maximum value |
| sort_order_id | int | Sort order used when writing |
| readable_metrics | struct | Human-readable column metrics |

Detect small files (the most common Iceberg performance problem):

SELECT
  partition,
  COUNT(*) AS file_count,
  AVG(file_size_in_bytes) / 1048576 AS avg_file_size_mb,
  MIN(file_size_in_bytes) / 1048576 AS min_file_size_mb,
  MAX(file_size_in_bytes) / 1048576 AS max_file_size_mb,
  SUM(record_count) AS total_records
FROM my_catalog.my_db.my_table.files
GROUP BY partition
HAVING AVG(file_size_in_bytes) < 134217728 -- avg < 128 MB
ORDER BY file_count DESC;

If you see partitions with hundreds of files averaging 5-20 MB, you need compaction.

Detect file size distribution problems:

SELECT
  CASE
    WHEN file_size_in_bytes < 8388608 THEN 'tiny (< 8 MB)'
    WHEN file_size_in_bytes < 67108864 THEN 'small (8-64 MB)'
    WHEN file_size_in_bytes < 134217728 THEN 'medium (64-128 MB)'
    WHEN file_size_in_bytes < 268435456 THEN 'target (128-256 MB)'
    WHEN file_size_in_bytes < 536870912 THEN 'large (256-512 MB)'
    ELSE 'oversized (> 512 MB)'
  END AS size_bucket,
  COUNT(*) AS file_count,
  SUM(file_size_in_bytes) / 1073741824 AS total_gb
FROM my_catalog.my_db.my_table.files
GROUP BY 1
ORDER BY 1;

A healthy table should have most files in the 128-256 MB range (the Iceberg default target). If the majority are in the "tiny" or "small" buckets, compaction is overdue.
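
The same bucketing is handy outside SQL, for example when post-processing a file listing exported from the metadata table. A direct Python mirror of the CASE expression above (the function name is ours):

```python
def size_bucket(size_bytes: int) -> str:
    """Mirror of the SQL CASE expression above; boundaries are powers-of-two MiB."""
    mib = 1024 * 1024
    if size_bytes < 8 * mib:
        return "tiny (< 8 MB)"
    if size_bytes < 64 * mib:
        return "small (8-64 MB)"
    if size_bytes < 128 * mib:
        return "medium (64-128 MB)"
    if size_bytes < 256 * mib:
        return "target (128-256 MB)"
    if size_bytes < 512 * mib:
        return "large (256-512 MB)"
    return "oversized (> 512 MB)"

# A 5 MB file lands in the bucket that signals compaction debt
print(size_bucket(5 * 1024 * 1024))
```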

The manifests Table: Manifest Health

| Column | Type | Description |
|---|---|---|
| path | string | Manifest file path |
| length | long | Manifest file size in bytes |
| partition_spec_id | int | Partition spec ID |
| added_snapshot_id | long | Snapshot that created this manifest |
| added_data_files_count | int | Data files added in this manifest |
| existing_data_files_count | int | Existing (carried-over) data files |
| deleted_data_files_count | int | Data files marked as deleted |
| added_rows_count | long | Rows added |
| existing_rows_count | long | Existing rows |
| deleted_rows_count | long | Rows deleted |
| partition_summaries | list | Min/max bounds per partition field |

Detect manifest fragmentation:

SELECT
  COUNT(*) AS total_manifests,
  AVG(added_data_files_count + existing_data_files_count) AS avg_files_per_manifest,
  AVG(length) / 1024 AS avg_manifest_size_kb
FROM my_catalog.my_db.my_table.manifests;

If you have hundreds of manifests with only a few files each, scan planning has to open too many manifest files. Run rewrite_manifests to consolidate them.

Detect manifests with only deleted entries (dead weight):

SELECT COUNT(*) AS dead_manifests
FROM my_catalog.my_db.my_table.manifests
WHERE added_data_files_count = 0
  AND existing_data_files_count = 0
  AND deleted_data_files_count > 0;

The snapshots Table: Snapshot History

| Column | Type | Description |
|---|---|---|
| committed_at | timestamp | When the snapshot was committed |
| snapshot_id | long | Snapshot ID |
| parent_id | long | Parent snapshot ID |
| operation | string | append, overwrite, replace, delete |
| manifest_list | string | Path to the manifest list file |
| summary | map | Key-value pairs with commit statistics |

Detect snapshot accumulation:

SELECT
  COUNT(*) AS total_snapshots,
  MIN(committed_at) AS oldest_snapshot,
  MAX(committed_at) AS newest_snapshot,
  COUNT(*) / GREATEST(1,
    DATEDIFF(MAX(committed_at), MIN(committed_at))
  ) AS snapshots_per_day
FROM my_catalog.my_db.my_table.snapshots;

If you have more than 100-200 unexpired snapshots, metadata bloat is likely slowing down scan planning. Run expire_snapshots with an appropriate retention period.

Detect high-frequency commit patterns:

SELECT
  DATE(committed_at) AS commit_date,
  COUNT(*) AS commits,
  SUM(CASE WHEN operation = 'append' THEN 1 ELSE 0 END) AS appends,
  SUM(CASE WHEN operation = 'overwrite' THEN 1 ELSE 0 END) AS overwrites,
  SUM(CASE WHEN operation = 'delete' THEN 1 ELSE 0 END) AS deletes
FROM my_catalog.my_db.my_table.snapshots
GROUP BY 1
ORDER BY 1 DESC
LIMIT 30;

Tables with hundreds of appends per day (e.g., streaming micro-batches) will accumulate small files rapidly and need aggressive compaction schedules.

The partitions Table: Partition Skew

| Column | Type | Description |
|---|---|---|
| partition | struct | Partition values |
| record_count | long | Total records in partition |
| file_count | int | Number of data files |
| spec_id | int | Partition spec ID |

Detect partition skew:

SELECT
  partition,
  record_count,
  file_count,
  record_count / GREATEST(1, file_count) AS records_per_file
FROM my_catalog.my_db.my_table.partitions
ORDER BY file_count DESC
LIMIT 20;

If one partition has 500 files and most have 5, that hot partition is disproportionately impacting scan planning for queries that touch it. Consider a finer-grained partition strategy (e.g., adding bucket() on a high-cardinality column).

The all_delete_files Table: Delete File Burden

This is often overlooked, but it is critical for tables with UPDATE, DELETE, or MERGE INTO operations.

SELECT
  content,
  COUNT(*) AS delete_file_count,
  SUM(record_count) AS total_delete_records,
  SUM(file_size_in_bytes) / 1048576 AS total_size_mb
FROM my_catalog.my_db.my_table.all_delete_files
GROUP BY content;

Content 1 is positional deletes and 2 is equality deletes. If you see hundreds of delete files or millions of delete records, your reads are paying a heavy merge-on-read tax. Compaction with rewrite_data_files merges these deletes into the base data files.

Layer 3: Snapshot Summary Properties — Time-Series Table Health

Every snapshot's summary map contains cumulative statistics. By querying the summary across snapshots, you can build a time-series of table health:

SELECT
  committed_at,
  operation,
  CAST(summary['added-data-files'] AS INT) AS added_files,
  CAST(summary['removed-data-files'] AS INT) AS removed_files,
  CAST(summary['total-data-files'] AS INT) AS total_files,
  CAST(summary['total-delete-files'] AS INT) AS total_deletes,
  CAST(summary['added-records'] AS BIGINT) AS added_records,
  CAST(summary['total-records'] AS BIGINT) AS total_records,
  CAST(summary['total-equality-deletes'] AS BIGINT) AS total_eq_deletes,
  CAST(summary['total-position-deletes'] AS BIGINT) AS total_pos_deletes
FROM my_catalog.my_db.my_table.snapshots
ORDER BY committed_at DESC
LIMIT 50;

What to look for in the time-series

Growing total-delete-files without cleanup. If total-delete-files increases commit over commit but never decreases, delete files are accumulating. Compaction has not run or is not configured.

total-data-files growing linearly despite stable data volume. This is the classic small file accumulation pattern. Each streaming micro-batch adds a few files, but nothing consolidates them.

Frequent overwrite operations with high removed-data-files. This indicates either aggressive compaction (good) or poorly optimized MERGE INTO statements that rewrite too many files (bad). Compare removed-data-files with added-data-files — if they are roughly equal on non-compaction commits, your merge key might not be aligned with the partition key, forcing full-partition rewrites.

added-records trending down to zero while total-records stays flat. The table is receiving empty or near-empty commits — possibly from a streaming job with no new data. These empty commits still produce snapshots and manifests, adding metadata overhead for no benefit.
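
Once the query above lands in a dataframe or a monitoring job, these red flags reduce to a few list comprehensions. A sketch, assuming a chronological list of snapshot summary maps (the summary keys are Iceberg's; the `trend_flags` function and sample history are ours):

```python
# Sketch: scanning chronological snapshot summaries for the red flags above.
def trend_flags(summaries: list[dict]) -> set[str]:
    flags = set()
    delete_files = [int(s.get("total-delete-files", 0)) for s in summaries]
    # Growing total-delete-files that never decreases: compaction is not running
    non_decreasing = all(b >= a for a, b in zip(delete_files, delete_files[1:]))
    if len(delete_files) > 1 and non_decreasing and delete_files[-1] > delete_files[0]:
        flags.add("delete files growing without cleanup")
    # Zero-record commits: empty micro-batches adding metadata for no benefit
    if any(int(s.get("added-records", 0)) == 0 for s in summaries):
        flags.add("empty commits detected")
    return flags

history = [
    {"total-delete-files": "2", "added-records": "1000"},
    {"total-delete-files": "9", "added-records": "0"},
    {"total-delete-files": "15", "added-records": "800"},
]
print(trend_flags(history))
```

The write-amplification check (removed-data-files roughly equal to added-data-files on non-compaction commits) slots into the same loop once you can distinguish compaction snapshots, for example by their operation type or a job tag.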

Layer 4: Column-Level Statistics — Clustering and Sort Effectiveness

This is the most underrated diagnostic layer. The files metadata table stores per-column lower_bounds and upper_bounds for every data file. By analyzing how these bounds overlap, you can directly measure whether your sort order is actually helping query performance.

How it works

When data is well-sorted on a column, each file contains a narrow, non-overlapping range of values for that column. Iceberg's scan planner uses these bounds to skip files that cannot contain the queried value. The tighter and more non-overlapping the bounds, the more files get skipped.

When data is unsorted, every file contains the full range of values (e.g., lower=A, upper=Z for every file). No file can be skipped based on column stats — the scan reads everything.

Measure clustering quality

SELECT
  readable_metrics.user_id.lower_bound AS lower,
  readable_metrics.user_id.upper_bound AS upper,
  file_size_in_bytes / 1048576 AS file_size_mb,
  record_count
FROM my_catalog.my_db.my_table.files
ORDER BY readable_metrics.user_id.lower_bound
LIMIT 20;

Well-clustered output (each file covers a narrow range):

lower      upper      file_size_mb  record_count
user_0001  user_1000  128           500000
user_1001  user_2000  135           520000
user_2001  user_3000  131           510000

Poorly-clustered output (every file spans the full range):

lower      upper      file_size_mb  record_count
user_0001  user_9999  128           500000
user_0002  user_9998  135           520000
user_0001  user_9999  131           510000

If your output looks like the second example, data skipping via column stats is effectively useless for that column. Run sort compaction:

CALL my_catalog.system.rewrite_data_files(
  table => 'my_db.my_table',
  strategy => 'sort',
  sort_order => 'user_id ASC'
);

Measure overlap ratio

You can quantify the overlap by computing what fraction of files a single value would match:

SELECT
  COUNT(*) AS total_files,
  COUNT_IF(
    readable_metrics.user_id.lower_bound <= 'user_5000'
    AND readable_metrics.user_id.upper_bound >= 'user_5000'
  ) AS files_matching_single_value,
  ROUND(
    COUNT_IF(
      readable_metrics.user_id.lower_bound <= 'user_5000'
      AND readable_metrics.user_id.upper_bound >= 'user_5000'
    ) * 100.0 / COUNT(*), 1
  ) AS overlap_pct
FROM my_catalog.my_db.my_table.files;

For a well-sorted column, overlap_pct should be close to 0% (the value exists in 1 file out of many). For an unsorted column, it will be close to 100% (every file's range covers the value).
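
The measurement itself is just an interval-containment count. A plain-Python sketch over (lower_bound, upper_bound) pairs pulled from the files metadata table, using the sample bounds from the two outputs above (function and variable names are ours):

```python
# Sketch: what fraction of files a single probe value falls inside.
def overlap_pct(bounds: list[tuple[str, str]], probe: str) -> float:
    """bounds: per-file (lower_bound, upper_bound) string pairs."""
    matching = sum(1 for lo, hi in bounds if lo <= probe <= hi)
    return 100.0 * matching / len(bounds)

# Mirrors the well-clustered output above: narrow, non-overlapping ranges
well_sorted = [("user_0001", "user_1000"),
               ("user_1001", "user_2000"),
               ("user_2001", "user_3000")]
# Mirrors the poorly-clustered output: every file spans the full range
unsorted_files = [("user_0001", "user_9999")] * 3

print(overlap_pct(well_sorted, "user_1500"))    # about 33.3: 1 of 3 files
print(overlap_pct(unsorted_files, "user_1500")) # 100.0: every file matches
```

Tracking this percentage over time for your hottest filter columns tells you when sort compaction has decayed enough to be worth re-running.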

Layer 5: Puffin File Statistics — Advanced Query Optimization

Puffin is a file format that stores table-level statistics as binary "blobs" alongside the Iceberg metadata. These statistics go beyond file-level min/max bounds.

What Puffin stores

| Statistic | Algorithm | Use case |
|---|---|---|
| NDV (Number of Distinct Values) | Apache DataSketches Theta Sketch | Join order optimization, GROUP BY cost estimates |
| Column histograms | KLL Sketch | Data distribution analysis, predicate selectivity estimation |

Why it matters

Without NDV statistics, query engines must guess at cardinalities when planning joins and aggregations. A bad guess can lead to a broadcast join when a shuffle join was needed (or vice versa), causing out-of-memory errors or orders-of-magnitude slowdowns.

How to generate Puffin statistics

In Spark with Iceberg, you can compute statistics using the compute_statistics procedure (available in some catalog implementations):

-- AWS Glue Data Catalog generates Puffin stats automatically
-- For other catalogs, check your catalog's documentation

-- You can also generate NDV stats via Spark:
ANALYZE TABLE my_catalog.my_db.my_table
COMPUTE STATISTICS FOR COLUMNS user_id, event_type;

Note that Puffin file support varies across catalogs. AWS Glue Data Catalog generates Theta Sketch-based NDV estimates automatically. REST catalogs store Puffin references in the metadata. Check your catalog's documentation for specifics.

How to check if Puffin stats exist

Look for .stats or .puffin files in your table's metadata directory:

from pyiceberg.catalog import load_catalog

catalog = load_catalog("my_catalog")
table = catalog.load_table("my_db.my_table")

# Check for statistics files in metadata
for stat_file in table.metadata.statistics:
    print(f"Stats file: {stat_file.statistics_path}")
    for blob in stat_file.blob_metadata:
        print(f"  Blob type: {blob.type}, columns: {blob.fields}")

If no statistics files exist, your query engine is flying blind on cardinality estimates.

Layer 6: Spark Engine Metrics — Execution-Level Observability

The previous layers focus on Iceberg-native metrics. But the query engine itself provides another dimension of observability.

Spark UI metrics for Iceberg operations

When you run an Iceberg query in Spark, the SQL tab in Spark UI shows:

  • Scan node: number of files read, bytes read, rows output
  • Planning time: how long the optimizer spent (includes Iceberg scan planning)
  • Task metrics: per-task bytes read, shuffle bytes, GC time

Using EXPLAIN to see Iceberg pruning

EXPLAIN FORMATTED
SELECT * FROM my_catalog.my_db.my_table
WHERE event_date = '2026-02-10' AND user_id = 'abc-123';

The output shows the Iceberg BatchScan node with details about which partition filters were pushed down and how many files remain after pruning. If the file count is unexpectedly high, your filters are not being pushed down effectively.

Using df.explain() in PySpark

df = spark.table("my_catalog.my_db.my_table") \
    .filter("event_date = '2026-02-10'") \
    .filter("user_id = 'abc-123'")

df.explain(True)  # Shows parsed, analyzed, optimized, and physical plans

Look for IcebergScan in the physical plan. It shows the pushed-down filters and the number of files that survived pruning.

Putting It All Together: The Diagnostic Framework

Now that you know what metrics are available, here is how to use them systematically.

Read Performance Diagnosis

| Symptom | Metric to check | Threshold | Action |
|---|---|---|---|
| Slow scan planning | ScanReport totalPlanningDuration | > 5 seconds | rewrite_manifests, check snapshot count |
| Too many files scanned | ScanReport resultDataFiles | Varies by table | Improve partitioning, sort compaction |
| Poor manifest pruning | skippedDataManifests / totalDataManifests | < 50% skipped | Partition strategy does not match query patterns |
| Poor file pruning | skippedDataFiles is low | Most files pass through | Sort compaction on the filtered columns |
| Delete file overhead | ScanReport resultDeleteFiles > 0 | Any non-zero on reads | rewrite_data_files to merge deletes into base files |
| Small files | files table: avg size < 128 MB | < 128 MB average | rewrite_data_files with bin-pack or sort strategy |
| Partition skew | partitions table: 10x file count variance | 10x+ variance | Re-evaluate partition key or add bucket transform |
| Manifest bloat | manifests table: low files per manifest | < 10 files/manifest avg | rewrite_manifests |
| Too many snapshots | snapshots table: count > 100 | > 100 unexpired | expire_snapshots with retention period |
| Poor data clustering | Column lower_bounds/upper_bounds overlap | > 50% overlap for point queries | Sort compaction or Z-order compaction |

Write Performance Diagnosis

| Symptom | Metric to check | Threshold | Action |
|---|---|---|---|
| Slow commits | CommitReport totalDuration | > 10 seconds | Check metadata file sizes, reduce snapshot count |
| Commit contention | CommitReport attempts | > 1 regularly | Reduce writer concurrency, use WAP (Write-Audit-Publish) |
| Write amplification | removedDataFiles ~ addedDataFiles on non-compaction commits | Ratio > 0.5 | MERGE INTO is rewriting too many files — align merge key with partition key |
| Delete file accumulation | Snapshot summary total-delete-files trend | Growing without decrease | Schedule compaction after heavy UPDATE/DELETE/MERGE workloads |
| Empty commits | Snapshot summary added-records = 0 | Frequent zero-record commits | Filter empty micro-batches before commit |
| Storage growth anomaly | Snapshot summary total-files-size trend | Growing faster than data volume | Check for orphan files, run remove_orphan_files |
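
The two tables are easy to encode as data rather than documentation, which is what you want in a nightly health job. A sketch with a handful of the thresholds above wired up; the metric dict keys and function names are our own shape, not an Iceberg API:

```python
# Sketch: the diagnosis tables as a single threshold checker.
# Cutoffs come straight from the tables above; the dict shape is ours.
THRESHOLDS = [
    ("slow scan planning",    lambda m: m["planning_ms"] > 5_000,      "rewrite_manifests"),
    ("poor manifest pruning", lambda m: m["manifest_skip_pct"] < 50,   "revisit partition strategy"),
    ("delete file overhead",  lambda m: m["result_delete_files"] > 0,  "rewrite_data_files"),
    ("small files",           lambda m: m["avg_file_size_mb"] < 128,   "bin-pack compaction"),
    ("too many snapshots",    lambda m: m["snapshot_count"] > 100,     "expire_snapshots"),
    ("commit contention",     lambda m: m["commit_attempts"] > 1,      "reduce writer concurrency"),
]

def diagnose(metrics: dict) -> list[tuple[str, str]]:
    """Return (symptom, action) pairs for every breached threshold."""
    return [(name, action) for name, check, action in THRESHOLDS if check(metrics)]

report = diagnose({
    "planning_ms": 7_200, "manifest_skip_pct": 93.3, "result_delete_files": 0,
    "avg_file_size_mb": 45, "snapshot_count": 412, "commit_attempts": 1,
})
print(report)
```

Feeding this from the metadata-table queries in Layer 2 plus the MetricsReporter stream in Layer 1 gives you one place where every threshold lives.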

The Maintenance Decision Tree

Start: Is the table slow?
│
├── Slow READS?
│   ├── Check scan planning time (ScanReport or EXPLAIN)
│   │   ├── Planning > 5s → Too many manifests → rewrite_manifests
│   │   └── Planning OK → Pruning is the issue
│   │
│   ├── Check manifest skip ratio
│   │   ├── < 50% skipped → Wrong partition strategy
│   │   └── > 50% skipped → Check file-level pruning
│   │
│   ├── Check file skip ratio (column stats)
│   │   ├── Low → Data not clustered → sort compaction
│   │   └── High → Check for delete files
│   │
│   └── Check delete files in scan
│       ├── > 0 → rewrite_data_files to merge deletes
│       └── 0 → Check file sizes → small files → compaction
│
└── Slow WRITES?
    ├── Check commit attempts
    │   ├── > 1 → Writer contention → reduce concurrency
    │   └── = 1 → Check commit duration
    │
    ├── Check commit duration
    │   ├── > 10s → Metadata bloat → expire_snapshots
    │   └── < 10s → Check write amplification
    │
    └── Check removed vs added files
        ├── High churn → MERGE rewriting too much → optimize merge key
        └── Normal → Check target file size config

Automated Health Check Query

Here is a single comprehensive query that produces a table health scorecard:

WITH file_stats AS (
  SELECT
    COUNT(*) AS total_files,
    AVG(file_size_in_bytes) AS avg_file_size,
    MIN(file_size_in_bytes) AS min_file_size,
    MAX(file_size_in_bytes) AS max_file_size,
    SUM(file_size_in_bytes) AS total_size,
    SUM(record_count) AS total_records,
    COUNT_IF(file_size_in_bytes < 67108864) AS small_file_count,
    COUNT_IF(file_size_in_bytes < 8388608) AS tiny_file_count
  FROM my_catalog.my_db.my_table.files
),
manifest_stats AS (
  SELECT
    COUNT(*) AS total_manifests,
    AVG(added_data_files_count + existing_data_files_count) AS avg_files_per_manifest
  FROM my_catalog.my_db.my_table.manifests
),
snapshot_stats AS (
  SELECT
    COUNT(*) AS total_snapshots,
    MAX(CAST(summary['total-delete-files'] AS INT)) AS current_delete_files,
    MAX(CAST(summary['total-position-deletes'] AS BIGINT)) AS current_pos_deletes,
    MAX(CAST(summary['total-equality-deletes'] AS BIGINT)) AS current_eq_deletes
  FROM my_catalog.my_db.my_table.snapshots
),
partition_stats AS (
  SELECT
    COUNT(*) AS total_partitions,
    MAX(file_count) AS max_files_in_partition,
    MIN(file_count) AS min_files_in_partition,
    AVG(file_count) AS avg_files_per_partition
  FROM my_catalog.my_db.my_table.partitions
)
SELECT
  -- File health
  f.total_files,
  ROUND(f.avg_file_size / 1048576, 1) AS avg_file_size_mb,
  f.small_file_count,
  f.tiny_file_count,
  ROUND(f.small_file_count * 100.0 / f.total_files, 1) AS small_file_pct,
  ROUND(f.total_size / 1073741824, 2) AS total_size_gb,

  -- Manifest health
  m.total_manifests,
  ROUND(m.avg_files_per_manifest, 1) AS avg_files_per_manifest,

  -- Snapshot health
  s.total_snapshots,
  s.current_delete_files,
  s.current_pos_deletes,
  s.current_eq_deletes,

  -- Partition health
  p.total_partitions,
  p.max_files_in_partition,
  p.min_files_in_partition,
  ROUND(p.max_files_in_partition * 1.0 / GREATEST(1, p.min_files_in_partition), 1) AS partition_skew_ratio,

  -- Health flags
  CASE WHEN f.small_file_count * 100.0 / f.total_files > 30 THEN 'NEEDS COMPACTION' ELSE 'OK' END AS file_size_health,
  CASE WHEN m.total_manifests > 200 THEN 'NEEDS REWRITE' ELSE 'OK' END AS manifest_health,
  CASE WHEN s.total_snapshots > 100 THEN 'NEEDS EXPIRY' ELSE 'OK' END AS snapshot_health,
  CASE WHEN s.current_delete_files > 50 THEN 'NEEDS COMPACTION' ELSE 'OK' END AS delete_file_health,
  CASE WHEN p.max_files_in_partition * 1.0 / GREATEST(1, p.min_files_in_partition) > 10 THEN 'SKEWED' ELSE 'OK' END AS partition_health

FROM file_stats f
CROSS JOIN manifest_stats m
CROSS JOIN snapshot_stats s
CROSS JOIN partition_stats p;

This gives you a single-row output with all the key health indicators and flags for which maintenance actions are needed.

When the health check reveals multiple problems, run maintenance in this order:

-- Step 1: Expire old snapshots (reduces metadata before other operations)
CALL my_catalog.system.expire_snapshots(
  table => 'my_db.my_table',
  older_than => TIMESTAMP '2026-02-01 00:00:00',
  retain_last => 10
);

-- Step 2: Remove orphan files (clean up unreferenced data)
CALL my_catalog.system.remove_orphan_files(
  table => 'my_db.my_table',
  older_than => TIMESTAMP '2026-02-01 00:00:00'
);

-- Step 3: Rewrite data files (compaction — merges small files and deletes)
CALL my_catalog.system.rewrite_data_files(
  table => 'my_db.my_table',
  strategy => 'sort',
  sort_order => 'user_id ASC'
);

-- Step 4: Rewrite manifests (consolidate manifest files)
CALL my_catalog.system.rewrite_manifests(
  table => 'my_db.my_table'
);

The order matters: expiring snapshots first removes dead metadata, making the subsequent operations faster and preventing them from processing already-expired data.

How Cazpian Handles This

On Cazpian, table health monitoring is built into the platform. Cazpian continuously collects ScanReport and CommitReport metrics from every query, surfaces them in the table health dashboard, and triggers automated maintenance when thresholds are breached. You get the diagnostic framework described in this post without writing any of the SQL or custom reporters — the platform handles the collection, analysis, and remediation automatically.

What's Next

This post covered the complete picture of Iceberg metrics — from the MetricsReporter API to metadata tables to column-level clustering analysis. For related deep dives, see our other posts: