
Time Travel in Apache Iceberg: Beyond the Basics — Auditing, Debugging, and ML Reproducibility

12 min read
Cazpian Engineering
Platform Engineering Team


Every Apache Iceberg overview mentions time travel. "Query your data as it existed at any point in time." It sounds impressive, gets a mention in the feature list, and then most teams never use it beyond the occasional ad-hoc debugging query.

That is a missed opportunity. Iceberg's snapshot system is not just a convenience feature — it is a production-grade capability that can replace custom auditing infrastructure, eliminate data recovery anxiety, and solve one of machine learning's hardest problems: dataset reproducibility.

This post goes beyond the basics. We will cover the snapshot architecture, the practical query patterns, branching and tagging, the Write-Audit-Publish pattern, and real-world use cases that make time travel indispensable.

How Iceberg Snapshots Work

Every write operation to an Iceberg table — insert, update, delete, merge, or compaction — creates a new snapshot. A snapshot is an immutable, complete view of the table at a specific point in time.

[Diagram: the snapshot chain, branching and tagging, the Write-Audit-Publish pattern, and how metadata.json links snapshots to manifest lists and data files]

Each snapshot contains:

  • A snapshot ID (a unique long identifier; not guaranteed to be sequential)
  • A timestamp of when the snapshot was committed
  • A manifest list pointing to all manifest files for that snapshot
  • An operation type (append, overwrite, replace, delete)
  • A parent snapshot ID linking it to the previous state

The critical property is immutability. Once a snapshot is created, it never changes. New writes create new snapshots. Old snapshots remain accessible until they are explicitly expired.

This means the full history of your table is available through the metadata — you do not need to maintain a separate change data capture (CDC) system, audit log, or versioning layer. The table itself is its own audit trail.
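
As a mental model, the snapshot chain behaves like a linked list of immutable records. Here is a minimal Python sketch of that structure (illustrative only; field names are simplified and the real spec carries more fields, such as the manifest-list location):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass(frozen=True)  # frozen=True mirrors snapshot immutability
class Snapshot:
    snapshot_id: int
    committed_at: datetime
    operation: str            # "append", "overwrite", "replace", "delete"
    parent_id: Optional[int]  # None for the table's first snapshot

def lineage(snapshots: dict[int, Snapshot], head_id: int) -> list[int]:
    """Walk parent pointers from the current snapshot back to the root,
    newest first -- the same chain the history metadata table exposes."""
    chain: list[int] = []
    current = snapshots.get(head_id)
    while current is not None:
        chain.append(current.snapshot_id)
        current = snapshots.get(current.parent_id) if current.parent_id is not None else None
    return chain
```

Because each snapshot only adds a new node at the head, reading any older node is always safe: nothing a writer does can mutate a snapshot you are reading.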

Querying the Past

Time Travel by Timestamp

The most common pattern — query the table as it existed at a specific moment:

-- What did the orders table look like yesterday at midnight?
SELECT * FROM catalog.db.orders
TIMESTAMP AS OF '2026-02-11 00:00:00';

-- What was the total revenue last week?
SELECT SUM(total_amount) FROM catalog.db.orders
TIMESTAMP AS OF '2026-02-05 12:00:00'
WHERE order_status = 'completed';

Iceberg finds the snapshot that was current at the requested timestamp and reads from that snapshot's data files. The current state of the table is completely unaffected.
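
The resolution rule is simple: pick the latest snapshot whose commit time is at or before the requested timestamp. A Python sketch of that lookup (the engine does this against table metadata; user code never needs to):

```python
from datetime import datetime

def snapshot_as_of(commits: list[tuple[int, datetime]], ts: datetime) -> int:
    """Given (snapshot_id, committed_at) pairs, return the snapshot that
    was current at ts: the latest commit at or before the timestamp."""
    eligible = [(committed, sid) for sid, committed in commits if committed <= ts]
    if not eligible:
        raise ValueError("table did not exist at the requested timestamp")
    return max(eligible)[1]
```

One consequence worth noting: a timestamp query between two commits resolves to the earlier commit, since that snapshot was still current at the requested moment.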

Time Travel by Snapshot ID

When you need precision — query an exact snapshot, not a timestamp approximation:

-- Query a specific snapshot
SELECT * FROM catalog.db.orders
VERSION AS OF 7234891023748;

This is useful when you know the exact snapshot from a pipeline run, a model training event, or a data quality check.
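
One way to capture that snapshot ID is to read it from the snapshots metadata table right after a pipeline commits. A sketch, where `run_sql` is a hypothetical helper (not part of any Iceberg API) that executes a statement and returns the first value of the first row:

```python
def current_snapshot_id(run_sql, table: str) -> int:
    """Record the table's latest snapshot ID, e.g. at the end of a
    pipeline run, so the exact state can be pinned later with
    VERSION AS OF or tagged for long-term retention."""
    return run_sql(
        f"SELECT snapshot_id FROM {table}.snapshots "
        "ORDER BY committed_at DESC LIMIT 1"
    )
```

Stashing this ID in your pipeline's run metadata turns every run into a reproducible read.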

Inspecting Table History

Iceberg exposes metadata tables that let you see the full history of changes:

-- View all snapshots
SELECT * FROM catalog.db.orders.snapshots;

| committed_at        | snapshot_id | parent_id | operation | manifest_list |
|---------------------|-------------|-----------|-----------|---------------|
| 2026-02-12 08:00:00 | 723489102   | 623489101 | append    | s3://.../     |
| 2026-02-12 04:00:00 | 623489101   | 523489100 | overwrite | s3://.../     |
| 2026-02-11 20:00:00 | 523489100   | 423489099 | append    | s3://.../     |

-- View table history (snapshot transitions)
SELECT * FROM catalog.db.orders.history;

-- View all data files in the current snapshot
SELECT * FROM catalog.db.orders.files;

-- View all manifest files
SELECT * FROM catalog.db.orders.manifests;

These metadata tables are queryable with standard SQL — you can join them, filter them, and aggregate them just like any other table.

Rollback: Undoing Mistakes

When things go wrong — a bad data load, a faulty transformation, an accidental delete — Iceberg provides instant rollback.

Rollback to a Specific Snapshot

-- Roll back to a known good snapshot
CALL catalog.system.rollback_to_snapshot('db.orders', 523489100);

This makes the specified snapshot the current state of the table. No data is deleted — the bad snapshots still exist (until expired) and can be queried for forensic analysis.

Rollback to a Timestamp

-- Roll back to the state at a specific time
CALL catalog.system.rollback_to_timestamp(
'db.orders',
TIMESTAMP '2026-02-11 20:00:00'
);

Iceberg finds the latest snapshot at or before the specified timestamp and makes it current.

Cherry-Pick a Snapshot

Sometimes you do not want to roll back entirely — you want to apply a specific snapshot's changes to the current state:

-- Apply changes from a specific snapshot on top of current state
CALL catalog.system.cherrypick_snapshot('db.orders', 723489102);

This creates a new snapshot that includes the changes from the cherry-picked snapshot. Useful when a series of writes occurred and you want to selectively include some but not others.

Branches and Tags: Named References to Snapshots

Iceberg 1.2+ introduced branches and tags — named references to snapshots that enable more sophisticated version management.

Tags: Naming a Point in Time

A tag is a permanent, named reference to a specific snapshot. Think of it as a bookmark.

-- Tag the current state before a major data load
ALTER TABLE catalog.db.orders
CREATE TAG pre_migration AS OF VERSION 523489100
RETAIN 365 DAYS;

-- Tag end-of-quarter data for regulatory reporting
ALTER TABLE catalog.db.orders
CREATE TAG Q4_2025_close
RETAIN 2555 DAYS; -- 7 years for compliance

-- Query the tagged snapshot anytime
SELECT * FROM catalog.db.orders VERSION AS OF 'Q4_2025_close';

Key property: Tags have independent retention. Even if you expire old snapshots, a tagged snapshot is retained for its specified duration. This is critical for compliance — you can keep your normal snapshot retention at 7 days while preserving regulatory snapshots for 7 years.

Branches: Independent Lines of Development

A branch is a named, mutable reference that tracks its own lineage of snapshots. Think of it as a Git branch for your data.

-- Create a branch for ETL testing
ALTER TABLE catalog.db.orders CREATE BRANCH etl_test;

-- Write to the branch (does not affect the main table)
INSERT INTO catalog.db.orders.branch_etl_test
VALUES ('ord_999', 'user_123', 149.99, 'pending', current_timestamp());

-- Query the branch
SELECT * FROM catalog.db.orders VERSION AS OF 'etl_test';

-- Query main (unaffected by branch writes)
SELECT * FROM catalog.db.orders;

Branches enable:

  • Testing ETL changes without affecting production data
  • Data validation before publishing to the main table
  • A/B testing of data transformations
  • Write-Audit-Publish workflows (covered next)

The Write-Audit-Publish Pattern

The Write-Audit-Publish (WAP) pattern is a data quality workflow that uses branches to ensure only validated data reaches your production table. It is one of the most powerful practical applications of Iceberg's branching capability.

The Problem WAP Solves

In a typical ETL pipeline, data is written directly to the production table. If the data has quality issues — duplicates, null values in required fields, out-of-range values — the bad data is already visible to downstream consumers before anyone can validate it.

How WAP Works with Iceberg Branches

Step 1: Write — Direct the ETL pipeline to write to an audit branch instead of the main table.

-- Create an audit branch
ALTER TABLE catalog.db.orders CREATE BRANCH audit_branch;

-- Configure Spark to write to the branch
SET spark.wap.branch = audit_branch;

-- Your ETL writes go to the branch
INSERT INTO catalog.db.orders
SELECT * FROM staging.raw_orders
WHERE order_date = '2026-02-12';

The data is now in the audit_branch but not visible to anyone querying the main table.

Step 2: Audit — Run data quality checks against the branch.

-- Check for nulls in required fields
SELECT COUNT(*) FROM catalog.db.orders VERSION AS OF 'audit_branch'
WHERE order_id IS NULL OR total_amount IS NULL;

-- Check for duplicates
SELECT order_id, COUNT(*)
FROM catalog.db.orders VERSION AS OF 'audit_branch'
GROUP BY order_id
HAVING COUNT(*) > 1;

-- Check value ranges
SELECT COUNT(*) FROM catalog.db.orders VERSION AS OF 'audit_branch'
WHERE total_amount < 0 OR total_amount > 1000000;

Step 3: Publish — If validation passes, fast-forward the main branch to include the new data.

-- Publish: fast-forward main to include the audited data
CALL catalog.system.fast_forward('db.orders', 'main', 'audit_branch');

-- Clean up the audit branch
ALTER TABLE catalog.db.orders DROP BRANCH audit_branch;

If validation fails, simply drop the branch. The bad data never reaches the main table.

-- Validation failed — discard the bad data
ALTER TABLE catalog.db.orders DROP BRANCH audit_branch;
-- Main table is completely unaffected

Why WAP Matters

Without WAP, your options for data quality are:

  1. Check before writing — Limited, cannot catch issues that only appear in the context of existing data
  2. Check after writing, roll back if bad — Downstream consumers may have already read the bad data
  3. Write to a staging table, copy if good — Extra data movement, extra storage, extra complexity

WAP gives you option 4: write directly to the production table's storage, validate in isolation, and atomically publish if clean. Zero extra data movement. Zero risk to downstream consumers.
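
Stitched together, the three steps form a small orchestration loop. A sketch in Python, assuming a hypothetical `run_sql` helper that executes a statement and returns a scalar for the `SELECT COUNT(*)` audit queries; the SQL strings mirror the statements above:

```python
def write_audit_publish(run_sql, table: str, branch: str,
                        load_sql: str, checks: list[str]) -> bool:
    """Write to an audit branch, run checks against it, and either
    fast-forward main (publish) or drop the branch (discard)."""
    run_sql(f"ALTER TABLE {table} CREATE BRANCH {branch}")
    run_sql(f"SET spark.wap.branch = {branch}")
    try:
        run_sql(load_sql)  # the ETL insert now lands on the branch
        # each check returns a violation count; any non-zero count fails the audit
        if any(run_sql(check) > 0 for check in checks):
            return False
        run_sql(f"CALL catalog.system.fast_forward('{table}', 'main', '{branch}')")
        return True
    finally:
        run_sql(f"ALTER TABLE {table} DROP BRANCH {branch}")
```

The `finally` clause drops the branch on both paths: after a successful fast-forward it is just cleanup, and after a failed audit it discards the bad data without main ever seeing it.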

Real-World Use Cases

1. Regulatory Compliance and Auditing

Financial services, healthcare, and any regulated industry needs to answer: "What did this data look like on date X?"

-- Tag end-of-quarter snapshots for SOX compliance
ALTER TABLE catalog.db.transactions
CREATE TAG SOX_Q4_2025 AS OF VERSION 891234567
RETAIN 2555 DAYS; -- 7-year retention

-- Auditor requests: show me all transactions as of Dec 31, 2025
SELECT *
FROM catalog.db.transactions VERSION AS OF 'SOX_Q4_2025'
WHERE transaction_type = 'revenue_recognition';

What this replaces: Custom audit tables, CDC pipelines to an archive, or expensive data warehouse snapshots. With Iceberg tags, the data is already there — you just name it and set the retention.

2. Production Debugging

A dashboard shows a sudden spike in null values at 3 AM. Was it a data source issue or a pipeline bug?

-- Compare the table before and after the problematic load
SELECT
'before' AS state,
COUNT(*) AS total_rows,
COUNT(user_id) AS non_null_users,
COUNT(*) - COUNT(user_id) AS null_users
FROM catalog.db.events
TIMESTAMP AS OF '2026-02-12 02:55:00'

UNION ALL

SELECT
'after' AS state,
COUNT(*) AS total_rows,
COUNT(user_id) AS non_null_users,
COUNT(*) - COUNT(user_id) AS null_users
FROM catalog.db.events
TIMESTAMP AS OF '2026-02-12 03:05:00';

If the null count spiked, check the snapshot metadata:

-- What operation happened around 3 AM?
SELECT committed_at, snapshot_id, operation, summary
FROM catalog.db.events.snapshots
WHERE committed_at BETWEEN '2026-02-12 02:50:00' AND '2026-02-12 03:10:00';

If needed, roll back immediately:

CALL catalog.system.rollback_to_timestamp(
'db.events',
TIMESTAMP '2026-02-12 02:55:00'
);

What this replaces: Hours of log diving, pipeline re-runs, and manual data patching. With time travel, you can diagnose and fix the issue in minutes.

3. Machine Learning Reproducibility

The hardest problem in ML operations is not training the model — it is reproducing the exact dataset that was used for training.

-- Before training: tag the training dataset
ALTER TABLE catalog.db.features
CREATE TAG model_v3_training_data
RETAIN 730 DAYS; -- Keep for 2 years

-- Record the tag name in your model registry
-- model_registry.log(model_id='v3', dataset_tag='model_v3_training_data')

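The registry entry can be as simple as an append-only JSON-lines file; in practice you would log to whatever registry you already run (MLflow, a database, etc.). A hedged sketch, with the file path and record shape as assumptions:

```python
import json
from datetime import datetime, timezone

def log_training_run(registry_path: str, model_id: str,
                     table: str, dataset_tag: str) -> dict:
    """Append a model -> dataset-tag record so the exact training
    snapshot can be re-queried later with VERSION AS OF."""
    record = {
        "model_id": model_id,
        "table": table,
        "dataset_tag": dataset_tag,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(registry_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

The point is not the storage mechanism but the link: every model version maps to a named snapshot that Iceberg guarantees will still be queryable.
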
Six months later, when the model drifts and you need to debug:

-- Reproduce the exact training dataset
SELECT *
FROM catalog.db.features VERSION AS OF 'model_v3_training_data';

-- Compare training data distribution to current data
SELECT
feature_name,
AVG(value) AS current_avg,
STDDEV(value) AS current_stddev
FROM catalog.db.features
GROUP BY feature_name

UNION ALL

SELECT
feature_name,
AVG(value) AS training_avg,
STDDEV(value) AS training_stddev
FROM catalog.db.features VERSION AS OF 'model_v3_training_data'
GROUP BY feature_name;

What this replaces: Custom dataset versioning systems (copying training data to a separate location), DVC, or manual file tracking. The training data is just a tagged snapshot, queryable for the tag's lifetime with no copies made. The only storage cost is that the tag pins its data files: they cannot be garbage collected until the tag's retention elapses.

4. Safe Data Migrations

Before running a risky migration — changing partition schemes, bulk updating records, merging tables — create a safety tag:

-- Safety tag before migration
ALTER TABLE catalog.db.customers
CREATE TAG pre_partition_migration
RETAIN 30 DAYS;

-- Run the migration
ALTER TABLE catalog.db.customers
REPLACE PARTITION FIELD month(created_at) WITH day(created_at);

-- Verify the migration
SELECT COUNT(*) FROM catalog.db.customers; -- Should match pre-migration

-- If something went wrong: look up the tagged snapshot ID, then roll back.
-- (Procedure arguments must be literal values, not subqueries.)
SELECT snapshot_id FROM catalog.db.customers.refs
WHERE name = 'pre_partition_migration';

CALL catalog.system.rollback_to_snapshot('db.customers', <snapshot_id>);

Snapshot Retention: Balancing History and Cost

Snapshots are not free. Each snapshot retains references to its data files, preventing them from being garbage collected. Over time, this can significantly increase storage costs.

Retention Best Practices

| Use case                  | Recommended retention | Approach                         |
|---------------------------|-----------------------|----------------------------------|
| Day-to-day debugging      | 3-7 days              | Default snapshot expiry          |
| Weekly operational review | 30 days               | Snapshot expiry policy           |
| Quarterly compliance      | 7 years               | Named tags with long retention   |
| ML training datasets      | 1-2 years             | Named tags per training run      |
| Pre-migration safety      | 30 days               | Named tag, drop after validation |

The key insight: Use short default retention (7 days via snapshot expiry) and long targeted retention (years via tags) together. This keeps storage costs low while preserving the snapshots that actually matter.

-- Short default retention
CALL catalog.system.expire_snapshots(
table => 'db.orders',
older_than => TIMESTAMP '2026-02-05 00:00:00',
retain_last => 10
);

-- But tags survive expiry
ALTER TABLE catalog.db.orders
CREATE TAG critical_snapshot RETAIN 365 DAYS;
-- This snapshot is preserved even when others are expired
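
The interaction between expiry and tag retention can be sketched as a set computation. Assumptions to note: this measures tag retention from the tagged snapshot's commit time and ignores `retain_last`, so it is a simplification of what the engine actually does:

```python
from datetime import datetime, timedelta

def snapshots_to_expire(snapshots: dict[int, datetime],
                        tags: dict[str, tuple[int, timedelta]],
                        now: datetime,
                        default_retention: timedelta = timedelta(days=7)) -> set[int]:
    """IDs safe to expire: older than the default retention window and
    not pinned by a tag whose own retention window is still open."""
    pinned = {sid for sid, retain in tags.values()
              if now - snapshots[sid] <= retain}
    cutoff = now - default_retention
    return {sid for sid, committed in snapshots.items()
            if committed < cutoff and sid not in pinned}
```

A tagged snapshot drops out of the pinned set only when its own retention lapses, at which point the next expiry run can reclaim it like any other old snapshot.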

Quick Reference

Time Travel Queries

-- By timestamp
SELECT * FROM table TIMESTAMP AS OF '2026-02-11 00:00:00';

-- By snapshot ID
SELECT * FROM table VERSION AS OF 7234891023748;

-- By branch
SELECT * FROM table VERSION AS OF 'branch_name';

-- By tag
SELECT * FROM table VERSION AS OF 'tag_name';

Rollback Procedures

CALL system.rollback_to_snapshot('db.table', snapshot_id);
CALL system.rollback_to_timestamp('db.table', TIMESTAMP '...');
CALL system.cherrypick_snapshot('db.table', snapshot_id);

Branch and Tag Management

-- Create
ALTER TABLE t CREATE BRANCH branch_name;
ALTER TABLE t CREATE TAG tag_name AS OF VERSION id RETAIN N DAYS;

-- Write to branch
INSERT INTO t.branch_branch_name VALUES (...);

-- Publish branch
CALL system.fast_forward('db.t', 'main', 'branch_name');

-- Drop
ALTER TABLE t DROP BRANCH branch_name;
ALTER TABLE t DROP TAG tag_name;

Need a platform that makes Iceberg's time travel, branching, and tagging seamless? Cazpian provides a fully managed Spark and Iceberg platform with Apache Polaris catalog, zero cold starts, and usage-based pricing — all running in your AWS account. Learn more.