Why Apache Iceberg and Spark for Cazpian?

December 25, 2025 · One min read

When we started building Cazpian, we had many storage formats and compute engines to choose from. We chose Apache Iceberg and Apache Spark for three main reasons:

1. Interoperability

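Iceberg is an open table format with a public specification, so your tables are not tied to any single engine. By building on Iceberg, Cazpian ensures that your data remains yours: you can query the same tables from other engines that support Iceberg, such as Athena, Snowflake, or Databricks, without vendor lock-in.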

2. Massive Scalability

Apache Spark remains one of the most widely used engines for distributed data processing. Our integrated Spark engine is optimized specifically for Iceberg, so queries stay fast even on the largest datasets.

3. AWS Native Integration

By focusing exclusively on AWS, we can optimize both the storage layer (S3) and the compute layer in ways that "cloud-agnostic" platforms cannot. The result is better performance and lower costs for our users.
