Why Apache Iceberg and Spark for Cazpian?

When we started building Cazpian, we could choose from many storage formats and compute engines. We chose Apache Iceberg and Apache Spark for three main reasons:
1. Interoperability
Iceberg is an open table format with a public specification, so it is not tied to any single engine or vendor. Because Cazpian stores your data as Iceberg tables, your data remains yours: engines that support Iceberg, such as Athena, Snowflake, or Databricks, can query the same tables without vendor lock-in.
2. Massive Scalability
Apache Spark is one of the most widely used engines for distributed data processing. Cazpian's integrated Spark engine is optimized specifically for Iceberg, so queries stay fast even on the largest datasets.
3. AWS Native Integration
By focusing exclusively on AWS, we can optimize both the storage layer (S3) and the compute layer in ways that cloud-agnostic platforms cannot. The result is better performance and lower costs for our users.