

Apache Iceberg is an open table format for datasets that can be used with compute engines like Spark, Trino, PrestoDB, Flink, and Hive.
It has a lot of failsafes in place to ensure that users don’t accidentally mess up a table with a wrong command.
Its schema evolution supports operations like add, drop, update, or rename, and won’t inadvertently un-delete data. It also has hidden partitioning, which helps to prevent silently incorrect results or slow queries that can result from user mistakes.
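To make the idea of hidden partitioning concrete, here is a minimal toy sketch in Python. It is not the Iceberg API: the `ToyTable` class, the `day_transform` function, and the `event_time` column are hypothetical names chosen for illustration. It assumes a table partitioned by the day of a timestamp column, where the table itself derives the partition value so that writers and readers only ever deal with the timestamp.

```python
from collections import defaultdict
from datetime import date, datetime


def day_transform(ts: datetime) -> date:
    """Hypothetical partition transform: map a timestamp to its calendar day."""
    return ts.date()


class ToyTable:
    """Toy in-memory table partitioned by day(event_time). Not the Iceberg API."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def append(self, row: dict) -> None:
        # The writer supplies only event_time; the table derives the partition,
        # so a row cannot land in the wrong partition through user error.
        self.partitions[day_transform(row["event_time"])].append(row)

    def scan(self, start: datetime, end: datetime):
        """Return (rows with start <= event_time < end, number of partitions read)."""
        # The reader filters on event_time only; partitions whose day falls
        # outside the requested range are skipped without being read.
        wanted = sorted(d for d in self.partitions if start.date() <= d <= end.date())
        rows = [
            r
            for d in wanted
            for r in self.partitions[d]
            if start <= r["event_time"] < end
        ]
        return rows, len(wanted)


table = ToyTable()
table.append({"id": 1, "event_time": datetime(2023, 1, 1, 9, 0)})
table.append({"id": 2, "event_time": datetime(2023, 1, 2, 10, 0)})
table.append({"id": 3, "event_time": datetime(2023, 1, 3, 11, 0)})

# Query by timestamp alone; no separate partition column appears in the filter.
rows, partitions_read = table.scan(datetime(2023, 1, 2), datetime(2023, 1, 2, 23, 59))
print([r["id"] for r in rows], partitions_read)
```

Because the filter is written against the timestamp itself, a user cannot forget a partition predicate (a slow full scan) or supply a mismatched one (silently wrong results), which is the class of mistake the feature is meant to prevent.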
Other user experience features include partition layout evolution that can update the table layout as query patterns change, the ability to make queries reproducible, and version rollback to allow users to return to a previous state of the table.
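Reproducible queries and version rollback both follow from keeping a history of table snapshots. The following is a toy Python sketch of that idea, not Iceberg's implementation or API: the `ToySnapshotTable` class and its methods are hypothetical, and each snapshot is assumed to be a full immutable copy of the rows, which a real table format would avoid.

```python
class ToySnapshotTable:
    """Toy table that records an immutable snapshot per commit. Not the Iceberg API."""

    def __init__(self):
        self.snapshots = [()]  # snapshot 0 is the empty table
        self.current = 0       # id of the snapshot that readers see by default

    def commit(self, new_rows) -> int:
        """Append rows as a new snapshot and return its id."""
        snapshot = self.snapshots[self.current] + tuple(new_rows)
        self.snapshots.append(snapshot)
        self.current = len(self.snapshots) - 1
        return self.current

    def read(self, snapshot_id=None) -> list:
        """Read the current state, or a pinned snapshot for a reproducible query."""
        sid = self.current if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid])

    def rollback(self, snapshot_id: int) -> None:
        """Point the table back at an earlier snapshot; no history is deleted."""
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError(f"unknown snapshot id: {snapshot_id}")
        self.current = snapshot_id


table = ToySnapshotTable()
s1 = table.commit(["a", "b"])
s2 = table.commit(["c"])

pinned = table.read(snapshot_id=s1)  # same answer no matter what is committed later
latest = table.read()

table.rollback(s1)                   # undo the second commit
after_rollback = table.read()
print(pinned, latest, after_rollback)
```

Pinning a query to a snapshot id is what makes it reproducible, and rollback only moves the pointer to the current snapshot, so the later state remains available in the history.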
It was designed for huge tables, with the intention that a distributed SQL engine wouldn’t be needed to read it.
It also works with any cloud store, has serializable isolation, and can support multiple concurrent writers.
“It’s quickly becoming that industry standard for how tables are represented in systems like S3 and object storage,” said Tomer Shiran, founder and chief product officer at cloud data lake company Dremio, which is a contributor to the project.