13 Comments

Note: Nessie is an open source project. But it is not an "Apache" project.

So, we should mention it as "Nessie" instead of "Apache Nessie" in all the places.

Amazing article! A lot of depth. Does choosing between the Apache Iceberg format and the Delta table format depend on whether we use Databricks or Snowflake? Or are there other factors to consider when laying down the architecture foundations? Might be a stupid question, but I am just getting started with these formats.

> For example, Snowflake can only read from an external catalog, and an external catalog can only read from a Snowflake catalog.

Did you mean: For example, Snowflake (engine) can only read from an external catalog, and an external engine can only read from a Snowflake catalog.

Great way to explain Iceberg, going to use this in my work!

Ah yes indeed, thanks Maximilian for pointing that out.

Very interesting introduction to Apache Iceberg, thank you for this!

Thanks for the shout out!

And after this article, if you guys decide to use Apache Iceberg, check out my blog post "How not to use Apache Iceberg":

https://medium.com/@ajanthabhat/how-not-to-use-apache-iceberg-046ae7e7c884

So it's still not possible to run DML using only files? You always need to connect to an engine for ACID/DML transactions?

I know Snowflake supports reads and writes when you CREATE a table with a Snowflake-managed catalog. How about Databricks or others? We use PySpark to interact with whatever "catalog" server, right? Or, when accessing Databricks indirectly via JDBC, what do Databricks and the catalog each execute? If we do a massive 1-million-row insert, I don't think the Databricks JDBC endpoint sends the data to the catalog and the catalog is the one changing the Iceberg data files. Or is it a two-way collaboration? Basically, who actually writes the data files and who modifies the metadata files?

At the lowest level, when adding data to a table, does the engine create the files directly and then inform the catalog to register/update the metadata?
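For readers with the same question, here is a toy sketch of the commit flow as Iceberg's design is commonly described: the writing engine produces the data files (and new metadata) itself, and the commit is an atomic compare-and-swap of the table's "current metadata" pointer held by the catalog. Everything below (`Catalog`, `write_rows`, the dict standing in for object storage) is hypothetical and heavily simplified, not the Iceberg or PyIceberg API; some real catalogs (e.g. a REST catalog) may take over parts of the metadata step.

```python
# Hypothetical, simplified model of an Iceberg-style commit (not a real Iceberg API).
import uuid
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class Catalog:
    """Hypothetical catalog: stores only a pointer to each table's current metadata file."""
    pointers: dict = field(default_factory=dict)
    _lock: Lock = field(default_factory=Lock)

    def current(self, table):
        return self.pointers.get(table)

    def swap(self, table, expected, new):
        # Atomic compare-and-swap: succeeds only if no other writer committed first.
        with self._lock:
            if self.pointers.get(table) != expected:
                return False
            self.pointers[table] = new
            return True


def write_rows(catalog, storage, table, rows):
    """Engine side: write data + metadata files directly, then ask the catalog to commit."""
    base = catalog.current(table)
    data_file = f"data/{uuid.uuid4()}.parquet"
    storage[data_file] = rows  # 1. engine writes the data file straight to storage
    previous_files = storage[base]["files"] if base else []
    meta_file = f"metadata/{uuid.uuid4()}.metadata.json"
    storage[meta_file] = {"files": previous_files + [data_file]}  # 2. engine writes new metadata
    return catalog.swap(table, base, meta_file)  # 3. catalog only swaps the pointer
```

In this model the catalog never sees the row data; a writer whose swap fails (because someone else committed in between) would re-read the current metadata and retry.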

I'm sorry, but I still don't know what Iceberg is... The article is very interesting, with very good examples, but for me it doesn't match the headline.

Thanks for your comment. What is unclear to you?

Thanks for the reply :) Sorry for my dumbness: is Iceberg a philosophy of storing data, or is it a specific piece of software? I saw "sets a standard for exposing metadata and statistics about files stored in a data lake" in the post, but I have also heard that there is an Iceberg file format...

Julien, thanks for your reply about open table formats; I can't see it here, but it's pretty clear to me now.
