Series: Spark & Delta 101
1/5. Your first Delta table, step by step
2/5. PySpark basics for everyday work
3/5. Query past versions in Delta
4/5. What Delta stores on disk
5/5. Spark partitions without the pain
If you are new to Delta Lake, start with this post. It covers the minimal actions you perform in real work: create a Delta table, read it back, and overwrite it safely. Reference: Delta Lake.
Download links are at the end of the post: see Downloads.
Quick takeaways
- Delta tables are regular Parquet data files plus a transaction log.
- You can read and write Delta like a normal table, but with ACID reliability guarantees.
- This post gives you a minimal, reproducible flow to start.
Run it yourself
- Local Spark (Docker): main path for this blog.
- Databricks Free Edition: quick alternative if you do not want Docker.
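Either way, you need a SparkSession with Delta enabled. The exact setup depends on your environment; the sketch below assumes a local install with the `pyspark` and `delta-spark` packages (`pip install pyspark delta-spark`). On Databricks, Delta is already configured and you can use the provided `spark` session directly.

```python
# Minimal local SparkSession with Delta Lake enabled (sketch, local setup).
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("delta-101")
    # Register Delta's SQL extension and catalog implementation.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip adds the Delta JARs matching the installed
# delta-spark version; skip this step on Databricks.
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```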
Minimal setup
We generate a small dataset, write it as Delta, then read it back. Ref: Spark range.
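A sketch of the data generation, assuming 20,000 rows split evenly into two groups so the counts match the example output further down (the size and column name are illustrative):

```python
from pyspark.sql import functions as F

# 20,000 rows with an `id` column from spark.range, plus a `group`
# column that splits them evenly into groups 0 and 1.
df = (
    spark.range(0, 20000)
    .withColumn("group", (F.col("id") % 2).cast("int"))
)

df.show(5)
```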
Create the Delta table
Persist the DataFrame as Delta in a local path. Ref: DataFrameWriter.
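A minimal write can look like the sketch below; the path is illustrative, and any writable local directory works.

```python
# Local filesystem path where the Delta table will live (illustrative).
delta_path = "/tmp/delta-101/events"

(
    df.write
    .format("delta")   # be explicit about the format
    .save(delta_path)  # default mode errors if the path already holds data
)
```

If you re-run the notebook, either delete the folder first or jump ahead to the overwrite step.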
Read it back
Read the same path to validate it. Ref: DataFrameReader.
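Reading back is symmetric; this sketch loads the same illustrative path and aggregates per group to produce the table below.

```python
# Load the Delta table from disk and count rows per group.
df_read = spark.read.format("delta").load(delta_path)

df_read.groupBy("group").count().orderBy("group").show()
```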
Expected output (example):
```
+-----+-----+
|group|count|
+-----+-----+
|    0|10000|
|    1|10000|
...
```
Overwrite safely (same schema)
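One way to exercise this, sketched under the same assumptions as above, is to write a smaller DataFrame with the same schema in overwrite mode:

```python
from pyspark.sql import functions as F

# Smaller dataset with the same schema (id, group).
df_small = (
    spark.range(0, 5000)
    .withColumn("group", (F.col("id") % 2).cast("int"))
)

(
    df_small.write
    .format("delta")
    .mode("overwrite")   # replaces the table contents in one atomic commit
    .save(delta_path)
)
```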
Expected output: none directly. When you read the table again, the counts should drop.
What to verify
- The table reads without errors.
- Counts change after overwrite.
- The folder contains a `_delta_log` directory (see the quick check below).
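A quick way to check the last point, assuming the illustrative `delta_path` defined above:

```python
import os

# The table folder should contain Parquet data files plus a _delta_log directory.
print(sorted(os.listdir(delta_path)))
```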
Notes from practice
- Always use `format("delta")` explicitly to avoid ambiguity.
- Start with a local path so you can inspect files on disk.
- Keep paths simple for beginners.
Downloads
If you want to run this without copying code, download the notebook or the .py export.