Time travel is one of Delta Lake's most useful features: it lets you query earlier versions of a table without restoring from backups. This post walks through a simple before/after example so you can trust it in real work. Ref: Delta Time Travel.

Downloads are at the end of the post: go to Downloads.

Quick takeaways

  • Delta tables keep versions in the transaction log.
  • You can query older versions with versionAsOf or timestampAsOf.
  • Use it for audits, debugging, and rollback verification.

Run it yourself

  • Local Spark (Docker): main path for this blog.
  • Databricks Free Edition: quick alternative if you do not want Docker.
docker compose up

Create a small Delta table

If you already ran Delta Table 101, you can reuse the same table path. Otherwise, run the snippet below.

from pyspark.sql import functions as F

delta_path = "/tmp/delta/time_travel"

# Version 0: 10,000 rows, all tagged "v1"
df_v1 = spark.range(0, 10_000).withColumn("status", F.lit("v1"))
df_v1.write.format("delta").mode("overwrite").save(delta_path)

Update the table (new version)

Overwrite to create a new version.

# Version 1: same row count, now tagged "v2"
df_v2 = spark.range(0, 10_000).withColumn("status", F.lit("v2"))
df_v2.write.format("delta").mode("overwrite").save(delta_path)

Read older version

Read version 0 to compare with the latest.

v1 = (
    spark.read.format("delta")
         .option("versionAsOf", 0)  # version 0 = the first write (status "v1")
         .load(delta_path)
)

v1.groupBy("status").count().show()

Expected output (example):

+------+-----+
|status|count|
+------+-----+
|    v1|10000|
+------+-----+

What to verify

  • Version 0 shows status = v1.
  • Latest version shows status = v2.
  • You can compare row counts across versions.

Notes from practice

  • Use time travel for audits, not as a permanent backup strategy.
  • If you vacuum aggressively, older versions may disappear.
  • Document the version you used when sharing results.

Downloads

If you want to run this without copying code, download the notebook or the .py export.