Series: Spark & Delta 101
Time travel is one of the most useful Delta features. It lets you query older versions of your table without backups. This post shows a simple before/after so you can trust it in real work. Ref: Delta Time Travel.
Downloads at the end: go to Downloads.
Quick takeaways
- Delta tables keep versions in the transaction log.
- You can query older versions with versionAsOf or timestampAsOf.
- Use it for audits, debugging, and rollback verification.
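The two reader options can be sketched as below. This assumes a SparkSession with Delta enabled is passed in as `spark`. The helper names `time_travel_options` and `read_as_of` are made up for illustration; only `versionAsOf` and `timestampAsOf` are actual Delta reader options.

```python
from typing import Dict, Optional


def time_travel_options(version: Optional[int] = None,
                        timestamp: Optional[str] = None) -> Dict[str, str]:
    """Return the Delta reader option for exactly one of version / timestamp."""
    if (version is None) == (timestamp is None):
        raise ValueError("Pass exactly one of version or timestamp")
    if version is not None:
        return {"versionAsOf": str(version)}
    return {"timestampAsOf": timestamp}


def read_as_of(spark, path: str, version: Optional[int] = None,
               timestamp: Optional[str] = None):
    """Load the Delta table at `path` as it was at a version or timestamp."""
    options = time_travel_options(version, timestamp)
    return spark.read.format("delta").options(**options).load(path)
```

For example, `read_as_of(spark, path, version=0)` reads the first version, while `read_as_of(spark, path, timestamp="2024-01-01 00:00:00")` reads the table as of that (illustrative) timestamp.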
Run it yourself
- Local Spark (Docker): main path for this blog.
- Databricks Free Edition: quick alternative if you do not want Docker.
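For the local path, the session needs the Delta extension and catalog settings. The sketch below assumes the `delta-spark` pip package is installed in the container. The app name and the `build_spark` helper are arbitrary choices, not necessarily what the Docker setup for this series uses. On Databricks you can skip this, because `spark` already exists.

```python
# Settings that enable Delta Lake on a plain Spark session.
DELTA_CONF = {
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
    "spark.sql.catalog.spark_catalog":
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
}


def build_spark(app_name: str = "delta-time-travel"):
    """Create a local SparkSession with Delta enabled (hypothetical helper)."""
    # Imported here so this module still loads where Spark is not installed.
    from pyspark.sql import SparkSession
    from delta import configure_spark_with_delta_pip

    builder = SparkSession.builder.appName(app_name).master("local[*]")
    for key, value in DELTA_CONF.items():
        builder = builder.config(key, value)
    # Adds the matching Delta jars to the session before it starts.
    return configure_spark_with_delta_pip(builder).getOrCreate()
```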
Links:
Create a small Delta table
If you already ran Delta Table 101, you can reuse the same table path. Otherwise, run the snippet below.
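If the original snippet is not at hand, here is a sketch that is consistent with the expected output shown later: 10,000 rows with `status = v1`. The path `/tmp/delta/time_travel_demo`, the function name, and the `id` column are assumptions, not necessarily what Delta Table 101 used.

```python
TABLE_PATH = "/tmp/delta/time_travel_demo"  # hypothetical path, change as needed


def create_demo_table(spark, path: str = TABLE_PATH, rows: int = 10_000):
    """Write `rows` rows with status 'v1' as a Delta table at `path`."""
    df = spark.range(rows).selectExpr("id", "'v1' AS status")
    # On an empty path this write becomes version 0. If a Delta table already
    # exists at `path`, the overwrite adds a new version on top of it instead.
    df.write.format("delta").mode("overwrite").save(path)
```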
Update the table (new version)
Overwrite to create a new version.
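One way to do the overwrite is sketched here. It assumes the table has an `id` column and a `status` column, and that the new data simply carries `status = v2`. The function name and the row count are illustrative.

```python
def overwrite_with_status(spark, path: str, status: str = "v2",
                          rows: int = 10_000):
    """Replace the table contents; Delta records this as a new version."""
    df = spark.range(rows).selectExpr("id", f"'{status}' AS status")
    # mode("overwrite") writes new data files and commits a new table version.
    # The files of the previous version stay on disk until they are vacuumed,
    # which is what makes time travel possible.
    df.write.format("delta").mode("overwrite").save(path)
```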
Read older version
Read version 0 to compare with the latest.
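A sketch of the comparison, assuming the table has a `status` column. The `status_counts` helper is a made-up name: with `version=None` it reads the latest version, otherwise it sets `versionAsOf`.

```python
from typing import Optional


def status_counts(spark, path: str, version: Optional[int] = None):
    """Return a DataFrame of row counts per status for one table version."""
    reader = spark.read.format("delta")
    if version is not None:
        # Pin the read to a specific version of the table.
        reader = reader.option("versionAsOf", version)
    return reader.load(path).groupBy("status").count()
```

Calling `status_counts(spark, path, version=0).show()` and then `status_counts(spark, path).show()` prints the two versions for comparison.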
Expected output (example):
+------+-----+
|status|count|
+------+-----+
|    v1|10000|
+------+-----+
What to verify
- Version 0 shows status = v1.
- Latest version shows status = v2.
- You can compare row counts across versions.
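The checks above can be turned into a small script. This is a sketch: it assumes you have collected the per-status counts for each version (for example with `.collect()` on a `groupBy("status").count()` result). The helper names are invented for this post.

```python
from typing import Dict, Iterable, List


def to_counts(rows: Iterable) -> Dict[str, int]:
    """Turn collected (status, count) rows into a plain dict."""
    return {row["status"]: row["count"] for row in rows}


def check_versions(v0: Dict[str, int], latest: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if set(v0) != {"v1"}:
        problems.append(f"version 0 should only contain v1, got {sorted(v0)}")
    if set(latest) != {"v2"}:
        problems.append(f"latest should only contain v2, got {sorted(latest)}")
    return problems


def row_count_diff(v0: Dict[str, int], latest: Dict[str, int]) -> int:
    """Total rows in the latest version minus total rows in version 0."""
    return sum(latest.values()) - sum(v0.values())
```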
Notes from practice
- Use time travel for audits, not as a permanent backup strategy.
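- If you run VACUUM with a short retention window, the data files behind older versions are deleted and those versions can no longer be read. By default, VACUUM keeps files for 7 days.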
- Document the version you used when sharing results.
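To document the version, you can look it up in the table history and attach a short note to your results. This sketch uses the Delta SQL command `DESCRIBE HISTORY`. The helper names and the note format are just one possible convention.

```python
def latest_version(spark, path: str) -> int:
    """Return the highest version number recorded in the table history."""
    history = spark.sql(f"DESCRIBE HISTORY delta.`{path}`")
    return history.selectExpr("max(version) AS version").first()["version"]


def provenance_note(path: str, version: int) -> str:
    """Build a one-line note stating which table version a result came from."""
    return f"Source: Delta table {path}, versionAsOf={version}"
```

For example, `provenance_note(path, latest_version(spark, path))` gives a line you can paste next to a shared result.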
Downloads
If you want to run this without copying code, download the notebook or the .py export.