Build a Spark streaming Data Source

Implement a minimal Data Source API reader with real offsets, a clear schema, and a usable format. You will compare the naive batch approach with true streaming and run it end-to-end.

February 1, 2026 · 3 min · 441 words · pw
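The difference between the naive batch approach and real streaming comes down to offsets: a naive reader re-reads everything on every trigger, while a streaming reader reads only the range between the last committed offset and the latest one. The sketch below models that idea in plain Python, without Spark. The class and helper names are hypothetical; the method names loosely follow the shape of PySpark's `DataSourceStreamReader` (`initialOffset`, `latestOffset`, `commit`), but `read_range` is a simplification that collapses partition planning and reading into one call.

```python
class CounterStreamReader:
    """Toy stream reader over an append-only list (hypothetical, not a Spark class).

    The list index serves as the offset, so the offset range [start, end)
    identifies exactly which records belong to a micro-batch.
    """

    def __init__(self, records):
        self.records = records  # shared, append-only backing store

    def initialOffset(self):
        # Where a brand-new query starts reading.
        return {"offset": 0}

    def latestOffset(self):
        # Everything currently available in the source.
        return {"offset": len(self.records)}

    def read_range(self, start, end):
        # Hypothetical simplification: read only the half-open range [start, end).
        return self.records[start["offset"]:end["offset"]]

    def commit(self, end):
        # A real source could clean up data before `end` here; the toy does nothing.
        pass


def naive_batch(records):
    """Naive approach: every trigger re-reads the whole source."""
    return list(records)


def run_micro_batch(reader, committed):
    """Read only what arrived since `committed`, then commit the new end offset."""
    end = reader.latestOffset()
    batch = reader.read_range(committed, end)
    reader.commit(end)
    return batch, end


if __name__ == "__main__":
    records = ["a", "b"]
    reader = CounterStreamReader(records)
    offset = reader.initialOffset()
    first, offset = run_micro_batch(reader, offset)
    records.append("c")
    second, offset = run_micro_batch(reader, offset)
    print(first, second)          # → ['a', 'b'] ['c']
    print(naive_batch(records))   # → ['a', 'b', 'c'] (old rows read again)
```

Keeping the offset as a small dict makes it easy to serialize into a checkpoint, which is what lets a restarted query resume from the last committed position instead of starting over.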

Fix skewed Spark joins

Detect skewed joins in Spark and apply salting to spread hot keys. You will compare stage and shuffle times before and after the fix, using both a synthetic repro and a real dataset; downloads are at the end.

February 1, 2026 · 4 min · 725 words · pw
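The core idea of salting can be sketched in plain Python, without Spark: append a salt to the key on the skewed side, replicate each row of the small side once per salt, and join on (key, salt). This is a minimal sketch assuming an inner join against a small dimension side; `NUM_SALTS` and the helper names are hypothetical, and the salt is derived from the row index only to keep the example deterministic (real jobs typically use a random salt). In PySpark the same steps would be expressed with DataFrame columns rather than lists.

```python
from collections import Counter, defaultdict

NUM_SALTS = 4  # hypothetical salt count; tune it to the skew you observe


def salt_large_side(rows, num_salts=NUM_SALTS):
    """Turn each (key, value) on the skewed side into ((key, salt), value)."""
    return [((key, i % num_salts), value) for i, (key, value) in enumerate(rows)]


def explode_small_side(rows, num_salts=NUM_SALTS):
    """Replicate each (key, value) once per salt so every salted key finds a match."""
    return [((key, s), value) for key, value in rows for s in range(num_salts)]


def hash_join(left, right):
    """Inner join two lists of (join_key, value) pairs on join_key."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]


def largest_group(rows):
    """Size of the biggest join-key group, i.e. the work landing on one task."""
    return max(Counter(k for k, _ in rows).values())


if __name__ == "__main__":
    facts = [("hot", i) for i in range(8)] + [("cold", 100)]  # one hot key
    dims = [("hot", "H"), ("cold", "C")]

    plain = hash_join(facts, dims)
    salted_facts = salt_large_side(facts)
    salted = [
        (key, lv, rv)  # drop the salt after joining
        for (key, _salt), lv, rv in hash_join(salted_facts, explode_small_side(dims))
    ]

    print(sorted(plain) == sorted(salted))                       # → True
    print(largest_group(facts), largest_group(salted_facts))     # → 8 2
```

The trade-off is visible in `explode_small_side`: the small side grows by a factor of `NUM_SALTS`, so salting pays off only when that replication costs less than the straggler task it removes.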

PySpark basics for everyday work

A practical guide to mastering core DataFrame transformations, with clear examples and expected outputs. Includes readable chaining patterns and quick validations.

February 1, 2026 · 2 min · 343 words · pw

Query past versions in Delta

Learn to use versionAsOf and timestampAsOf, validate changes, and understand when time travel is the right tool for auditing, recovery, and regression analysis in Delta Lake.

February 1, 2026 · 2 min · 308 words · pw

Spark local, first run

Hands‑on guide to bring up the local stack, check the UI and cluster health, and run a first job. Includes minimal checks to confirm the Master and Workers are healthy and ready for the rest of the series.

February 1, 2026 · 1 min · 211 words · pw

Your first Delta table, step by step

End‑to‑end walkthrough: create a Delta table, insert data, read, filter, and validate results with expected outputs. The minimal baseline to have in place before any optimization work.

February 1, 2026 · 2 min · 321 words · pw