Build a Spark streaming Data Source
Implement a minimal Data Source API reader with real offsets, a clear schema, and a usable format. You will compare the naive batch approach vs real streaming and run it end-to-end.
Detect skewed joins in Spark and apply salting to spread hot keys. You will compare stage and shuffle times before and after, using a synthetic repro and a real dataset, with downloads at the end.
Practical guide to core DataFrame transformations, with clear examples and expected outputs. Includes readable chaining patterns and quick validations.
Learn versionAsOf and timestampAsOf, validate changes, and understand when time travel is best for auditing, recovery, and regression analysis in Delta Lake.
Hands‑on guide to bring up the local stack, check UI/health, and run a first job. Includes minimal checks to confirm Master/Workers are healthy and ready for the rest of the series.
End‑to‑end walkthrough: create a Delta table, insert data, read, filter, and validate results with expected outputs. The minimal base before any optimization work.