Pandas DataFrames mutate inside functions

See why pandas DataFrames are mutable, how in-place ops leak changes across function boundaries, and how to make intent explicit. Includes a runnable repro, expected output, and safer patterns.

February 4, 2026 · 2 min · 301 words · pw

Fix skewed Spark joins

Detect skewed joins in Spark and apply salting to spread hot keys. You will compare before/after stage and shuffle times using a synthetic repro and a real dataset, with downloads at the end.

February 1, 2026 · 4 min · 725 words · pw

Kafka consumer groups, explained

Explains offsets, partitions, and rebalances with a runnable example that shows how consumption is split across consumers and what happens when you scale the group or a consumer fails.

February 1, 2026 · 1 min · 194 words · pw

Local Kafka with CLI, your first run

Kafka CLI first steps: create topics, produce events, and consume them from the console in a reproducible local environment. Perfect for practice without cloud dependencies.

February 1, 2026 · 1 min · 207 words · pw

PySpark basics for everyday work

Practical guide with clear examples and expected outputs to master core DataFrame transformations. Includes readable chaining patterns and quick validations.

February 1, 2026 · 2 min · 343 words · pw

Read Kafka with Spark Streaming

Connect local Kafka to Spark Structured Streaming, define a schema, and run a continuous read. Includes simple metrics and validations to confirm the stream is working.

February 1, 2026 · 1 min · 210 words · pw

Spark local, first run

Hands‑on guide to bring up the local stack, check UI/health, and run a first job. Includes minimal checks to confirm Master/Workers are healthy and ready for the rest of the series.

February 1, 2026 · 1 min · 211 words · pw

Spark partitions without the pain

Introduces spark.sql.shuffle.partitions, repartition, and coalesce with a reproducible example that shows their impact on stages, run time, and shuffle size.

February 1, 2026 · 2 min · 252 words · pw

What Delta stores on disk

Explore the on‑disk layout, commits, and checkpoints, and see why it matters for performance, maintenance, and troubleshooting in production.

February 1, 2026 · 2 min · 295 words · pw

Your first Delta table, step by step

End‑to‑end walkthrough: create a Delta table, insert data, read, filter, and validate results with expected outputs. The minimal base before any optimization work.

February 1, 2026 · 2 min · 321 words · pw