Pandas DataFrames mutate inside functions

See why pandas DataFrames are mutable, how in-place ops leak changes across function boundaries, and how to make intent explicit. Includes a runnable repro, expected output, and safer patterns.

February 4, 2026 · 2 min · 301 words · pw
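A minimal sketch of the behavior this post covers, assuming pandas is installed. The function names and sample data are hypothetical. Assigning a column inside a function changes the caller's DataFrame, while copying first and returning the result keeps the input untouched.

```python
import pandas as pd


def add_total_inplace(df: pd.DataFrame) -> None:
    # Hypothetical helper: column assignment mutates the caller's object.
    df["total"] = df["price"] * df["qty"]


def add_total_copy(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: work on an explicit copy and return it,
    # so the caller's DataFrame is left unchanged.
    out = df.copy()
    out["total"] = out["price"] * out["qty"]
    return out


orders = pd.DataFrame({"price": [2.0, 3.5], "qty": [3, 2]})
add_total_inplace(orders)
print(list(orders.columns))  # ['price', 'qty', 'total']

fresh = pd.DataFrame({"price": [2.0, 3.5], "qty": [3, 2]})
result = add_total_copy(fresh)
print(list(fresh.columns))   # ['price', 'qty']
print(list(result.columns))  # ['price', 'qty', 'total']
```

Returning a new frame makes the intent visible at the call site: `result = add_total_copy(fresh)` reads as "produce something new", not "change what I passed in".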

Kafka consumer groups, explained

Explains offsets, partitions, and rebalances with a runnable example that shows how consumption is split across consumers and what happens when the group scales or a consumer fails.

February 1, 2026 · 1 min · 194 words · pw
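A simplified model of how a group splits one topic's partitions, assuming range-style assignment (sorted members, contiguous blocks, the first members get one extra partition when the split is uneven). It does not talk to a broker; the actual split depends on the assignor configured via `partition.assignment.strategy`, and the function and member names here are hypothetical.

```python
def range_assign(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Hypothetical model of range-style assignment for a single topic."""
    if not consumers:
        return {}
    members = sorted(consumers)
    # Each member gets `per_member` partitions; the first `extra` get one more.
    per_member, extra = divmod(num_partitions, len(members))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# Three consumers share six partitions evenly.
print(range_assign(6, ["c1", "c2", "c3"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4, 5]}

# After c3 fails, a rebalance moves its partitions to the survivors.
print(range_assign(6, ["c1", "c2"]))
# {'c1': [0, 1, 2], 'c2': [3, 4, 5]}

# More consumers than partitions: the extra consumer sits idle.
print(range_assign(3, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

The last case is why adding consumers beyond the partition count does not increase throughput for that topic.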

Local Kafka with CLI, your first run

Kafka CLI first steps: create topics, produce events, and consume them from the console in a reproducible local environment. Perfect for practice without cloud dependencies.

February 1, 2026 · 1 min · 207 words · pw

PySpark basics for everyday work

Practical guide with clear examples and expected outputs to master core DataFrame transformations. Includes readable chaining patterns and quick validations.

February 1, 2026 · 2 min · 343 words · pw

Query past versions in Delta

Learn how to use versionAsOf and timestampAsOf, validate changes, and understand when time travel is best for auditing, recovery, and regression analysis in Delta Lake.

February 1, 2026 · 2 min · 308 words · pw

Spark local, first run

Hands‑on guide to bring up the local stack, check the web UI and cluster health, and run a first job. Includes minimal checks to confirm the Master and Workers are healthy and ready for the rest of the series.

February 1, 2026 · 1 min · 211 words · pw

Spark partitions without the pain

Introduces spark.sql.shuffle.partitions, repartition, and coalesce with a reproducible example that shows their impact on stages, run time, and shuffle size.

February 1, 2026 · 2 min · 252 words · pw

Delta storage layout: what's really on disk

Explore the on‑disk layout, commits, and checkpoints, and see why it matters for performance, maintenance, and troubleshooting in production.

February 1, 2026 · 2 min · 295 words · pw