Spark Data Source API, step by step (CoinGecko)

February 1, 2026 · 3 min · 459 words · pw
Spark Data Source API streaming with CoinGecko

A step‑by‑step guide that moves from a batch approach to a real streaming Data Source in PySpark 3.5.1. It defines the schema, offsets, reader, and provider registration needed to use format("coingecko").
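To make the role of offsets concrete, here is a minimal sketch in plain Python, with no Spark and no network calls. Everything in it is hypothetical and for illustration only: `TickOffset`, `FakePriceSource`, the method names, and the column names are not taken from the post or from the Spark API. It only shows the general idea that a streaming source reports a start and an end position, and the reader returns the rows between them.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TickOffset:
    """Hypothetical offset: how many price polls have been consumed so far."""
    tick: int

    def to_json(self) -> str:
        # Offsets are serialized so that progress can be checkpointed.
        return json.dumps({"tick": self.tick})

    @staticmethod
    def from_json(text: str) -> "TickOffset":
        return TickOffset(json.loads(text)["tick"])


class FakePriceSource:
    """Stand-in for a streaming reader backed by an in-memory price list."""

    # Hypothetical column names; a real source would declare typed fields.
    SCHEMA = ("coin", "usd", "tick")

    def __init__(self, prices):
        self._prices = list(prices)

    def initial_offset(self) -> TickOffset:
        # Position before any data has been read.
        return TickOffset(0)

    def latest_offset(self) -> TickOffset:
        # Position just past the newest available record.
        return TickOffset(len(self._prices))

    def read(self, start: TickOffset, end: TickOffset):
        # Return rows in the half-open range [start, end).
        return [
            ("bitcoin", self._prices[i], i)
            for i in range(start.tick, end.tick)
        ]


source = FakePriceSource([100.0, 101.5, 99.0])
batch = source.read(source.initial_offset(), source.latest_offset())
```

Reading from `latest_offset()` to itself returns an empty batch, which is what keeps a micro-batch from reprocessing rows it has already seen.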

What Delta stores on disk

February 1, 2026 · 2 min · 273 words · pw
Delta storage layout: what's really on disk

Explore the on‑disk layout, commits, and checkpoints, and see why they matter for performance, maintenance, and troubleshooting in production.
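As a rough illustration of how commits and checkpoints relate, the sketch below classifies file names from a table's `_delta_log` directory using only the standard library. It assumes the usual naming scheme of a 20-digit zero-padded version followed by `.json` for commits and `.checkpoint.parquet` for single-file checkpoints; multi-part checkpoints and other file types are deliberately ignored. The function names are hypothetical and not part of any Delta library.

```python
import re

# Commit files: 20-digit zero-padded version, e.g. 00000000000000000011.json
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Single-file checkpoints only (assumption: multi-part checkpoints are skipped).
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def classify_log_files(names):
    """Split _delta_log file names into sorted commit and checkpoint versions."""
    commits, checkpoints = [], []
    for name in names:
        match = COMMIT_RE.match(name)
        if match:
            commits.append(int(match.group(1)))
            continue
        match = CHECKPOINT_RE.match(name)
        if match:
            checkpoints.append(int(match.group(1)))
    return sorted(commits), sorted(checkpoints)


def commits_to_replay(names):
    """Versions a reader would apply on top of the newest checkpoint."""
    commits, checkpoints = classify_log_files(names)
    base = checkpoints[-1] if checkpoints else -1
    return [version for version in commits if version > base]


# Example listing: commits 0..12 plus a checkpoint at version 10.
names = [f"{v:020d}.json" for v in range(13)]
names += [f"{10:020d}.checkpoint.parquet", "_last_checkpoint"]
replay = commits_to_replay(names)
```

With a checkpoint at version 10, only commits 11 and 12 need to be replayed, which hints at why checkpoint frequency affects how much log a reader has to process.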