Build a Spark streaming Data Source
Implement a minimal Data Source API reader with real offsets, a clear schema, and a usable format. You will compare the naive batch approach vs real streaming and run it end-to-end.
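To make "real offsets" concrete, here is a minimal sketch in plain Python that does not use Spark. The names `CounterStreamReader`, `LOG`, `run_stream`, and `naive_batch_read` are hypothetical and chosen for illustration; they only mirror the general shape of an offset-based reader. The sketch assumes an in-memory log where a record's offset is its index.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical in-memory source: a record's offset is its index in the log.
LOG: List[str] = [f"event-{i}" for i in range(10)]

# Schema of the rows this sketch emits: (offset: int, value: str).
Row = Tuple[int, str]


@dataclass
class CounterStreamReader:
    """Illustrative reader; method names are assumptions, not a Spark API."""

    max_batch: int = 4  # upper bound on records per micro-batch

    def initial_offset(self) -> int:
        # Where a brand-new query starts reading.
        return 0

    def latest_offset(self, start: int) -> int:
        # Never go past the end of the log or beyond max_batch records.
        return min(start + self.max_batch, len(LOG))

    def read(self, start: int, end: int) -> Iterator[Row]:
        # Half-open range [start, end): `end` becomes the next batch's start.
        for offset in range(start, end):
            yield (offset, LOG[offset])


def naive_batch_read() -> List[Row]:
    # Naive batch approach: re-read everything, with no memory of progress.
    return [(i, value) for i, value in enumerate(LOG)]


def run_stream(reader: CounterStreamReader) -> List[List[Row]]:
    # Streaming approach: advance an offset so each record is read once.
    batches: List[List[Row]] = []
    start = reader.initial_offset()
    while True:
        end = reader.latest_offset(start)
        if end == start:  # nothing new to read
            break
        batches.append(list(reader.read(start, end)))
        start = end  # "commit": the next batch resumes from here
    return batches
```

Calling `run_stream(CounterStreamReader())` splits the ten records into micro-batches of at most four, and concatenating those batches gives the same rows as `naive_batch_read()`. The difference is that the streaming loop can stop and later resume from the last committed offset instead of starting over.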
Explains offsets, partitions, and rebalances with a runnable example that shows how consumption is split across consumers and what happens when scaling or failures occur.
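As a rough illustration of how consumption is split, the sketch below simulates a simplified round-robin assignment of partitions to the consumers in a group. The function `assign` and the consumer names are hypothetical; Kafka's actual assignors handle subscriptions, generations, and stickiness that are omitted here.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Simplified round-robin assignment (illustrative, not Kafka's code)."""
    members = sorted(consumers)
    if not members:
        return {}  # no live consumers: nothing is assigned
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        # Each partition is owned by exactly one consumer in the group.
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = list(range(6))

# Two consumers split six partitions evenly.
before = assign(partitions, ["c1", "c2"])

# Scaling out: a third consumer joins and a rebalance redistributes partitions.
scaled = assign(partitions, ["c1", "c2", "c3"])

# Failure: c2 leaves the group and its partitions move to the survivors.
after_failure = assign(partitions, ["c1", "c3"])

print(before)         # {'c1': [0, 2, 4], 'c2': [1, 3, 5]}
print(scaled)         # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
print(after_failure)  # {'c1': [0, 2, 4], 'c3': [1, 3, 5]}
```

After every rebalance each partition still has exactly one owner. In Kafka the new owner resumes from the group's last committed offset for that partition, which is why committing offsets matters when consumers join or fail.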
Kafka CLI first steps: create topics, produce events, and consume them from console in a reproducible local environment. Perfect for practice without cloud dependencies.
Connect local Kafka to Spark Structured Streaming, define a schema, and run a continuous read. Includes simple metrics and validations to confirm the stream is working.