
🛠️ Basic Environment Check (Jupyter + Spark Local)

Quick steps

- Start a local Spark session
- Print the Spark version
- Run a simple row count
- Write & read a small Parquet dataset under ./data

Tip: Keep this notebook as your first-run check for any lab session.

1) Paths and data folder

We resolve the path to the data/ folder. All files written there persist on your host.

    from pathlib import Path

    base_dir = Path.cwd().parent
    data_dir = base_dir / "data" / "00_env_check"
    print("Project base folder:", base_dir)
    print("Project data folder:", data_dir)

    Project base folder: /home/jovyan/work
    Project data folder: /home/jovyan/work/data/00_env_check

2) Spark session and version

Creates a local Spark session; the Spark UI should use port 4040 ...
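The quick steps listed above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the helper names `resolve_data_dir` and `run_env_check`, the app name `env_check`, the `range_parquet` subfolder, and the 1000-row test frame are all assumptions made for the example. It assumes PySpark and a Java runtime are available in the Jupyter environment.

```python
from pathlib import Path


def resolve_data_dir(cwd: Path) -> Path:
    """Mirror the excerpt: the project base is the parent of the working directory."""
    return cwd.parent / "data" / "00_env_check"


def run_env_check(data_dir: Path, n_rows: int = 1000) -> int:
    """Start local Spark, print its version, count rows, round-trip a Parquet dataset.

    Returns the number of rows read back from Parquet.
    """
    # Imported lazily so the path helper above works even where PySpark is absent.
    from pyspark.sql import SparkSession

    data_dir.mkdir(parents=True, exist_ok=True)

    # local[*] runs Spark in-process using all available cores.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("env_check")  # hypothetical app name
        .getOrCreate()
    )
    try:
        print("Spark version:", spark.version)

        # A small single-column DataFrame (column "id") for the row-count check.
        df = spark.range(n_rows)
        print("Row count:", df.count())

        # Write the frame as Parquet under the data folder, then read it back.
        target = str(data_dir / "range_parquet")  # hypothetical subfolder
        df.write.mode("overwrite").parquet(target)
        rows_back = spark.read.parquet(target).count()
        print("Rows read back:", rows_back)
        return rows_back
    finally:
        # Stop the session so the Spark UI port is released.
        spark.stop()
```

From a notebook cell, calling `run_env_check(resolve_data_dir(Path.cwd()))` should return the same number of rows that was written if the environment is healthy. Using `mode("overwrite")` keeps the check repeatable across lab sessions.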

October 6, 2025