Data Engineering — Apache Spark & Kafka
Learn Apache Spark, Kafka, Delta Lake, and Airflow to build real-time data pipelines.
Gain hands-on experience in data engineering, from ingestion to analytics-ready output.
What You'll Learn
Process large datasets in parallel with PySpark DataFrames and Spark SQL.
Build streaming data pipelines with Spark Structured Streaming and Kafka.
Design and implement a Medallion Architecture (Bronze/Silver/Gold) data lake.
Use Delta Lake for ACID transactions, schema evolution, and time travel.
Produce and consume real-time events with Apache Kafka and Kafka Streams.
Orchestrate multi-step data pipelines with Apache Airflow DAGs.
Integrate the pipeline with AWS S3, Glue Catalog, and Athena.
Tune Spark jobs for performance: partitioning, caching, broadcast joins.
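To give a feel for the Medallion Architecture named above, here is a minimal sketch in plain Python. It is an illustration only: in the course the layers would be built with PySpark and Delta Lake, and the sample events, field names, and the helper functions to_bronze, to_silver, and to_gold are hypothetical names chosen for this example.

```python
from collections import defaultdict

# Hypothetical raw events, as they might arrive from a Kafka topic.
RAW_EVENTS = [
    {"order_id": "1", "country": "DE", "amount": "19.90"},
    {"order_id": "2", "country": "de", "amount": "5.00"},
    {"order_id": "2", "country": "de", "amount": "5.00"},  # duplicate delivery
    {"order_id": "3", "country": "FR", "amount": "oops"},  # malformed amount
    {"order_id": "4", "country": "FR", "amount": "12.50"},
]


def to_bronze(raw_events):
    """Bronze: keep every record as received, plus an ingestion sequence number."""
    return [dict(event, _ingest_seq=i) for i, event in enumerate(raw_events)]


def to_silver(bronze_rows):
    """Silver: validate types, normalise values, and drop duplicate orders."""
    seen = set()
    silver_rows = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            # Assumption: bad rows are skipped here; a real pipeline
            # might route them to a quarantine table instead.
            continue
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver_rows.append(
            {
                "order_id": int(row["order_id"]),
                "country": row["country"].upper(),
                "amount": amount,
            }
        )
    return silver_rows


def to_gold(silver_rows):
    """Gold: aggregate into an analytics-ready table (revenue per country)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["country"]] += row["amount"]
    return {country: round(total, 2) for country, total in totals.items()}


bronze = to_bronze(RAW_EVENTS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer reads only from the one before it, so the raw Bronze data stays untouched and the Silver and Gold tables can be rebuilt whenever the cleaning or aggregation rules change.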
Skills You Gain
PySpark
Spark Structured Streaming
Kafka Streams
Data Lake Architecture
Schema Evolution
Spark SQL
Apache Kafka
Delta Lake
AWS S3 + Glue
Performance Tuning
DataFrames & RDDs
Kafka Producers & Consumers
Apache Airflow
Medallion Architecture
dbt Basics