Batch Processing with Apache Spark
How-To · DevOps

By Ryan Giggs, via Dev.to

Week 6 of the Data Engineering Zoomcamp by @DataTalksClub complete! Just finished Module 6: Batch Processing with Spark. Learned how to:

✅ Set up PySpark and create Spark sessions
✅ Read and process Parquet files at scale
✅ Repartition data for optimal performance
✅ Analyze millions of taxi trips with DataFrames
✅ Use the Spark UI to monitor jobs

Processing 4M+ taxi trips with Spark: distributed computing is powerful! Here's my homework solution: https://github.com/Derrick-Ryan-Giggs/pyspark-homework

Following along with this amazing free course. Who else is learning data engineering? You can sign up here: https://github.com/DataTalksClub/data-engineering-zoomcamp/

