Spark Training

2 days
42 Lessons

Course Overview

The Spark training provides students with a solid technical introduction to the Spark architecture and how Spark works. Participants learn the basic building blocks of Spark, including RDDs and the distributed compute engine, as well as higher-level concepts that provide a simpler and more capable interface, including Spark SQL and DataFrames.

This course also covers more advanced skills, such as using Spark Streaming to process streaming data, and provides an overview of Spark graph processing (GraphX and GraphFrames) and Spark machine learning (SparkML Pipelines). Lastly, participants explore possible performance issues, troubleshooting, cluster deployment techniques, and strategies for optimization.

All students will:

  • Understand the need for Spark in data processing, and understand how the Spark architecture distributes computations to cluster nodes
  • Be familiar with the basic installation, setup, and layout of Spark
  • Use Spark for interactive and ad hoc operations
  • Use Datasets, DataFrames, and Spark SQL to efficiently process structured data
  • Understand basics of RDDs (Resilient Distributed Datasets), data partitioning, pipelining, and computations
  • Understand Spark’s data caching and its usage
  • Understand performance implications and optimizations when using Spark
  • Be familiar with Spark graph processing and SparkML machine learning


Prerequisites:

  • Fundamental knowledge of any programming language and a basic understanding of databases, SQL, and query languages
  • Working knowledge of Linux- or Unix-based systems is helpful but not mandatory