Course Overview
Hadoop is a framework that allows distributed processing of large data sets across clusters of computers using simple programming models.
Spark is a fast, general-purpose cluster-computing framework, and Scala is the programming language in which Spark is written.
- During this training, participants will:
- Learn about Hadoop and traditional data-processing models
- Understand HDFS architecture
- Understand MapReduce
- Learn about Impala and Hive
- Understand RDD lineage
- Understand Pig
Requirements
- Familiarity with Java
- Intermediate-level exposure to data analytics
Curriculum
- Lesson 1
- Lesson 2
- Lesson 3
- Lesson 4
- Lesson 5
- Lesson 6
- Lesson 7
- Lesson 8
  - Overview of Sqoop
  - Basic imports and exports
  - Improving Sqoop performance
  - Limitations of Sqoop
  - Understanding Sqoop 2
  - Understanding Apache Flume
  - Basics of Flume architecture
  - Understanding Flume sources
  - Understanding Flume sinks
  - Understanding Flume channels
  - Configuring Flume
  - Understanding HBase
  - HBase architecture
  - Data storage in HBase
  - Comparing HBase with RDBMS
  - Using HBase
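
The Sqoop import/export topics above can be illustrated with a typical pair of invocations. This is a minimal sketch, not part of the course material: the database host, database name (`sales`), user, tables, and HDFS paths are all hypothetical placeholders, and the commands assume a running Hadoop cluster with Sqoop installed and a reachable MySQL instance.

```shell
# Import the MySQL table `orders` into HDFS as delimited text files,
# using 4 parallel map tasks (hypothetical connection details).
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/sales/orders \
  --num-mappers 4

# Export the HDFS data back into a MySQL staging table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders_staging \
  --export-dir /data/sales/orders
```

`--num-mappers` controls how many parallel tasks Sqoop runs, which is the usual first knob when tuning Sqoop performance as covered in this lesson.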
- Lesson 9
- Lesson 10
- Lesson 11
- Lesson 12
- Lesson 13
- Lesson 14
- Lesson 15
- Lesson 16