Hadoop

This course covers the topics required for Hadoop certification, including ETL processes, administration, modelling, query design and reporting.
As part of the curriculum, you will also be introduced to functional programming, pattern matching, higher-order functions, object-oriented programming, DataFrames, Datasets, streaming, data modelling, Data Warehouse Workbench, query design and reporting.

Who should attend:

  • Anyone with basic ETL, Database and Data Warehouse knowledge
  • Query/Report designers
  • Project members
  • Data Warehouse Consultants
  • Aspiring Big Data consultants
  • End Users

Prerequisite – working knowledge of Core Java, covering these topics:

  • OOP concepts – Class, Object, Encapsulation, Abstraction, Polymorphism, Inheritance
  • Exceptions
  • Collections
  • Threads
  • Serialization
  • File handling – byte streams

Module: 1. Introduction to Functional programming

Goal set: By the end of this session, you will understand functional programming principles in Scala.
Topics: Functional and Object Programming, Mutability and Immutability, recursive function, Val and Var, Data types and Collections, Functions, Procedures, Nested Functions, Variable Argument parameter.
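A minimal Scala sketch of some of these ideas – val vs. var, a recursive function, and a variable-argument parameter (names and values are illustrative, not from the course materials):

```scala
object FunctionalBasics {
  // val is immutable; a var could be reassigned
  val greeting: String = "Hello"

  // A tail-recursive function: the recursive call is the last operation
  @annotation.tailrec
  def factorial(n: Int, acc: BigInt = 1): BigInt =
    if (n <= 1) acc else factorial(n - 1, acc * n)

  // Variable-argument parameter: accepts any number of Ints
  def sumAll(xs: Int*): Int = xs.sum

  def main(args: Array[String]): Unit = {
    println(factorial(5))    // 120
    println(sumAll(1, 2, 3)) // 6
  }
}
```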
Module: 2. Pattern matching and Implicit

Goal set: By the end of this session, you will learn how to use pattern matching and implicits.
Topics: Match Expression, Value Binding, Pattern Guard, Default case, Currying, Type parameter, Function Objects, Function literal, Anonymous function, Partially applied functions, Implicits.
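A short sketch of several of these constructs – a match expression with value binding, a pattern guard and a default case, currying with partial application, and an implicit parameter (all names are invented for the example):

```scala
object PatternDemo {
  // Match expression
  def describe(x: Any): String = x match {
    case 0               => "zero"
    case n: Int if n > 0 => s"positive int $n"  // pattern guard
    case s @ ("a" | "b") => s"letter $s"        // value binding
    case _               => "something else"    // default case
  }

  // Currying: arguments taken one parameter list at a time
  def add(a: Int)(b: Int): Int = a + b
  // Partially applied function: the second list is left open
  val addTen: Int => Int = add(10) _

  // Implicit parameter filled in by the compiler from scope
  implicit val defaultSep: String = ", "
  def join(xs: List[String])(implicit sep: String): String = xs.mkString(sep)
}
```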
Module: 3. Higher order functions

Goal set: By the end of this session, you will learn how to use Scala functions to perform transformations and aggregations.
Topics: foreach, filter, reduce, partition, mapPartitions, sortBy, fold. Tools – IntelliJ IDEA and sbt/Maven.
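These higher-order functions can be sketched on an ordinary Scala collection (the data is illustrative):

```scala
object HigherOrder {
  val nums = List(1, 2, 3, 4, 5, 6)

  // filter keeps elements matching a predicate
  val evens = nums.filter(_ % 2 == 0)           // List(2, 4, 6)

  // reduce combines elements pairwise into one value
  val total = nums.reduce(_ + _)                // 21

  // partition splits a collection into two by a predicate
  val (small, large) = nums.partition(_ < 4)    // (List(1,2,3), List(4,5,6))

  // fold is like reduce but starts from a seed value
  val sumWithSeed = nums.fold(100)(_ + _)       // 121

  // sortBy orders elements by a key function
  val descending = nums.sortBy(n => -n)         // List(6, 5, 4, 3, 2, 1)
}
```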

Module: 4. Object-Oriented programming and Traits

Goal set: By the end of this session, you will learn how to create Scala objects.
Topics: Class, Objects, constructor, getter and setter, Case class, Monadic collection option, Lazy, Traits.
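A compact sketch combining several of these topics – a trait mixed into a case class, a lazy val, and Option as a monadic collection (the Person/Greeter names are hypothetical):

```scala
// Trait: an interface that may carry concrete members
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name"
}

// Case class: constructor parameters become immutable fields,
// with equals/hashCode/toString generated automatically
case class Person(name: String, age: Int) extends Greeter {
  lazy val summary: String = s"$name ($age)" // computed on first access
}

object OopDemo {
  // Option: Some(person) when found, None otherwise
  def findAdult(people: List[Person]): Option[Person] =
    people.find(_.age >= 18)
}
```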

Hands-on exercises in Scala.

Module: 1. Spark transformations and actions
Goal set: By the end of this session, you will learn how to use the Spark shell for interactive data analysis and the features of RDDs.
Topics: Overview of Spark, parallelize API, RDD (Resilient Distributed Dataset), common transformations and actions, DAG (Directed Acyclic Graph), transformations and actions on key/value pair RDDs.
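These ideas can be sketched with a local-mode Spark session (a minimal illustration, assuming the spark-sql dependency on the classpath; the data is made up):

```scala
import org.apache.spark.sql.SparkSession

object RddDemo {
  def main(args: Array[String]): Unit = {
    // Local-mode session for experimentation
    val spark = SparkSession.builder
      .master("local[*]").appName("rdd-demo").getOrCreate()
    val sc = spark.sparkContext

    // parallelize distributes a local collection as an RDD
    val rdd = sc.parallelize(1 to 10)

    // Transformations are lazy; Spark only builds the DAG here
    val evenSquares = rdd.map(n => n * n).filter(_ % 2 == 0)

    // An action (collect) triggers execution of the DAG
    println(evenSquares.collect().mkString(", "))

    // Key/value pair RDD with a per-key aggregation
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
    println(pairs.reduceByKey(_ + _).collectAsMap())

    spark.stop()
  }
}
```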

Module: 2. Spark Architecture and advanced operations
Goal set: By the end of this session, you will learn how to run a Spark application and how Spark parallelizes task execution.
Topics: Spark architecture on YARN, creating a SparkContext, building and running a Spark application, RDD lineage, cache, RDD persistence, joins, shuffle-based joins, broadcast joins, repartition, logging.
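A sketch of joins, persistence and repartitioning on pair RDDs (assumes spark-core on the classpath; the datasets are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object JoinDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]").appName("join-demo").getOrCreate()
    val sc = spark.sparkContext

    val orders = sc.parallelize(Seq((1, "book"), (2, "pen"), (1, "lamp")))
    val users  = sc.parallelize(Seq((1, "Alice"), (2, "Bob")))

    // Shuffle-based join: both RDDs are repartitioned by key
    val joined = orders.join(users) // (userId, (item, userName))

    // Persist the result if it will be reused by several actions
    joined.persist(StorageLevel.MEMORY_ONLY)
    println(joined.count())

    // Repartition changes the number of partitions (causes a shuffle)
    println(joined.repartition(4).getNumPartitions) // 4

    spark.stop()
  }
}
```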

Module: 3. Spark Dataframe, Datasets and SQL
Goal set: By the end of this session, you will learn how to write a Spark batch application and schedule it in Oozie.
Topics: Introduction to Spark SQL, DataFrame, Dataset, read API, case class, StructType, toDF, DataFrame API, Oozie.
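A minimal Spark SQL sketch – a case class turned into a Dataset with toDS, then queried through SQL (assumes spark-sql on the classpath; the Employee schema and data are invented for the example):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema for the example
case class Employee(name: String, dept: String, salary: Double)

object SqlDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]").appName("sql-demo").getOrCreate()
    import spark.implicits._ // enables toDS/toDF on local collections

    val ds = Seq(
      Employee("Ann", "eng", 90000),
      Employee("Bob", "ops", 70000)
    ).toDS()

    // Register as a temporary view and query it with SQL
    ds.createOrReplaceTempView("employees")
    spark.sql(
      "SELECT dept, avg(salary) AS avg_salary FROM employees GROUP BY dept"
    ).show()

    spark.stop()
  }
}
```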

Module: 4. Spark Streaming
Goal set: By the end of this session, you will learn how to write a Spark Streaming application.
Topics: StreamingContext, DStream, receivers, transformations, window operations.
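These pieces fit together roughly as follows – a StreamingContext with a batch interval, a socket receiver producing a DStream, and a windowed aggregation (a sketch, assuming spark-streaming on the classpath and a text source on localhost:9999, e.g. `nc -lk 9999`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamDemo {
  def main(args: Array[String]): Unit = {
    // local[2]: at least one core for the receiver, one for processing
    val conf = new SparkConf().setMaster("local[2]").setAppName("stream-demo")

    // Batch interval of 5 seconds
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receiver: a DStream of lines read from a TCP socket
    val lines = ssc.socketTextStream("localhost", 9999)

    // Transformations on the DStream, applied to each batch
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // Window operation: counts over the last 30s, sliding every 10s
    val windowed = counts.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    windowed.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```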

Module: 5. Spark Streaming and improving Spark performance
Goal set: By the end of this session, you will learn how to diagnose and improve Spark application performance.
Topics: Application performance improvement
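One common improvement this module may touch on (a sketch under assumed data; the table and column names are hypothetical) is hinting a broadcast join, so the small side is shipped to every executor instead of shuffling the large side, plus caching a result reused by several actions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object TuningDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]").appName("tuning-demo").getOrCreate()
    import spark.implicits._

    val big   = (1 to 100000).map(i => (i % 100, i)).toDF("dept_id", "value")
    val small = (0 until 100).map(i => (i, s"dept-$i")).toDF("dept_id", "dept_name")

    // Broadcast hint: replicate the small table to all executors
    val joined = big.join(broadcast(small), "dept_id")

    // Cache if the joined result feeds multiple downstream actions
    joined.cache()
    println(joined.count())

    spark.stop()
  }
}
```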

Project:
Project I – a Spark batch job that reads data from a dataset, transforms it and saves it to Hive.
Project II – a Spark Streaming job that consumes data from Kafka, transforms it and saves it to Hive.

Course Features

  • Lectures 1
  • Quizzes 0
  • Duration 50 hours
  • Skill level All levels
  • Language English