This course covers the certification topics required for Hadoop: ETL processes, administration, modelling, query design and reporting.
As part of the curriculum, you will also be introduced to functional programming, pattern matching, higher-order functions, OOP, DataFrames, Datasets, streaming, data modelling, the Data Warehouse Workbench, query design and reporting.
Who should attend:
- Anyone with basic ETL, database and data warehouse knowledge
- Query/report designers
- Project members
- Data warehouse consultants
- Aspiring Big Data consultants
- End users
Prerequisite – knowledge of Core Java, covering:
- OOP concepts – class, object, encapsulation, abstraction, polymorphism, inheritance.
- File handling – byte streams.
Module: 1. Introduction to Functional programming
Goal set: By the end of this session you will learn the principles of functional programming in Scala.
Topics: Functional and Object-Oriented Programming, Mutability and Immutability, Recursive Functions, Val and Var, Data Types and Collections, Functions, Procedures, Nested Functions, Variable-Argument Parameters.
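The ideas in this module can be sketched in a few lines of plain Scala. All names and values here are illustrative, not part of the course material:

```scala
import scala.annotation.tailrec

// val is immutable; var is mutable
val greeting: String = "Hello, Scala"   // cannot be reassigned
var counter: Int = 0                    // can be reassigned
counter += 1

// A tail-recursive function with an accumulator
@tailrec
def factorial(n: Int, acc: Long = 1L): Long =
  if (n <= 1) acc else factorial(n - 1, acc * n)

// Variable-argument parameter
def sumAll(xs: Int*): Int = xs.sum
```

Preferring `val` over `var` is the immutability habit this module builds on throughout the rest of the course.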
Module: 2. Pattern matching and Implicit
Goal set: By the end of this session you will learn how to use pattern matching and implicits.
Topics: Match Expression, Value Binding, Pattern Guard, Default case, Currying, Type parameter, Function Objects, Function literal, Anonymous function, Partially applied functions, Implicits.
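Several of these topics fit into one small sketch. The function and value names are illustrative only:

```scala
// Match expression with literal patterns, a pattern guard,
// value binding, and a default case
def describe(x: Any): String = x match {
  case 0                  => "zero"                // literal pattern
  case n: Int if n < 0    => s"negative: $n"       // pattern guard
  case s @ ("yes" | "no") => s"answer: $s"         // value binding
  case _                  => "something else"      // default case
}

// Currying: arguments supplied one parameter list at a time
def add(a: Int)(b: Int): Int = a + b

// Partially applied function (eta-expanded to a function value)
val addTen: Int => Int = add(10)

// Anonymous function (function literal)
val double: Int => Int = x => x * 2
```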
Module: 3. Higher order functions
Goal set: By the end of this session you will learn how to use Scala's higher-order functions for transformation and aggregation.
Topics: foreach, filter, reduce, partition, mapPartition, sortBy, fold. Tools – IntelliJ IDEA and sbt/Maven.
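Most of these functions can be demonstrated on an ordinary Scala collection; the data below is illustrative:

```scala
val nums = List(4, 1, 3, 2, 5)

val squares        = nums.map(n => n * n)       // transform each element
val evens          = nums.filter(_ % 2 == 0)    // keep matching elements
val total          = nums.reduce(_ + _)         // combine into one value
val (small, large) = nums.partition(_ <= 3)     // split by a predicate
val sorted         = nums.sortBy(identity)      // sort by a key function
val folded         = nums.fold(0)(_ + _)        // like reduce, with a start value

nums.foreach(n => println(n))                   // side-effecting iteration
```

`fold` differs from `reduce` only in taking an explicit start value, which also makes it safe on an empty collection.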
Module: 4. Object-Oriented Programming and Traits
Goal set: By the end of this session you will learn how to create Scala classes and objects.
Topics: Class, Objects, constructor, getter and setter, Case class, Monadic collection option, Lazy, Traits.
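A compact sketch tying these topics together, with illustrative names throughout:

```scala
// A trait can mix abstract members with concrete methods
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name"
}

// Case class: immutable fields, structural equality, and copy for free
case class User(name: String, age: Int) extends Greeter

// Option as a monadic collection: map/getOrElse instead of null checks
def findUser(id: Int): Option[User] =
  if (id == 1) Some(User("Ada", 36)) else None

val message = findUser(1).map(_.greet).getOrElse("unknown user")

// lazy val: initialized on first access, not at definition
lazy val expensive: Int = 21 * 2
```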
Hands-on exercise on Scala.

Module: 1. Spark transformations and actions
Goal set: By the end of this session you will learn how to use the Spark shell for interactive data analysis and the features of RDDs.
Topics: Overview of Spark, parallelize API, RDD (Resilient Distributed Dataset), common transformations and actions, DAG (Directed Acyclic Graph), transformations and actions on key/value pair RDDs.
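The core loop of this module looks roughly like the sketch below, assuming the spark-core dependency is on the classpath. It runs in local mode, and the data is illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]")
val sc   = new SparkContext(conf)

// parallelize: turn a local collection into an RDD
val rdd = sc.parallelize(Seq("spark", "hadoop", "spark", "hive"))

// Transformations are lazy; Spark builds a DAG and only
// executes it when an action is called
val counts = rdd
  .map(word => (word, 1))   // key/value pair RDD
  .reduceByKey(_ + _)       // transformation on pairs

val result = counts.collect().toMap   // action: triggers execution
sc.stop()
```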
Module: 2. Spark Architecture and advanced operations
Goal set: By the end of this session you will learn how to run a Spark application and how Spark parallelizes task execution.
Topics: Spark architecture on YARN, creating a SparkContext, building and running a Spark application, RDD lineage, cache, RDD persistence, joins, shuffle-based joins, broadcast joins, repartition, logging.
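Two of these topics, caching and broadcast joins, can be sketched together. This assumes spark-core on the classpath and runs in local mode; the tables and names are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("arch-demo").setMaster("local[*]")
val sc   = new SparkContext(conf)

val orders = sc.parallelize(Seq((1, 100.0), (2, 50.0), (1, 25.0)))
orders.cache()   // keep in memory across repeated actions

// Broadcast join: ship a small lookup table to every executor
// instead of shuffling the large RDD
val customers = Map(1 -> "Alice", 2 -> "Bob")
val bcast     = sc.broadcast(customers)

val joined = orders
  .map { case (custId, amount) => (bcast.value(custId), amount) }
  .reduceByKey(_ + _)
  .collect()
  .toMap

sc.stop()
```

A shuffle-based join would instead call `join` on two pair RDDs, moving both sides across the network; broadcasting avoids that when one side is small.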
Module: 3. Spark Dataframe, Datasets and SQL
Goal set: By the end of this session you will learn how to write a Spark batch application and schedule it in Oozie.
Topics: Introduction to Spark SQL, DataFrames, Datasets, read API, case classes, StructType, toDF, the DataFrame API, Oozie.
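Creating a DataFrame from a case class with `toDF` and querying it with SQL can be sketched as follows, assuming the spark-sql dependency; the schema and data are illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("df-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._   // enables toDF on Seq of case classes

case class Employee(name: String, dept: String, salary: Double)

val df = Seq(
  Employee("Ada",   "eng", 90000),
  Employee("Grace", "eng", 95000),
  Employee("Joan",  "ops", 70000)
).toDF()

// Register the DataFrame as a view and query it with SQL
df.createOrReplaceTempView("employees")
val engCount = spark.sql("SELECT count(*) FROM employees WHERE dept = 'eng'")
  .first().getLong(0)

spark.stop()
```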
Module: 4. Spark Streaming
Goal set: By the end of this session you will learn how to write a Spark Streaming application.
Topics: StreamingContext, DStream, receivers, transformations, window operations.
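The overall shape of such an application is sketched below, assuming the spark-streaming dependency. The host and port are illustrative, and the job only processes data once `ssc.start()` is called against a live source:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: one thread for the receiver, one for processing
val conf = new SparkConf().setAppName("stream-demo").setMaster("local[2]")
val ssc  = new StreamingContext(conf, Seconds(5))   // 5-second batches

val lines  = ssc.socketTextStream("localhost", 9999)   // receiver-based DStream
val counts = lines
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))   // window operation

counts.print()
// ssc.start(); ssc.awaitTermination()   // uncomment to run against a source
```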
Module: 5. Improving Spark performance
Goal set: By the end of this session you will learn how to diagnose and improve Spark application performance.
Topics: Application performance improvement
Project I - a Spark batch job that reads data from a dataset, transforms it and saves it in Hive.
Project II - Spark Streaming with Kafka: transform the data and save it in Hive.
- Lectures 1
- Quizzes 0
- Duration 50 hours
- Skill level All levels
- Language English