Hadoop Developer Course Content
Introduction to Big Data and Hadoop
  • What is Big Data
  • Why Big Data is needed
  • Big Data characteristics
  • How to store and process Big Data
  • What is Hadoop
  • Why Hadoop
  • Hadoop history
  • Software and hardware requirements for Hadoop
  • Hadoop real-time use case
  • Major Components of Hadoop
  • Hadoop ecosystem projects
  • Scope of Hadoop
  • Hadoop distributions
  • Hadoop installation modes
Hadoop Distributed File System (HDFS)
  • Introduction to HDFS
  • Why HDFS
  • HDFS Commands
  • Regular file system vs. Hadoop Distributed File System
  • HDFS Master/slave architecture
  • Daemons in Hadoop
  • HDFS concepts such as blocks, NameNode, Secondary NameNode, and DataNode
  • HDFS File reads
  • HDFS File writes
  • Fault Tolerance
  • Details on Network topologies like nodes, clusters and racks
  • Details about heartbeats
  • Details on the rack awareness file
  • HDFS Federation
  • High Availability of Namenode
  • Hadoop Archive files
  • Distcp usage
  • Assignment and Interview Questions on HDFS module
MapReduce
  • Introduction to MapReduce framework
  • MapReduce Architecture
  • MapReduce execution phases
  • Details on input splits, mappers, shuffle and sort, and reducers
  • Eclipse plug-in installation
  • My first MapReduce program
  • In-depth knowledge of Combiners
  • Details on ToolRunner
  • Partitioner
  • Real-time use cases for writing MapReduce programs
Advanced MapReduce
  • Counters
  • Real time use case on Counters
  • Secondary Sorting
  • Map-side joins
  • Reduce-side joins
  • Classic MapReduce and YARN
  • Details on ResourceManager, ApplicationMaster, NodeManager, and Container
  • Performance Tuning features
  • Hadoop Streaming
  • Hadoop Pipes
  • File Input/Output Formats in MapReduce
  • Distributed cache
  • Assignment and Interview Questions on MapReduce and Advanced MapReduce
Hive
  • Introduction to Hive
  • Hive Architecture
  • Difference between HQL and SQL
  • Installation of Hive
  • In-depth knowledge of Managed Tables and External Tables
  • Hive Data types
  • Hive Create, Alter, and Drop tables
  • Hive multi-table inserts
  • Partitions in Hive with a real-time example
  • Bucketing in Hive with a real-time example
  • Hive storage formats
  • Joins in Hive
  • Hive Indexes
  • Hive Views
  • Hive UDF
  • Assignment and Interview Questions on Hive
Pig
  • Introduction to Pig
  • Details on the Pig data flow engine
  • MapReduce vs. Hive vs. Pig
  • When to use Pig
  • Datatypes in Pig
  • Modes of execution in Pig
  • Pig programming
  • Pig Execution models
  • Operators in Pig
  • Pig UDF
  • Assignment and Interview Questions on Pig
HBase
  • Introduction to HBase
  • Basic Configurations of HBase
  • Fundamentals of HBase
  • HBase Data Model
  • HBase Architecture
  • SQL vs. NoSQL
  • HDFS vs. HBase
  • Client-side buffering or bulk uploads
  • HBase Operations
  • Assignment and Interview Questions on HBase
Sqoop
  • Introduction to Sqoop
  • Sqoop and Sqoop2 architectural differences
  • Sqoop Import
  • Sqoop Incremental Import
  • Sqoop Import-all-tables
  • Sqoop Export
  • Sqoop Jobs
  • Real-time example of import/export between an RDBMS (MySQL) and Hadoop
Flume
  • Introduction to Flume
  • Architecture of Flume
  • In-depth look at Flume agents
  • Real-time data ingestion from Twitter to Hadoop using Flume
  • Assignment and Interview Questions
Hadoop Admin Course Content
Introduction to Big Data and Hadoop
  • What is Big Data
  • Why Big Data is needed
  • Big Data characteristics
  • How to store and process Big Data
  • What is Hadoop
  • Why Hadoop
  • Hadoop history
  • Software and hardware requirements for Hadoop
  • Hadoop real-time use case
  • Major Components of Hadoop
  • Hadoop ecosystem projects
  • Scope of Hadoop
  • Hadoop distributions
Planning Your Hadoop Cluster
  • Hadoop Installation Modes
  • Hadoop Releases
  • Virtual machine setup
  • Installing the latest Cloudera QuickStart VM
  • Hadoop installation: pseudo-distributed mode cluster setup
  • Hadoop Cluster Architecture
  • Hadoop cluster planning
  • Sizing the cluster
  • In-depth details on configuration files
Multi-node Cluster Setup and Maintenance
  • Installing and configuring a multi-node cluster
  • Adding and Removing Cluster Nodes
  • Rebalancing the cluster
  • NameNode Metadata Backup
  • Decommissioning the nodes
  • Cluster Upgrading
Hadoop Distributed File System (HDFS)
  • Introduction to HDFS
  • Why HDFS
  • HDFS Commands
  • Regular file system vs. Hadoop Distributed File System
  • HDFS Master/slave architecture
  • Daemons in Hadoop
  • HDFS concepts such as blocks, NameNode, Secondary NameNode, and DataNode
  • HDFS File reads
  • HDFS File writes
  • Fault Tolerance
  • Details on Network topologies like nodes, clusters and racks
  • Details about heartbeats
  • Details on the rack awareness file
  • HDFS Federation
  • High Availability of Namenode
  • Hadoop Archive files
  • Distcp usage
  • HDFS Admin Commands
  • Exercise
Overview of MapReduce 2.0
  • Introduction to MapReduce framework
  • MapReduce Architecture
  • MapReduce execution phases
  • Details on input splits, mappers, shuffle and sort, and reducers
  • Classic MapReduce and YARN
  • Details on ResourceManager, ApplicationMaster, NodeManager, and Container
Cluster Administration using Cloudera Manager
  • Cloudera Manager features
  • Configuration management
  • Resource management
  • Reports in Cloudera Manager
  • Alerts in Cloudera Manager
  • Service management
Installing and Managing the Hadoop Ecosystem
  • Understanding Hive
  • Installing and configuring Hive
  • Understanding Pig
  • Installing and configuring Pig
  • Understanding Sqoop
  • Installing and configuring Sqoop
  • Understanding Flume
  • Installing and configuring Flume
Advanced Cluster Setups
  • High Availability setup for Hadoop Clusters
  • Setting up the Hadoop environment on Amazon EC2
Cluster Monitoring, Troubleshooting, and Optimizing
  • NameNode and JobTracker Web UI
  • Viewing and managing Hadoop’s log files
  • Ganglia monitoring tool
  • Nagios monitoring tool
  • Common cluster issues and their resolutions
  • Optimization Techniques for the cluster