Audience and Prerequisites | The Big Data Hadoop Analyst Training course is designed to give analysts the skills to work with Big Data and Hadoop, which is widely used to meet the industry's growing demand for processing and analysing data at very high speeds. The course teaches the tools and techniques needed to work as a Hadoop Analyst on Big Data. Basic knowledge of any programming language is helpful but not required. Data and System Analysts, Business Intelligence professionals, Project Managers, ETL and Data Warehousing professionals, and anyone who wants to build a career in Big Data and Hadoop can enrol in this training course. |
Objectives | Get ready for the Cloudera Data Analyst certification exam (CCA 159).
Separate access to real-time clusters will be provided to every participant for practice.
Training will be conducted through an LMS (Learning Management System) and the GoToWebinar application, and all videos will be available to participants through the LMS after the training. To extract meaning from rising volumes of Big Data, organizations across the globe rely heavily on Hadoop, which is why Hadoop adoption is growing steadily. This course helps candidates understand how to manage work on the Hadoop framework and process massive quantities of data streaming at high speed to uncover valuable insights in real time. There is strong demand for professionals who can apply the skills taught in this course to real-world Big Data scenarios, which participants practice on COSO IT Real Time Hadoop Clusters.
Course Outcomes:
- Understanding the Hadoop ecosystem and architecture.
- Learning about YARN, Pig and Hive.
- Understanding various complex data processing techniques.
- Learning real-time querying on Hadoop.
- Deploying advanced indexing with MapReduce.
- Getting to know real-time analysis on huge datasets.
Students will also be prepared to take sought-after Big Data certification exams from IBM or Cloudera. |
Curriculum | 1. Introduction to Big Data and Hadoop:
- What Big Data is, its major factors, and an introduction to Hadoop.
- Hadoop ecosystem, history, concepts, distributions, and high-level architecture.
- Hadoop Myths.
- Hadoop Challenges.
- Hardware / Software.
Lab: First look at Hadoop real-time clusters using small-enterprise data.
2. Introduction to HDFS (Hadoop Distributed File System):
- HDFS overview.
- Learning the core HDFS concepts (horizontal scaling, replication, data locality, rack awareness) and their importance.
- Understanding HDFS architecture (NameNode, Secondary NameNode, DataNode, data flow).
- Understanding data integrity.
- Future of HDFS: NameNode HA, Federation.
Lab: Interacting with HDFS.
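The replication concept covered in this module comes down to simple arithmetic. The following is a minimal sketch, assuming the commonly used defaults of a 128 MB block size and a replication factor of 3 (both are configurable per cluster); the function name `hdfs_footprint` is hypothetical and not part of any Hadoop API.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate how a single file is laid out in HDFS.

    Assumed defaults: 128 MB blocks, replication factor 3.
    Returns (logical blocks, block replicas, raw storage in MB).
    """
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times on different DataNodes.
    block_replicas = blocks * replication
    # A partial last block only occupies its actual size on disk,
    # so raw usage is the file size times the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb

if __name__ == "__main__":
    print(hdfs_footprint(1000))
```

Under these assumptions, a 1000 MB file occupies 8 blocks, 24 block replicas, and 3000 MB of raw cluster storage, which is why capacity planning has to account for the replication factor.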
3. MapReduce:
- MapReduce overview.
- MapReduce concepts.
- Daemons: JobTracker / TaskTracker.
- Phases: driver, mapper, shuffle/sort, reducer.
- Thinking in MapReduce.
- Future of MapReduce (YARN).
Lab: Running a MapReduce program.
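The phases listed in this module can be illustrated with a small in-memory word count. This is a minimal sketch in plain Python, not Hadoop code: the function names (`mapper`, `shuffle_sort`, `reducer`, `driver`) are hypothetical and simply mirror the phase names above, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle_sort(pairs):
    """Shuffle/sort phase: group values by key and return keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Reduce phase: sum the counts collected for one word."""
    return key, sum(values)

def driver(lines):
    """Driver: wire the phases together, as a job configuration would."""
    mapped = (pair for line in lines for pair in mapper(line))
    return [reducer(key, values) for key, values in shuffle_sort(mapped)]

if __name__ == "__main__":
    # Prints [('big', 2), ('data', 2), ('hadoop', 2), ('on', 1), ('processes', 1)]
    print(driver(["big data on hadoop", "hadoop processes big data"]))
```

On a real cluster the mapper and reducer run in parallel on different nodes and the framework performs the shuffle/sort over the network; the sketch only shows how data moves between the phases.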
4. Introduction to Pig:
- Pig vs. Java MapReduce.
- Introduction to Pig, its features, use cases, and ways of interacting with Pig.
- Data analysis with Pig: simple data types, loading data, Pig Latin syntax, field definitions, viewing the schema, data output, filtering and sorting data, and commonly used functions in Pig.
- Complex data processing with Pig: complex data types, grouping, iterating over grouped data, etc.
- Multi-dataset operations with Pig: techniques for combining and joining datasets, splitting datasets, set operations, hands-on exercises, etc.
- Advanced concepts.
Lab: Writing Pig Scripts to Analyze/Transform data.
5. Hive:
- Introduction to Hive: data storage and Hive schema, differences between Hive and traditional databases and between Hive and Pig, use cases, and Hive-based interactions.
- Hive Concepts.
- Hive Architecture.
- SQL support in Hive.
- Relational data analysis with Hive: Hive databases and tables, Hive data types, HiveQL syntax, common built-in functions, joining datasets, and running Hive queries from scripts, the shell, and Hue.
- Hive data management: Hive data formats, creating databases and Hive-managed tables, altering databases and tables, loading data into Hive, simplifying queries with views, self-managed tables, storing query results, controlling data access, etc.
- Hive performance optimisation: understanding query performance, bucketing, indexing, and partitioning data.
- Partitions and joins in Hive.
- Text analytics in Hive.
Labs (multiple): Creating Hive tables and running queries, joins, using partitions, using text analytics functions.
6. BI Tools for Hadoop:
- BI tools and Hadoop.
- Overview of the current BI tools landscape.
- Choosing the best tool for the job.
Access to COSO IT Big Data Labs (real-time clusters) for projects and practice. |