Audience and Prerequisites | R and SAS Developer
About The Course
While R is a popular programming language widely used for statistical analysis, graphical representation and reporting, the Python Developer training covers both basic and advanced concepts such as Hadoop streaming, machine learning and MapReduce in Python. This combo training is designed to help students understand the core concepts of R while mastering Python, another popular programming language.
Audience & Pre-Requisites
While there are no prerequisites for this training, basic knowledge of any programming language can be helpful. The R + Python Developer training course is specifically designed for Business Intelligence professionals, software developers, ETL professionals, software engineers, data analysts, Big Data professionals, SAS developers seeking to explore open source technology, and graduates or professionals who want to build a career in data science.
|
Curriculum | - R Programming
1. Introduction To R & R-Packages - Understanding the R language for statistical programming, introduction to RStudio, features of R, the statistical packages, getting familiar with different data types and functions, learning its deployment in various scenarios, etc.
- Using SQL to apply the ‘join’ function, visualization and debugging tools, components of RStudio such as the code editor, and learning about rbind.
- Learning about R packages, a well-defined format that bundles code, R functions and data; R package structure, package metadata and testing, CRAN (Comprehensive R Archive Network), vector creation and assigning values to variables.
2. Matrices, Vectors & Sorting Dataframe - R functionality, the rep function for generating repeats, the transpose and stack functions, sorting, and generating factor levels.
- Fundamentals of matrices and vectors in R, understanding functions like merge, strsplit, rowSums, rowMeans, colMeans and colSums, along with matrix manipulation, sequencing, repetition, indexing, etc.
3. Generating Plots & Reading Data From External Files - Generating plots in R, including bar plots, line plots, graphs and histograms, and understanding the components of a pie chart.
- Learning about subscripts in R, obtaining parts of vectors, using subscripts with arrays, logical variables with lists and reading data from external files.
4. Variance Analysis & K-Means Clustering - Understanding the Analysis of Variance (ANOVA) statistical technique, working with histograms and pie charts, deploying ANOVA with R, one-way and two-way ANOVA.
- K-Means clustering as a clustering algorithm, cluster and affinity analysis, finding cohesive subsets of items, working with large datasets, solving clustering issues, and association rule mining (affinity analysis) for data mining and for learning co-occurrence relationships.
5. Association Rule Mining & Understanding Relationship With Regression - Understanding Association Rule Mining and its key concepts, different methods to predict relations between variables in large datasets, the algorithm and rules of Association Rule Mining and learning what single cardinality is.
- Introduction to Simple Linear Regression, the equations for the slope, the y-intercept and the regression line, the least squares criterion, deploying analysis using regression, the standard error of the estimate, calculating and analysing the results, and measures of variation.
- Simple Linear Regression analysis, Two variable Relationship, Scatter Plots, Line of best fit, etc.
- In-depth understanding of measures of variation, the coefficient of determination, the F-test, test statistics with an F-distribution, prediction with linear regression and advanced regression in R.
- Logistic Regression in R, Logistic Regression Mean, advanced logistic regression, learning how to do prediction using logistic regression, ensuring the model accuracy, understanding sensitivity and specificity, confusion matrix, etc.
- Learning what an ROC (receiver operating characteristic) curve is: a graphical plot that illustrates the performance of a binary classifier system, and using the ROC curve in R to determine specificity / sensitivity trade-offs for a binary classifier.
6. ROC & Kolmogorov-Smirnov Chart - Detailed learning about ROC, area under the ROC curve, data set partitioning, converting variables, checking for multicollinearity (correlation between two or more variables), advanced data set partitioning, interpretation of the output, predicting the output, detailed confusion matrix, and deployment of the Hosmer-Lemeshow test to analyse whether the observed event rates are in accordance with the expected event rates.
- Data analysis with R, learning about the Wald test, the importance of the area under the ROC curve and the Kolmogorov-Smirnov chart.
7. R Integration With Hadoop & Database Connectivity With R - Understanding how to create an integrated environment for deploying R on the Hadoop platform, working with R Hadoop, the R Hadoop Integrated Programming Environment, R programming for MapReduce jobs, etc.
- Connecting to different databases from the R environment, using ODBC tables to read data, and visualizing the performance of an algorithm by means of the confusion matrix.
8. Project Work & Case Studies
- Python Developer
1. Introduction To Python - Basic concepts of Python, understanding its general purpose along with versatility and strengths, benefits of using Python in Read & Write.
2. Python Basics and its Installation - Fundamentals of Python, its objects and data types
- Learning to install Python in different operating systems such as Windows, Linux and Mac
- Understanding the Python integrated development environment and its variables.
- Running Python programs
3. Get familiar with object-oriented programming concepts - Understanding the concepts of OOP, built-in as well as user-defined Python functions
- Get introduced to Python methods, lists, loops, strings, tuples and program flow of control
- Syntax and documentation in Python
4. Study Advanced Python - Understanding different types of exception handling, handling files in the file system, learning to define a class
- Understanding the Python database API and SQLite in Python
5. Study Deployment of Python in Hadoop Environment - Master the concepts and components of Hadoop and its ecosystem.
- Understanding Hadoop Common, HDFS and MapReduce Architecture
- Learning Python scripting for MapReduce in Hadoop framework.
6. Functions & Components Of Python - Learning core concepts of Python, Introduction to the Python dictionary
- Various Python functions such as Lambda functions
- Knowing the extensive Python Libraries
7. Machine Learning in Python - Machine learning concepts and the working of Python in a machine learning set-up, supervised and unsupervised machine learning, developing machine learning algorithms, working with data and deploying Python-based machine learning techniques in real-world cases.
- Learning and managing Sandbox, working with HDFS and Mapping and Reducing functions in Python.
8. Project Work |