Full Stack Data Science Roadmap

The roadmap is divided into 12 parts:

  1. Python Programming
  2. Data Structures & Algorithms
  3. Pandas, NumPy, Matplotlib
  4. Statistics
  5. Machine Learning
  6. Natural Language Processing
  7. Computer Vision (Deep Learning)
  8. Data Visualization with Tableau
  9. Structured Query Language (SQL)
  10. Big Data and PySpark
  11. Cloud Computing
  12. Some Capstone Projects

Tools and technology you should learn:

  • Python
  • Data Structures
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-Learn
  • Statsmodels
  • Natural Language Toolkit (NLTK)
  • PyTorch
  • OpenCV
  • Tableau
  • Structured Query Language (SQL)
  • PySpark
  • Cloud Fundamentals
  • Any one cloud platform, such as AWS, Azure, or GCP

1 | Python Programming

Python is a good first programming language: compared with many other languages, it is easy to learn. Here are the concepts to cover:

  • Python basics, Variables, Operators, Conditional Statements
  • List and Strings
  • While Loop, Nested Loops, Loop Else
  • For Loop, Break, and Continue statements
  • Functions, Return Statement, Recursion
  • Dictionary, Tuple, Set
  • File Handling, Exception Handling
  • Object-Oriented Programming
  • Modules and Packages
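
Several of the concepts above fit in one short script. The following is a minimal sketch, not a reference solution; the names `factorial`, `first_even`, `WordCounter`, and `read_lines` are made up for illustration.

```python
def factorial(n):
    """Recursion: n! defined in terms of (n - 1)!."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n <= 1 else n * factorial(n - 1)


def first_even(numbers):
    """For loop with break and a loop-else clause."""
    for value in numbers:
        if value % 2 == 0:
            break  # stop at the first even number
    else:
        # The else branch runs only when the loop finished without break.
        return None
    return value


class WordCounter:
    """Object-oriented programming: state (a dictionary) plus behaviour."""

    def __init__(self):
        self.counts = {}

    def add_text(self, text):
        for word in text.lower().split():
            self.counts[word] = self.counts.get(word, 0) + 1


def read_lines(path):
    """File handling with exception handling: a missing file yields []."""
    try:
        with open(path, encoding="utf-8") as handle:
            return [line.rstrip("\n") for line in handle]
    except FileNotFoundError:
        return []
```

Each piece can be tried on its own in the interpreter, for example `factorial(5)` or `first_even([1, 3, 4])`.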

2 | Data Structure & Algorithms

Data structures are among the most important topics to learn, not only for data scientists but for anyone working in computer science. Studying them gives you an understanding of how software works internally.
  • Stacks
  • Queues
  • Linked List
  • Trees
  • Graphs
  • Sorting
  • Searching
  • Hashing
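
As a rough illustration of a few of these topics, the sketch below uses only the Python standard library: a list as a stack, `collections.deque` as a queue inside a breadth-first graph traversal, and binary search on a sorted list. The function names and the sample graph are hypothetical.

```python
from collections import deque


def binary_search(sorted_items, target):
    """Searching: return the index of target in a sorted list, or -1."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


def bfs_order(graph, start):
    """Graphs and queues: breadth-first traversal of an adjacency dict."""
    order = [start]
    seen = {start}          # hashing: set membership checks
    queue = deque([start])  # queue: first in, first out
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# Stack: last in, first out, using a plain list.
stack = []
stack.append("first")
stack.append("second")
top = stack.pop()

sample_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
visit_order = bfs_order(sample_graph, "a")
```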

3 | Pandas, NumPy, Matplotlib

NumPy adds n-dimensional arrays to Python. For two-dimensional (tabular) data, Pandas is one of the best libraries for analysis. Other tools exist, but drag-and-drop tools are limited to the features they provide, whereas Pandas can be customized in code to fit the real-life problem at hand.

NumPy:

  • Vectors, Matrix
  • Operations on Matrix
  • Mean, Variance, and Standard Deviation
  • Reshaping Arrays
  • Transpose and Determinant of Matrix
  • Diagonal Operations, Trace
  • Add, Subtract, Multiply, Dot, and Cross Product.
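
The NumPy topics above map onto a handful of calls. This is a small sketch with made-up values, assuming NumPy is installed; it is meant as a starting point rather than a complete tour.

```python
import numpy as np

# Hypothetical example data: a 2x2 matrix and a 2-element vector.
matrix = np.array([[2.0, 1.0], [1.0, 3.0]])
vector = np.array([1.0, 2.0])

# Mean, variance, and standard deviation over all elements.
mean = matrix.mean()
variance = matrix.var()
std_dev = matrix.std()

# Reshaping and transposing.
reshaped = np.arange(6).reshape(2, 3)
transposed = reshaped.T

# Determinant, diagonal, and trace.
determinant = np.linalg.det(matrix)
diagonal = np.diag(matrix)
trace = np.trace(matrix)

# Element-wise arithmetic, matrix-vector product, dot and cross products.
summed = matrix + matrix
elementwise = matrix * matrix
product = matrix @ vector
dot = np.dot(vector, vector)
cross = np.cross(np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
```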

Pandas:

  • Series and DataFrames
  • Slicing, Rows, and Columns
  • Operations on DataFrame
  • Different ways to create DataFrame
  • Read, Write Operations with CSV files
  • Handling Missing values, replace values, and Regular Expression
  • GroupBy and Concatenation
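
A short sketch of these Pandas operations, assuming Pandas is installed. The CSV text, column names, and values are invented for the example, and an in-memory string stands in for a real CSV file.

```python
import io

import pandas as pd

# Hypothetical CSV content; one sales value is missing on purpose.
csv_text = (
    "city,year,sales\n"
    "Pune,2021,10\n"
    "Pune,2022,\n"
    "Delhi,2021,7\n"
    "Delhi,2022,9\n"
)

# Read CSV data into a DataFrame.
frame = pd.read_csv(io.StringIO(csv_text))

# Handle the missing value by filling it with the column median.
frame["sales"] = frame["sales"].fillna(frame["sales"].median())

# Regular expression match on a string column.
starts_with_p = frame["city"].str.contains(r"^P", regex=True)

# GroupBy: total sales per city.
totals = frame.groupby("city")["sales"].sum()

# Concatenation: append rows from a second DataFrame.
extra = pd.DataFrame({"city": ["Goa"], "year": [2022], "sales": [4.0]})
combined = pd.concat([frame, extra], ignore_index=True)
```

Filling with the median is only one option; dropping the row or using a domain-specific default may suit other data better.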

Matplotlib:

  • Graph Basics
  • Format Strings in Plots
  • Label Parameters, Legend
  • Bar Chart, Pie Chart, Histogram, Scatter Plot

4 | Statistics

Descriptive Statistics:

  • Measure of Frequency and Central Tendency
  • Measure of Dispersion
  • Probability Distribution
  • Gaussian Normal Distribution
  • Skewness and Kurtosis
  • Regression Analysis
  • Continuous and Discrete Functions
  • Goodness of Fit
  • Normality Test
  • ANOVA
  • Homoscedasticity
  • Linear and Non-Linear Relationship with Regression

Inferential Statistics:

  • Hypothesis Testing
  • Type I and Type II errors
  • t-Test and its types
  • z-Test
  • One-way ANOVA
  • Two-way ANOVA
  • Chi-Square Test
  • Implementing these tests on continuous and categorical data
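
To connect the descriptive and inferential topics, here is a minimal sketch of a one-sample t-test statistic computed by hand with the standard library. The sample values and the hypothesised mean are invented; in practice a library such as Statsmodels would also report the p-value.

```python
import math
import statistics

# Hypothetical measurements and the mean claimed under the null hypothesis.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.4]
hypothesised_mean = 5.0

# Descriptive statistics: central tendency and dispersion.
mean = statistics.mean(sample)
median = statistics.median(sample)
spread = statistics.stdev(sample)  # sample standard deviation (n - 1)

# Inferential step: t = (sample mean - hypothesised mean) / standard error.
standard_error = spread / math.sqrt(len(sample))
t_statistic = (mean - hypothesised_mean) / standard_error
degrees_of_freedom = len(sample) - 1
```

The resulting `t_statistic` is compared against a t-distribution with `degrees_of_freedom` degrees of freedom to decide whether to reject the null hypothesis.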

5 | Machine Learning

Types of machine learning: supervised, unsupervised, and semi-supervised.

  • Exploratory Data Analysis (data preprocessing)
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Gradient Descent
  • Random Forest
  • Ridge and Lasso Regression
  • Naive Bayes
  • Support Vector Machine
  • K-Means Clustering

Other concepts:

  • Measuring Accuracy
  • Bias-Variance Trade-off
  • Applying Regularization
  • Elastic Net Regression
  • Predictive Analytics
  • Time series models
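
Linear regression, gradient descent, and regularization can be seen together in a few lines. The sketch below is plain Python rather than Scikit-Learn; `fit_line`, its parameters, and the toy data are assumptions made for illustration. The optional `l2_penalty` adds a ridge-style penalty on the slope.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000, l2_penalty=0.0):
    """Fit y = slope * x + intercept by gradient descent on mean squared error.

    l2_penalty > 0 adds l2_penalty * slope**2 to the loss (ridge-style).
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the penalised mean squared error.
        grad_slope = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_slope += 2 * l2_penalty * slope
        grad_intercept = (2 / n) * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
ridge_slope, ridge_intercept = fit_line(xs, ys, l2_penalty=1.0)
```

With the penalty switched on, the fitted slope shrinks towards zero, which is the bias-variance trade-off in miniature: a little bias in exchange for a less extreme coefficient.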

6 | Natural Language Processing

  • Sentiment analysis
  • POS Tagging, Parsing
  • Text preprocessing
  • Stemming and Lemmatization
  • Sentiment classification using Naive Bayes
  • TF-IDF, N-gram
  • Machine Translation, BLEU Score
  • Text Generation, Summarization, ROUGE Score
  • Language Modeling, Perplexity
  • Building a text classifier
  • Identifying the gender
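
For the TF-IDF and N-gram items, a minimal pure-Python sketch is shown below. It uses whitespace tokenization and the plain `log(N / df)` form of inverse document frequency, which is only one of several variants (libraries often apply smoothing). The function names and sample sentences are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple preprocessing: lowercase and split on whitespace."""
    return text.lower().split()


def bigrams(tokens):
    """N-grams with n = 2: pairs of adjacent tokens."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_count = len(tokenized)
    # Document frequency: in how many documents each term appears.
    document_frequency = Counter(
        term for tokens in tokenized for term in set(tokens)
    )
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens))
            * math.log(doc_count / document_frequency[term])
            for term, count in counts.items()
        })
    return scores


reviews = ["the movie was great", "the movie was terrible"]
review_scores = tf_idf(reviews)
```

Terms that occur in every document score zero, so only the words that distinguish the two reviews carry weight.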

7 | Computer Vision (Deep Learning)

Computer vision is used for image and video analytics. To work in computer vision, you first need to understand how images are represented.

  • PyTorch Tensors
  • Understanding pretrained models such as AlexNet and ResNet (trained on the ImageNet dataset)
  • Neural Networks
  • Building a perceptron
  • Building a single layer neural network
  • Building a deep neural network
  • Recurrent neural network for sequential data analysis
  • Convolutional Neural Networks
  • Understanding the ConvNet topology
  • Convolution layers
  • Pooling layers
  • Image Content Analysis
  • GAN
  • Operating on images using OpenCV-Python
  • Detecting edges
  • Histogram equalization
  • Detecting corners
  • Detecting SIFT feature points
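
"Building a perceptron" is a good first exercise before moving to PyTorch. The sketch below is deliberately written in plain Python, not PyTorch, and learns the logical AND function; `train_perceptron`, `predict`, and the training data are hypothetical names and values chosen for the example.

```python
def predict(weights, bias, features):
    """Step activation: output 1 when the weighted sum is positive."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=1.0, epochs=20):
    """Classic perceptron rule: nudge weights by error * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            weights = [
                w + learning_rate * error * x
                for w, x in zip(weights, features)
            ]
            bias += learning_rate * error
    return weights, bias


# Truth table for logical AND, which is linearly separable.
and_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_labels = [0, 0, 0, 1]

weights, bias = train_perceptron(and_inputs, and_labels)
predictions = [predict(weights, bias, pair) for pair in and_inputs]
```

A single perceptron can only separate classes with a straight line, which is why the later items stack such units into multi-layer networks.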

8 | Data Visualization with Tableau

  • Visual perception
  • What Tableau is, how it works, how to use it, and why to choose it
  • Connecting to Data
  • Building charts
  • Calculations
  • Dashboards
  • Sharing our work
  • Advanced Charts, Calculated Fields, Calculated Aggregations
  • Conditional Calculation, Parameterized Calculation

9 | Structured Query Language (SQL)

  • Setup SQL server
  • Basics of SQL
  • Writing queries
  • Data Types
  • Select
  • Creating and deleting tables
  • Filtering data
  • Order
  • Aggregations
  • Truncate
  • Primary Key
  • Foreign Key
  • Union
  • MySQL
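
Most of these SQL basics can be practised without installing a server. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for SQL Server or MySQL, so some syntax differs (for example, SQLite has no TRUNCATE statement). Table names, columns, and rows are invented for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
connection.execute("PRAGMA foreign_keys = ON")

# Creating tables with data types, a primary key, and a foreign key.
connection.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders (customer_id, amount)
VALUES (1, 120.0), (1, 80.0), (2, 50.0);
""")

# Select with filtering, aggregation, and ordering.
totals = connection.execute("""
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE amount >= 50
GROUP BY customer_id
ORDER BY total DESC
""").fetchall()

# Union of two result sets, sorted by the first column.
names = connection.execute("""
SELECT name FROM customers
UNION
SELECT 'Guest'
ORDER BY 1
""").fetchall()
```

Trying to insert an order for a customer that does not exist raises `sqlite3.IntegrityError`, which is the foreign key doing its job.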

10 | Big Data and PySpark

  • Big Data
      • What is Big Data?
      • How is Big Data applied within business?
  • PySpark
      • Resilient Distributed Datasets
      • Schema
      • Lambda Expressions
      • Transformations
      • Actions
  • Data Modeling
      • Duplicate Data
      • Descriptive Analysis on Data
      • Visualizations
      • MLlib
      • ML Packages
      • Pipelines
  • Streaming
  • Packaging Spark Applications

Mail ID: barathbaskar33@gmail.com
     