Full-Stack Data Science Roadmap
The roadmap covers the following topics; the first 10 are detailed in the numbered sections below:
- Python Programming
- Data Structures & Algorithms
- Pandas, NumPy, Matplotlib
- Statistics
- Machine Learning
- Natural Language Processing
- Computer Vision (Deep Learning)
- Data Visualization with Tableau
- Structured Query Language (SQL)
- Big Data and PySpark
- Cloud computing
- Some Capstone Projects
Tools and technology you should learn:
- Python
- Data Structures
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-Learn
- Statsmodels
- Natural Language Toolkit (NLTK)
- PyTorch
- OpenCV
- Tableau
- Structured Query Language (SQL)
- PySpark
- Cloud Fundamentals
- Any one cloud platform, such as AWS, Azure, or GCP
1 | Python Programming
Python is a good first programming language: compared with many other languages, it is easy to learn. Here are the concepts to cover:
- Python basics, Variables, Operators, Conditional Statements
- List and Strings
- While Loop, Nested Loops, Loop Else
- For Loop, Break, and Continue statements
- Functions, Return Statement, Recursion
- Dictionary, Tuple, Set
- File Handling, Exception Handling
- Object-Oriented Programming
- Modules and Packages
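To make a few of the topics above concrete, here is a minimal sketch that touches functions, recursion, dictionaries, and exception handling. The function names (`factorial`, `word_counts`, `safe_factorial`) are made up for illustration and are not part of any library.

```python
from typing import Optional


def factorial(n: int) -> int:
    """Return n! using recursion."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 if n <= 1 else n * factorial(n - 1)


def word_counts(text: str) -> dict:
    """Count how often each lowercased word appears in the text."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def safe_factorial(value: str) -> Optional[int]:
    """Parse a string and return its factorial, or None on bad input."""
    try:
        return factorial(int(value))
    except ValueError:
        # Raised by int() for non-numeric text and by factorial() for negatives.
        return None


if __name__ == "__main__":
    print(factorial(5))                        # 120
    print(word_counts("to be or not to be"))   # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
    print(safe_factorial("abc"))               # None
```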
2 | Data Structures & Algorithms
Data structures are among the most important things to learn, not only for data scientists but for anyone working in computer science. Studying them gives you an understanding of how software works internally.
- Stacks
- Queues
- Linked List
- Trees
- Graphs
- Sorting
- Searching
- Hashing
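As a small illustration of three items from this list, the sketch below uses a Python list as a stack, `collections.deque` as a queue, and a hand-written binary search over a sorted list. The sample values are arbitrary, and `binary_search` is a name chosen for this example.

```python
from collections import deque


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # target can only be in the right half
        else:
            high = mid - 1   # target can only be in the left half
    return -1


# Stack: last in, first out.
stack = [1, 2]
stack.append(3)
top = stack.pop()          # 3

# Queue: first in, first out.
queue = deque([1, 2, 3])
front = queue.popleft()    # 1

position = binary_search([2, 5, 8, 13, 21], 13)   # 3
```

A `deque` is used for the queue because removing from the front of a plain list takes time proportional to its length.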
3 | Pandas, NumPy, Matplotlib
NumPy adds n-dimensional arrays to Python. For two-dimensional (tabular) data, Pandas is one of the best libraries for analysis. Drag-and-drop tools exist, but they have limitations; because you write code with Pandas, you can tailor the analysis to the real-life problem at hand.
NumPy:
- Vectors and Matrices
- Matrix Operations
- Mean, Variance, and Standard Deviation
- Reshaping Arrays
- Transpose and Determinant of a Matrix
- Diagonal Operations, Trace
- Add, Subtract, Multiply, Dot, and Cross Product.
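The NumPy topics above map onto a handful of calls. The following is a short sketch with arbitrary example arrays; it is meant as a starting point, not a full tour of the library.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise_sum = a + b            # add (use - and * for element-wise subtract/multiply)
matrix_product = a @ b             # matrix multiplication
transposed = a.T                   # transpose
determinant = np.linalg.det(a)     # determinant, approximately -2.0
trace = np.trace(a)                # sum of the main diagonal: 5.0
diagonal = np.diag(a)              # main diagonal: [1.0, 4.0]
reshaped = np.arange(6).reshape(2, 3)   # 1-D array of 6 values as a 2x3 matrix

u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])
dot = np.dot(u, v)                 # 0.0, since the vectors are perpendicular
cross = np.cross(u, v)             # [0.0, 0.0, 1.0]

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = values.mean()               # 5.0
variance = values.var()            # population variance: 4.0
std_dev = values.std()             # population standard deviation: 2.0
```

Note that `var()` and `std()` compute the population versions by default; pass `ddof=1` for the sample versions.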
Pandas:
- Series and DataFrames
- Slicing, Rows, and Columns
- Operations on DataFrame
- Different ways to create DataFrame
- Read, Write Operations with CSV files
- Handling Missing values, replace values, and Regular Expression
- GroupBy and Concatenation
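Here is a rough sketch of several Pandas topics from this list: creating a DataFrame, filling a missing value, selecting rows, grouping, and concatenation. The city names and sales figures are invented for the example.

```python
import numpy as np
import pandas as pd

# Create a DataFrame from a dictionary; one sales value is missing.
df = pd.DataFrame({
    "city": ["Chennai", "Chennai", "Mumbai", "Mumbai"],
    "sales": [100.0, np.nan, 250.0, 250.0],
})

# Handle the missing value by filling it with the column mean (200.0).
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Select rows with a boolean condition.
mumbai_rows = df[df["city"] == "Mumbai"]

# GroupBy: total sales per city.
totals = df.groupby("city")["sales"].sum()

# Concatenation: append rows from another DataFrame.
extra = pd.DataFrame({"city": ["Delhi"], "sales": [300.0]})
combined = pd.concat([df, extra], ignore_index=True)
```

Filling with the mean is only one option; dropping the row with `dropna()` or filling with a fixed value may suit other datasets better.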
Matplotlib:
- Graph Basics
- Format Strings in Plots
- Label Parameters, Legend
- Bar Chart, Pie Chart, Histogram, Scatter Plot
4 | Statistics
Descriptive Statistics:
- Measure of Frequency and Central Tendency
- Measure of Dispersion
- Probability Distribution
- Gaussian (Normal) Distribution
- Skewness and Kurtosis
- Regression Analysis
- Continuous and Discrete Functions
- Goodness of Fit
- Normality Test
- ANOVA
- Homoscedasticity
- Linear and Non-Linear Relationship with Regression
Inferential Statistics:
- t-Test
- z-Test
- Hypothesis Testing
- Type I and Type II errors
- t-Test and its types
- One-way ANOVA
- Two-way ANOVA
- Chi-Square Test
- Implementation of continuous and categorical data
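As one worked example from the inferential side, the one-sample t statistic is the gap between the sample mean and a hypothesized mean, divided by the standard error. The sketch below uses only the Python standard library; the sample values and the hypothesized mean of 4 are made up, and `one_sample_t` is a name chosen for this example.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Return the one-sample t statistic for the given sample."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)      # sample standard deviation (n - 1)
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # sample mean is 5
t_stat = one_sample_t(sample, 4)    # about 1.32
```

To finish the hypothesis test, compare the statistic with the critical value of the t distribution with `n - 1` degrees of freedom at your chosen significance level.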
5 | Machine Learning
Types of machine learning: supervised, unsupervised, and semi-supervised.
- Exploratory Data Analysis (Data Preprocessing)
- Linear Regression
- Logistic Regression
- Decision Tree
- Gradient Descent
- Random Forest
- Ridge and Lasso Regression
- Naive Bayes
- Support Vector Machine
- KMeans Clustering
Other concepts:
- Measuring Accuracy
- Bias-Variance Trade-off
- Applying Regularization
- Elastic Net Regression
- Predictive Analytics
- Time series models
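Two of the topics above, linear regression and gradient descent, fit together naturally. The following is a minimal sketch, assuming a single input feature and mean squared error as the loss; the learning rate, epoch count, and toy data are illustrative choices, and in practice a library such as Scikit-Learn would usually do the fitting.

```python
import numpy as np


def fit_linear_regression(x, y, learning_rate=0.05, epochs=2000):
    """Fit y = weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = weight * x + bias - y
        # Gradients of the mean squared error with respect to weight and bias.
        weight_gradient = (2.0 / n) * np.dot(error, x)
        bias_gradient = (2.0 / n) * error.sum()
        weight -= learning_rate * weight_gradient
        bias -= learning_rate * bias_gradient
    return weight, bias


# Toy data generated from y = 2x + 1, so the fit should recover those values.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
weight, bias = fit_linear_regression(x, y)
```

If the learning rate is too large the updates diverge instead of converging, which is worth trying out to build intuition.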
6 | Natural Language Processing
- Sentiment analysis
- POS Tagging, Parsing
- Text preprocessing
- Stemming and Lemmatization
- Sentiment classification using Naive Bayes
- TF-IDF, N-grams
- Machine Translation, BLEU Score
- Text Generation, Summarization, ROUGE Score
- Language Modeling, Perplexity
- Building a text classifier
- Identifying the gender
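To show what TF-IDF from the list above actually computes, here is a small sketch in plain Python. It assumes whitespace tokenization and the basic unsmoothed formula, term frequency times log(N / document frequency); libraries often use smoothed variants, so their numbers will differ. The three example sentences are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the movie was great", "the movie was boring", "great acting"]
weights = tf_idf(docs)
# In the second document, "boring" (unique to it) outweighs "the" (shared).
```

A term that appears in every document gets a weight of zero, which is how TF-IDF pushes down very common words.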
7 | Computer Vision (Deep Learning)
Computer vision is used for image and video analytics. To work with it, you first need to understand images.
- PyTorch Tensors
- Understanding pretrained models like AlexNet and ResNet, and the ImageNet dataset they are trained on
- Neural Networks
- Building a perceptron
- Building a single layer neural network
- Building a deep neural network
- Recurrent neural network for sequential data analysis
- Convolutional Neural Networks
- Understanding the ConvNet topology
- Convolution layers
- Pooling layers
- Image Content Analysis
- GAN
- Operating on images using OpenCV-Python
- Detecting edges
- Histogram equalization
- Detecting corners
- Detecting SIFT feature points
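Convolution layers, pooling layers, and edge detection all rest on the same sliding-window idea. The sketch below implements it in plain NumPy so the arithmetic is visible; this is a swapped-in illustration, since PyTorch and OpenCV provide optimized versions. The function names and the tiny 6x6 test image are assumptions made for this example.

```python
import numpy as np


def convolve2d(image, kernel):
    """Slide the kernel over the image with no padding (cross-correlation,
    which is what CNN convolution layers compute)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


# A 6x6 image: dark left half, bright right half, so one vertical edge.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Sobel kernel that responds to horizontal changes in brightness.
sobel_x = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

edges = convolve2d(image, sobel_x)   # 4x4; each row is [0, 4, 4, 0]
pooled = max_pool2d(edges)           # 2x2; every entry is 4
```

The response is nonzero only where the window straddles the edge, which is the intuition behind both edge detectors and the early layers of a ConvNet.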
8 | Data Visualization with Tableau
- Visual Perception
- What Tableau is, how it works, why to use it, and how to use it
- Connecting to Data
- Building charts
- Calculations
- Dashboards
- Sharing our work
- Advanced Charts, Calculated Fields, Calculated Aggregations
- Conditional Calculation, Parameterized Calculation
9 | Structured Query Language (SQL)
- Setting up a SQL server
- Basics of SQL
- Writing queries
- Data Types
- Select
- Creating and deleting tables
- Filtering data
- Order
- Aggregations
- Truncate
- Primary Key
- Foreign Key
- Union
- MySQL
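Most of the SQL topics above can be tried without installing a server. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in; MySQL syntax differs in places (data types, for instance), and the table names and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked

# Creating tables with a primary key and a foreign key, then inserting rows.
conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount      REAL NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders (id, customer_id, amount) VALUES
    (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

# Select, filter, aggregate, and order in one query.
totals = conn.execute("""
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE amount >= 50
    GROUP BY customer_id
    ORDER BY total DESC
""").fetchall()                            # [(1, 200.0), (2, 50.0)]

# Union: combine the results of two SELECT statements.
names = conn.execute(
    "SELECT name FROM customers UNION SELECT 'Guest' ORDER BY name"
).fetchall()                               # [('Asha',), ('Guest',), ('Ravi',)]

# The foreign key rejects an order for a customer that does not exist.
try:
    conn.execute("INSERT INTO orders (id, customer_id, amount) VALUES (4, 99, 10.0)")
    foreign_key_enforced = False
except sqlite3.IntegrityError:
    foreign_key_enforced = True
```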
10 | Big Data and PySpark
- Big Data
- What is Big Data?
- How is Big Data applied within business?
- PySpark
- Resilient Distributed Datasets
- Schema
- Lambda Expressions
- Transformations
- Actions
- Data Modeling
- Duplicate Data
- Descriptive Analysis on Data
- Visualizations
- MLlib
- ML Packages
- Pipelines
- Streaming
- Packaging Spark Applications
Mail ID:
barathbaskar33@gmail.com
Was this helpful?