Exploratory Data Analysis (EDA) is a collection of techniques for extracting meaningful information and valuable insights from data. Its main purpose is to investigate a dataset to reveal its underlying structure, along with the challenges and opportunities it presents, without attempting to make predictions using machine learning models. EDA is the first step in data mining and the foundation for further decisions. It is a critical phase in any data analysis or data science project: by generating summary statistics and constructing various graphical representations, it helps uncover the knowledge hidden in the data and answer the business's questions of why and how.
The TechClass Exploratory Data Analysis with Python online course introduces you to the practical knowledge and main pillars of EDA, including data exploration, data preparation, visualization, data relationships, and clustering, using the Python programming language. Beyond the intuition, you will become familiar with how the steps of EDA are implemented with various Python libraries such as NumPy, Pandas, Matplotlib, and Scikit-learn. By the end of this course, you will be prepared to enter the fantastic world of data analysis and pursue exciting job opportunities in the industry.
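As a taste of what these libraries look like in practice, here is a minimal sketch of a first EDA pass with Pandas and Matplotlib. The file name "sales.csv" and the "price" column are hypothetical placeholders, not part of the course material.

```python
# A minimal sketch of a first EDA pass with Pandas and Matplotlib.
# "sales.csv" and the "price" column are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")   # load a tabular dataset into a DataFrame
print(df.head())                # inspect the first few rows
print(df.describe())            # summary statistics for numeric columns
print(df.isna().sum())          # count missing values per column

df["price"].hist(bins=20)       # quick histogram of a numeric column
plt.xlabel("price")
plt.ylabel("frequency")
plt.show()
```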
Learning outcomes
- Learn the general framework of EDA and why it is important
- Get hands-on experience with the NumPy and Pandas libraries
- Learn the general concepts of descriptive statistics and how to extract them using the Pandas library
- Learn how to plot various visualizations to extract meaningful insights from data using Matplotlib and Seaborn libraries
- Get familiar with standard practices of data preparation
- Learn how to treat missing values and detect outliers
- Learn how to perform feature engineering
- Get familiar with the general framework of data relationships and learn the intuitions behind correlation analysis
- Learn about feature scaling and how to implement it in Python
- Learn about feature encoding and how to implement it in Python
- Learn how to use group-by operations to extract insights from data
- Learn how to perform dimensionality reduction to represent and visualize data in lower-dimensional space
- Learn how to identify group patterns and perform clustering using the k-Means method (see the short sketch after this list)
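As a preview of the later outcomes, here is a minimal sketch of feature scaling followed by k-Means clustering with Scikit-learn. The data is synthetic and the parameter choices (number of clusters, random seed) are illustrative assumptions, not the course's reference solution.

```python
# A minimal sketch of feature scaling followed by k-Means clustering.
# The data is synthetic; cluster count and seeds are illustrative choices.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2)) * [1.0, 50.0]    # two features on very different scales

X_scaled = StandardScaler().fit_transform(X)   # standardize to zero mean, unit variance

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)          # assign each observation to a cluster

print(labels[:10])              # cluster labels of the first ten observations
print(kmeans.cluster_centers_)  # cluster centers in the scaled feature space
```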
Table of contents
Chapter 1: Intro to Course
- 1.1. Welcome!
- 1.2. About TechClass Data Science Department
- 1.3. Learning Outcomes
- 1.4. Your Expectations, Goals, and Knowledge
- 1.5. Abbreviations
- 1.6. Copyright Notice
Chapter 2: Introduction
- 2.1. Introduction to Data Science
- 2.2. Data Science Workflow
- 2.3. Data
- 2.4. Sources of Data
- 2.5. What Is Exploratory Data Analysis?
- 2.6. Python Libraries for EDA
Chapter 3: Describing Data
- 3.1. Introduction
- 3.2. Observations and Variables
- 3.3. Categorical Variables
- 3.4. Quantitative Variables
- 3.5. Central Tendency
- 3.6. Data Variability
- 3.7. Distribution Functions
Chapter 4: Importing Data
- 4.1. Introduction
- 4.2. Vectors and Matrices
- 4.3. NumPy Arrays
- 4.4. Working with NumPy Arrays
- 4.5. Loading Data with NumPy
- 4.6. Pandas Series
- 4.7. Working with Series
- 4.8. Pandas DataFrame
- 4.9. Working with DataFrames
- 4.10. Exercise
Chapter 5: Data Exploration
- 5.1. Extracting Descriptive Statistics
- 5.2. Extracting Descriptive Statistics: Preliminaries
- 5.3. Extracting Descriptive Statistics: Implementation
- 5.4. Mathematical Operations on DataFrame
- 5.5. Applying Functions to DataFrame
- 5.6. Querying a DataFrame
- 5.7. Filtering Data
- 5.8. Groupby
- 5.9. Cross Tabulation
Chapter 6: Data Visualization
- 6.1. Univariate Analysis
- 6.2. Histogram
- 6.3. Frequency Polygons
- 6.4. Boxplot
- 6.5. Bar Chart
- 6.6. Pie Chart
- 6.7. Bivariate Analysis
- 6.8. Scatter Plot
- 6.9. Hexbins
- 6.10. Stacked Column Chart
Chapter 7: Data Preparation
- 7.1. Introduction
- 7.2. Incorrect Values and Categories
- 7.3. Feature Engineering: Creating New Features
- 7.4. Outlier Detection: Univariate
- 7.5. Outlier Detection: Multivariate
- 7.6. Removing Missing Values
- 7.7. Imputing Missing Values: Mean/Mode Imputation
- 7.8. Imputing Missing Values: K-NN Imputation
- 7.9. Feature Encoding: Label Encoding
- 7.10. Feature Encoding: One-Hot Encoding
- 7.11. Feature Scaling: Normalization
- 7.12. Feature Scaling: Standardization
Chapter 8: Data Relationships
- 8.1. Introduction
- 8.2. Covariance Matrix
- 8.3. Correlation
- 8.4. Heatmap of Correlation Matrix
- 8.5. Non-linear Relationship
- 8.6. Hypothesis Testing
Chapter 9: Identifying and Understanding Groups
- 9.1. Introduction
- 9.2. Clustering
- 9.3. Hierarchical Clustering
- 9.4. K-Means Clustering
Chapter 10: Next Steps
- 10.1. What’s More?
- 10.2. EDA for Text Data
- 10.3. Model Development and Evaluation
Chapter 11: Final Tasks
- 11.1. Project
- 11.2. Self-study Essay
Chapter 12: Finishing the Course
- 12.1. What We Have Learned
- 12.2. Where to Go Next?
- 12.3. Your Opinion Matters
- 12.4. Congrats! You did it!