Course Overview
Data Science Projects with Python helps you get comfortable using the Python environment for data science and starts you on the path to mastering machine learning topics. These skills will help you build the kind of state-of-the-art predictive models that deliver value to businesses across industries.
Course Objectives
By the end of the course, participants will be able to:
- Install the required packages to set up a data science coding environment
- Load data into a Jupyter Notebook running Python
- Use Matplotlib to create data visualizations
- Fit a model using scikit-learn
- Use lasso and ridge regression to reduce overfitting
- Fit and tune a random forest model and compare performance with logistic regression
- Create visuals using the output of the Jupyter Notebook
Who Should Attend?
If you are a data analyst, data scientist, or business analyst who wants to get started with using Python and machine learning techniques to analyze data and predict outcomes, this course is for you. Basic knowledge of computer programming and data analytics is a must. Familiarity with mathematical concepts such as algebra and basic statistics will be useful.
Prerequisites
None
Course Outline
- Python and the Anaconda Package Management System
- Different Types of Data Science Problems
- Loading the Case Study Data with Jupyter and pandas
- Data Quality Assurance and Exploration
- Exploring the Financial History Features in the Dataset
- Activity 1: Exploring Remaining Financial Features in the Dataset
- Introduction
- Model Performance Metrics for Binary Classification
- Activity 2: Performing Logistic Regression with a New Feature and Creating a Precision-Recall Curve
- Introduction
- Examining the Relationships between Features and the Response
- Univariate Feature Selection: What It Does and Doesn't Do
- Activity 3: Fitting a Logistic Regression Model and Directly Using the Coefficients
- Introduction
- Estimating the Coefficients and Intercepts of Logistic Regression
- Cross-Validation: Choosing the Regularization Parameter and Other Hyperparameters
- Activity 4: Cross-Validation and Feature Engineering with the Case Study Data
- Introduction
- Decision Trees
- Random Forests: Ensembles of Decision Trees
- Activity 5: Cross-Validation Grid Search with Random Forest
- Introduction
- Review of Modeling Results
- Dealing with Missing Data: Imputation Strategies
- Activity 6: Deriving Financial Insights
- Final Thoughts on Delivering the Predictive Model to the Client