In this course, we are going to focus on pre-processing techniques for machine learning projects.
What is pre-processing?
Pre-processing is the set of manipulations that transform a raw dataset to make it usable by a machine learning model.
Why is pre-processing useful?
Pre-processing is necessary to make our data suitable for certain machine learning models, to reduce the dimensionality of the problem, to better identify the relevant data, and to increase model performance. It's the most important part of a machine learning pipeline and it can strongly affect the success of a project. In fact, if we don't feed a machine learning model data in the right shape, it won't work at all.
Is pre-processing a good skill to have?
Definitely yes. Aspiring Data Scientists sometimes start studying neural networks and other complex models and forget to learn how to manipulate a dataset so that their models can use it. As a result, they fail to build good models, and only at the end do they realize that good pre-processing would have saved them a lot of time and improved the performance of their models. That's why mastering pre-processing techniques is a very important skill.
What's the purpose of a course based only on data pre-processing?
Data pre-processing is the most important part of a machine learning pipeline. Including these lessons in a larger machine learning course would reduce the perceived value of these topics. Some people think that pre-processing is boring and useless, so they jump into machine learning without caring about how to prepare data for their models. That's a big mistake, because they don't understand how pre-processing can make their models produce better results. That's why I have created an entire course that focuses only on data pre-processing.
What will I learn with this course?
By completing this course, you will learn the basic principles of data pre-processing and how to apply them in Python. You'll learn how to fill in the blanks (missing values) in a dataset, how to encode categorical variables, and several types of transformations for numerical features. You're going to learn how to use pipelines in Python and how to perform filter-based feature selection and oversampling. Every lesson consists of a brief theoretical introduction followed by a practical example in the Python programming language using Jupyter notebooks.
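As a taste of how several of these steps fit together, here is a minimal sketch that fills missing values, one-hot encodes a categorical column, and standardizes numerical columns in a single pipeline. It assumes pandas and scikit-learn are installed; the toy dataset, column names, and the choice of mean / most-frequent imputation are illustrative assumptions, not necessarily the exact approach used in the lessons.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy dataset with one missing value in each column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "income": [30000.0, 45000.0, np.nan, 52000.0],
    "city": ["Rome", "Milan", np.nan, "Rome"],
})

numeric_features = ["age", "income"]
categorical_features = ["city"]

# Numerical columns: fill blanks with the column mean, then standardize
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Categorical column: fill blanks with the most frequent value, then one-hot encode
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# Apply each sub-pipeline to its own columns; sparse_threshold=0 forces a dense array
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_features),
        ("cat", categorical_steps, categorical_features),
    ],
    sparse_threshold=0,
)

X = preprocessor.fit_transform(df)
print(X.shape)  # 2 scaled numerical columns + 2 one-hot columns (Milan, Rome)
```

Wrapping the steps in a pipeline means the same transformations, fitted on the training data, can later be reapplied unchanged to new data.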