Transform the Target Variable

The Objective: Transforming the Target Variable. There are three problems that can occur in a machine learning project that we can tackle by transforming the target variable: 1) Improve the results of a machine learning model when the target variable is skewed. 2) Reduce the impact of outliers in the target variable. 3) Using the mean absolute error … Read more
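
As a rough illustration of the first point, the sketch below fits a linear model on a log-transformed target and maps the predictions back to the original scale. The synthetic data, the choice of `np.log1p`/`np.expm1`, and the use of scikit-learn's `TransformedTargetRegressor` are assumptions made for this example, not necessarily what the full article uses.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): a right-skewed target that grows
# exponentially with the single feature.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.exp(1.0 + 0.8 * X[:, 0] + rng.normal(0, 0.2, size=200))

# The regressor is trained on log1p(y); predictions are passed through
# expm1 so they come back on the original scale of y.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X, y)
pred = model.predict(X)

print(pred[:5])
```

Wrapping the transformation in the model keeps the forward and inverse steps together, so error metrics can still be computed on the original scale of the target.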

Feature Transformation in Machine Learning

In machine learning, feature transformation is a common technique used to improve the accuracy of models. One of the reasons for transformation is to handle skewed data, which can negatively affect the performance of many machine learning algorithms. In this article, you … Programming Example for Feature Transformation: For this article, I programmed an example to work … Read more
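
To make the effect on skewed data concrete, here is a small sketch that measures the skewness of a feature before and after a log transformation. The log-normal sample and the choice of `np.log1p` are assumptions for illustration; the excerpt does not say which transformation the article applies.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed feature (assumption): log-normal samples.
rng = np.random.default_rng(0)
feature = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000), name="feature")

skew_before = feature.skew()

# log1p compresses large values much more than small ones,
# which pulls in the long right tail.
transformed = np.log1p(feature)
skew_after = transformed.skew()

print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```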

How to Find and Impute Missing Values in a Dataset

Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify and replace missing values for each column in your dataset prior to modeling your prediction task. Find Missing Values in a Dataset: Finding missing values in a dataset is not very complicated. … Read more
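
A minimal sketch of both steps with pandas: count the missing values per column, then replace them. The toy DataFrame and the replacement strategy (median for numeric columns, most frequent value for the text column) are assumptions for this example; other strategies may fit your data better.

```python
import numpy as np
import pandas as pd

# Toy dataset (assumption) with missing values in every column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 47.0],
    "income": [52000.0, 61000.0, np.nan, np.nan],
    "city": ["Berlin", None, "Munich", "Berlin"],
})

# Step 1: find missing values, counted per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Step 2: replace them. Numeric columns get the column median,
# the text column gets its most frequent value.
df_filled = df.copy()
numeric_cols = df.select_dtypes(include="number").columns
df_filled[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df_filled["city"] = df["city"].fillna(df["city"].mode()[0])

print(df_filled)
```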

Find Outliers in Datasets using Local Outlier Factor

The Local Outlier Factor (LOF) is an unsupervised algorithm to detect outliers in your dataset. LOF detects outliers based on the local density deviation of a sample compared to its neighbors. The local density is calculated from the distance between the sample and its surrounding neighbors (k-nearest neighbors). Outliers are samples that … Read more
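
The idea can be tried out with scikit-learn's `LocalOutlierFactor`. This is only a sketch: the synthetic data with two planted outliers and the value `n_neighbors=20` are assumptions for the example.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data (assumption): one dense cluster plus two planted
# points that lie far away from it.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
planted = np.array([[6.0, 6.0], [-7.0, 5.0]])
X = np.vstack([inliers, planted])

# n_neighbors is the k of the k-nearest neighbors used for the local density.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)  # -1 marks an outlier, 1 an inlier

# scikit-learn stores the negated LOF; flip the sign so that
# larger values mean "more of an outlier".
lof_scores = -lof.negative_outlier_factor_

print("labels of the planted points:", labels[-2:])
print("LOF of the planted points:", lof_scores[-2:])
```

A LOF close to 1 means the sample is about as dense as its neighbors; values well above 1 indicate a sample in a much sparser region than its neighbors.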

Find Outliers in Datasets using Isolation Forest

The Isolation Forest is an unsupervised anomaly detection technique based on the decision tree algorithm. The main idea is that a sample that travels deeper into the tree is less likely to be an outlier, because samples that are near each other need many splits to separate them. On the other hand are samples … Read more
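
A short sketch with scikit-learn's `IsolationForest` shows the behavior described above. The synthetic data and the parameter values (`contamination=0.02`, `n_estimators=100`) are assumptions for this example and would need tuning on real data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (assumption): one dense cluster plus two planted
# points that lie far away from it.
rng = np.random.default_rng(42)
inliers = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
planted = np.array([[6.0, 6.0], [-7.0, 5.0]])
X = np.vstack([inliers, planted])

# contamination is the assumed share of outliers in the data.
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = iso.fit_predict(X)  # -1 marks an outlier, 1 an inlier

# Lower scores mean the sample was isolated after fewer splits,
# which makes it more likely to be an outlier.
scores = iso.score_samples(X)

print("labels of the planted points:", labels[-2:])
print("scores of the planted points:", scores[-2:])
```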

Find Outliers in Datasets using the Interquartile Range Method

The interquartile range method is my preferred method to identify outliers because the method itself is easy to understand. I also created two functions that can be applied to every pandas DataFrame to create a little PDF report of all numeric features in your dataset. The interquartile range method uses the 5th and 95th percentile to … Read more
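
The range-based rule can be sketched as a small pandas helper. The function name `iqr_outlier_mask`, the toy data, and the defaults are hypothetical and not the functions from the article: the defaults follow the textbook interquartile range (25th and 75th percentile with a factor of 1.5), and the quantiles are left as parameters so that other cut-offs, such as the 5th and 95th percentile, can be passed instead.

```python
import pandas as pd

def iqr_outlier_mask(series, lower_q=0.25, upper_q=0.75, k=1.5):
    """Return a boolean mask that is True where a value is an outlier.

    Hypothetical helper: a value counts as an outlier when it lies more
    than k times the quantile range below the lower quantile or above
    the upper quantile.
    """
    q_low = series.quantile(lower_q)
    q_high = series.quantile(upper_q)
    spread = q_high - q_low
    lower_bound = q_low - k * spread
    upper_bound = q_high + k * spread
    return (series < lower_bound) | (series > upper_bound)

# Toy data (assumption): seven ordinary values and one extreme value.
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 95])
mask = iqr_outlier_mask(values)

print(values[mask])
```

For a whole DataFrame, the same helper could be applied column by column to the numeric features, for example with `df.select_dtypes(include="number").apply(iqr_outlier_mask)`.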