Find Outliers in Datasets using the Local Outlier Factor

The Local Outlier Factor (LOF) is an unsupervised algorithm for detecting outliers in your dataset. LOF detects outliers by comparing the local density of a sample with the local densities of its neighbors. The local density is estimated from the distances between the sample and its surrounding neighbors (its k nearest neighbors). Outliers are samples that …
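The density comparison can be sketched from scratch in plain Python. This is an illustrative sketch, not the code from the full post; the function names `k_nearest` and `local_outlier_factor`, the Euclidean distance and `k=3` are assumptions. Each sample's local reachability density is the inverse of its mean reachability distance to its k nearest neighbors, and its LOF is the average ratio of the neighbors' densities to its own, so scores well above 1 point to outliers.

```python
import math


def k_nearest(points, k):
    """Return, for every point, its k nearest neighbours as (distance, index) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        result.append(dists[:k])  # ties beyond the k-th neighbour are ignored here
    return result


def local_outlier_factor(points, k=3):
    """Return one LOF score per point; scores well above 1 suggest an outlier."""
    neighbours = k_nearest(points, k)
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nbrs[-1][0] for nbrs in neighbours]

    # local reachability density = 1 / mean reachability distance,
    # where reach-dist(a, b) = max(k_dist(b), dist(a, b))
    lrd = []
    for nbrs in neighbours:
        reach = [max(k_dist[j], d) for d, j in nbrs]
        lrd.append(len(reach) / sum(reach))  # assumes no duplicate points

    # LOF = mean ratio of the neighbours' densities to the point's own density
    return [
        sum(lrd[j] for _, j in nbrs) / (len(nbrs) * lrd[i])
        for i, nbrs in enumerate(neighbours)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = local_outlier_factor(points, k=3)
print([round(s, 2) for s in scores])  # the isolated point (10, 10) scores far above 1
```

The brute-force neighbor search is quadratic in the number of samples; for real datasets, scikit-learn's `LocalOutlierFactor` in `sklearn.neighbors` is the practical choice.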

Find Outliers in Datasets using Isolation Forest

The Isolation Forest is an unsupervised anomaly detection technique based on decision trees. The main idea is that a sample that ends up deeper in a tree is less likely to be an outlier, because samples that lie close to each other need many splits before they are separated. On the other hand, samples …
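The isolation idea can be sketched in plain Python. This is a simplified, hypothetical implementation rather than the code behind the full post: the names `build_tree`, `path_length` and `isolation_forest_scores` are illustrative, and the defaults of 100 trees and a sub-sample of at most 256 points are assumed to mirror scikit-learn's `IsolationForest`. Each tree splits on a random feature at a random value between that feature's minimum and maximum, and the average depth at which a sample is isolated is turned into a score with s = 2^(-E[h(x)] / c(ψ)), where c(ψ) normalizes by the expected path length for ψ samples.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(data, depth, max_depth, rng):
    """Grow an isolation tree with random features and random split values."""
    if depth >= max_depth or len(data) <= 1:
        return {"size": len(data)}
    dim = rng.randrange(len(data[0]))
    lo = min(p[dim] for p in data)
    hi = max(p[dim] for p in data)
    if lo == hi:  # chosen feature is constant in this node; this sketch just stops
        return {"size": len(data)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in data if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in data if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for leaves that were not split."""
    if "split" not in node:
        return depth + avg_path(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per sample; values near 1 suggest outliers."""
    if len(data) < 2:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit used by isolation trees
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return [
        2 ** (-sum(path_length(x, t) for t in trees) / n_trees / avg_path(psi))
        for x in data
    ]


rng = random.Random(42)
data = [(rng.random(), rng.random()) for _ in range(20)] + [(10.0, 10.0)]
scores = isolation_forest_scores(data)
print(round(scores[-1], 2), round(max(scores[:-1]), 2))  # outlier score vs. best inlier score
```

Sub-sampling and the depth limit are what keep the method cheap: outliers are isolated near the root, so deep parts of the trees never need to be built.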

Find Outliers in Datasets using the Interquartile Range Method

The interquartile range method is my preferred way to identify outliers: the method itself is easy to understand, and I created two functions that can be applied to any pandas DataFrame to create a small PDF report of all numeric features in your dataset. Instead of the classic quartiles (the 25th and 75th percentiles), the method as used here takes the 5th and 95th percentiles to …
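The fence rule can be sketched with Python's standard `statistics` module. This is not the author's pair of DataFrame/PDF functions; `iqr_bounds` and `iqr_outliers` are hypothetical stand-ins that assume the common Tukey fences, flagging values more than 1.5 times the spread below the lower percentile or above the upper one. The percentiles default to the classic 25th/75th and can be set to 5/95 as described above; whether the original applies the 1.5 factor in that case is an assumption.

```python
import statistics


def iqr_bounds(values, lower_pct=25, upper_pct=75, factor=1.5):
    """Return (low, high) fences: q_low - factor*spread and q_high + factor*spread."""
    # 99 cut points; cuts[p - 1] is the p-th percentile (linear interpolation)
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    q_low, q_high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    spread = q_high - q_low
    return q_low - factor * spread, q_high + factor * spread


def iqr_outliers(values, **kwargs):
    """Return the values that fall outside the fences computed by iqr_bounds()."""
    low, high = iqr_bounds(values, **kwargs)
    return [v for v in values if v < low or v > high]


values = [10, 12, 11, 13, 12, 11, 14, 12, 13, 100]
print(iqr_bounds(values))    # (8.625, 15.625)
print(iqr_outliers(values))  # [100]
```

For a pandas DataFrame, the same fences could be computed per numeric column from the values returned by `DataFrame.quantile`.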