Datasets may have missing values, and this can cause problems for many machine learning algorithms. As such, it is good practice to identify and replace missing values in each column of your input data before modeling your prediction task. Generally, only numeric columns have outliers, but if a categorical feature has a category that is present in only a few samples, that category can also be seen as an outlier. In my data analysis and preprocessing workflow, I one-hot encode the categorical columns to make them numeric, and a category with only a few samples then becomes a task for dimensionality reduction.
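To make the point about rare categories concrete, here is a minimal sketch using pandas. The column name `color`, the toy data, and the 5% threshold `min_share` are assumptions made up for this illustration and are not taken from any real dataset.

```python
import pandas as pd

# Hypothetical toy data: "purple" appears in only 2 of 100 samples.
df = pd.DataFrame({"color": ["red"] * 48 + ["blue"] * 50 + ["purple"] * 2})

# Share of samples per category.
shares = df["color"].value_counts(normalize=True)

# Assumed threshold: categories below 5% of the samples count as rare.
min_share = 0.05
rare = shares[shares < min_share].index.tolist()

# One-hot encode the categorical column; the rare category becomes
# its own, almost always zero, indicator column.
encoded = pd.get_dummies(df, columns=["color"])

print(rare)
print(encoded.sum())
```

The sparse indicator column for the rare category is the kind of feature that a later dimensionality reduction step could merge or drop.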
Find Outlier in Datasets using Local Outlier Factor
The Local Outlier Factor (LOF) is an unsupervised algorithm to detect outliers in your dataset….
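As a rough illustration, the sketch below runs scikit-learn's `LocalOutlierFactor` on a made-up dataset: a 5x5 grid of points plus one point placed far away. The data and the choice of `n_neighbors=5` are assumptions for this example only.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Toy data (assumed): a 5x5 grid of evenly spaced points plus one distant point.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])

# n_neighbors=5 is picked for this small example, not a general recommendation.
lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)              # -1 marks outliers, 1 marks inliers
scores = -lof.negative_outlier_factor_   # LOF value; close to 1 means inlier-like

outlier_idx = np.flatnonzero(labels == -1)
print(outlier_idx, scores[outlier_idx])
```

Points whose local density is much lower than that of their neighbors get a LOF value well above 1, which is why the distant point is flagged.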
Find Outlier in Datasets using Isolation Forest
The Isolation Forest is an unsupervised anomaly detection technique based on the decision tree algorithm….
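A minimal sketch with scikit-learn's `IsolationForest` might look as follows. The toy data (a 5x5 grid plus one distant point), `n_estimators=100`, and `random_state=0` are assumptions chosen only to keep the example small and repeatable.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Toy data (assumed): a 5x5 grid of points plus one point far from the rest.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])

# random_state is fixed only so that repeated runs give the same trees.
forest = IsolationForest(n_estimators=100, random_state=0)
labels = forest.fit_predict(X)     # -1 marks outliers, 1 marks inliers
scores = forest.score_samples(X)   # lower score = easier to isolate = more anomalous

most_anomalous = int(np.argmin(scores))
print(most_anomalous, X[most_anomalous], labels[most_anomalous])
```

The distant point is separated from the rest after very few random splits, so it ends up with the lowest score.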
Find Outlier by using the Residual Plot for Regression Problems
A more complex method to find outliers in regression models, compared to using the distribution…
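One way to sketch the residual idea without a plot is shown below: fit a simple linear regression with NumPy and flag samples whose residual is unusually large. The synthetic data, the injected outlier at index 10, and the threshold of 3 standard deviations are all assumptions for illustration. In practice the residuals would usually also be plotted against the predictions.

```python
import numpy as np

# Synthetic data (assumed): points on the line y = 2x + 1, with one corrupted sample.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[10] += 30.0  # injected outlier

# Fit a straight line by least squares and compute the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Standardize the residuals and flag those beyond an assumed threshold of 3.
z = (residuals - residuals.mean()) / residuals.std()
flagged = np.flatnonzero(np.abs(z) > 3.0)
print(flagged)
```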
Find Outlier in Datasets using the Interquartile Range Method
The interquartile range method is my preferred method to identify outliers because the method itself…
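The usual form of the interquartile range rule can be sketched with the Python standard library alone. The sample values and the common factor of 1.5 are assumptions for this example.

```python
import statistics

# Toy sample (assumed); 50 is far from the other values.
data = [2, 4, 5, 5, 6, 7, 8, 9, 50]

# Quartiles; the "inclusive" method interpolates within the observed data range.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Common rule of thumb: values beyond 1.5 * IQR from the quartiles are outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower or v > upper]
print(lower, upper, outliers)
```

Because the rule relies only on quartiles, a single extreme value barely moves the bounds, which is part of what makes the method easy to reason about.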