How to find Outliers in Datasets?

Datasets may contain outliers, and these can cause problems for many machine learning algorithms. As such, it is good practice to identify and handle outliers in each column of your input data before modeling your prediction task. Generally only numeric columns have outliers, but if a categorical feature has a category that is present in only a few samples, this category could also be seen as an outlier. In my data analysis and preprocessing workflow, I use one-hot encoding to transform the categorical columns into numeric ones; a category with only a few samples then becomes a task for dimensionality reduction.
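The idea of a rare category acting like an outlier can be sketched with pandas. This is only a minimal illustration: the toy data, the column name `color`, and the 10% threshold `min_share` are assumptions made for the example, not values from the workflow described above.

```python
import pandas as pd

# Hypothetical toy data: "teal" appears in only 1 of 12 samples.
df = pd.DataFrame({"color": ["red"] * 6 + ["blue"] * 5 + ["teal"]})

# Share of samples that fall into each category.
freq = df["color"].value_counts(normalize=True)

# Assumed threshold: categories below 10% of the samples count as rare.
min_share = 0.10
rare_categories = freq[freq < min_share].index.tolist()

# One-hot encode the categorical column into numeric indicator columns.
encoded = pd.get_dummies(df["color"], prefix="color", dtype=int)

# Indicator columns that belong to rare categories; these are the
# candidates a later dimensionality reduction step would deal with.
rare_columns = [f"color_{c}" for c in rare_categories]

print(rare_categories)
print(encoded[rare_columns].sum())
```

The threshold is a judgment call that depends on the dataset size; with very small datasets a fixed minimum count may be a better criterion than a share.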

Find Outliers in Datasets using Local Outlier Factor

Find Outliers in Datasets using Isolation Forest

Find Outliers by using the Residual Plot for Regression Problems

Find Outliers in Datasets using the Interquartile Range Method