Unsupervised feature selection for outlier detection in categorical data using mutual information

Ranga Suri, NNR and Murty, Narasimha M and Athithan, G (2012) Unsupervised feature selection for outlier detection in categorical data using mutual information. In: 2012 12th International Conference on Hybrid Intelligent Systems (HIS), 4-7 Dec. 2012, Pune, India.

Official URL: http://dx.doi.org/10.1109/HIS.2012.6421343

Abstract

Outlier detection in high-dimensional categorical data has attracted much interest because qualitative features are used extensively to describe data across application areas. Although established methods exist for handling dimensionality through feature selection on numerical data, the categorical domain is still being actively explored. Because outlier detection is generally treated as an unsupervised learning problem, owing to the lack of knowledge about the nature of the various types of outliers, the related feature selection task must also be handled in an unsupervised manner. This motivates the development of an unsupervised feature selection algorithm for efficient detection of outliers in categorical data. To address this, we propose a novel feature selection algorithm based on the mutual information measure and entropy computation. Redundancy among the features is characterized using mutual information in order to identify a suitable feature subset with less redundancy. A comparison with information gain based feature selection shows the effectiveness of the proposed algorithm for outlier detection. Its efficacy is demonstrated on various high-dimensional benchmark data sets using two existing outlier detection methods.
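
This record does not reproduce the paper's algorithm, so the following is only a minimal sketch of the two quantities the abstract names: the entropy of a categorical feature and the mutual information between two features. The greedy selection rule in `select_features` (entropy minus mean mutual information with the features already chosen) is an illustrative assumption, not the authors' method, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned categorical columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def select_features(columns, k):
    """Pick up to k feature names with low mutual redundancy.

    ASSUMPTION: this greedy rule is for illustration only. It starts with
    the highest-entropy feature, then repeatedly adds the feature that
    maximizes entropy minus mean mutual information with the selected set.
    """
    remaining = list(columns)
    first = max(remaining, key=lambda f: entropy(columns[f]))
    selected = [first]
    remaining.remove(first)

    while remaining and len(selected) < k:
        def score(f):
            redundancy = sum(
                mutual_information(columns[f], columns[s]) for s in selected
            ) / len(selected)
            return entropy(columns[f]) - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" duplicates "a", while "c" is independent of "a".
toy = {
    "a": ["x", "x", "y", "y"],
    "b": ["x", "x", "y", "y"],
    "c": ["p", "q", "p", "q"],
}
chosen = select_features(toy, 2)
```

On this toy data the duplicate feature `b` shares one full bit of information with `a`, while `c` shares none, so the sketch keeps `a` and `c`. Any resemblance to the published algorithm beyond the use of entropy and mutual information should not be assumed.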

Item Type: Conference Paper
Publisher: IEEE
Additional Information: Copyright of this article belongs to IEEE.
Keywords: Feature Selection; Outlier Detection; Categorical Data; Mutual Information
Department/Centre: Division of Electrical Sciences > Computer Science & Automation
Date Deposited: 02 Jul 2013 08:35
Last Modified: 02 Jul 2013 08:35
URI: http://eprints.iisc.ac.in/id/eprint/46677
