Computer Science and Information Technology Vol. 7(3), pp. 65 - 71
DOI: 10.13189/csit.2019.070302

Feature Selection in Sparse Matrices

Rahul Kumar*, Vatsal Srivastava, Manish Pathak
MIQ Digital, Bangalore, India


Feature selection, as a pre-processing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. There are two main approaches to feature selection: wrapper methods, in which features are selected using the supervised learning algorithm itself, and filter methods, in which the selection of features is independent of any learning algorithm. However, most of these techniques use feature-scoring algorithms that make basic assumptions about the distribution of the data, such as normality, a balanced class distribution, or a dense (non-sparse) data-set. Real-world data rarely satisfy such strict criteria. In some domains, such as digital advertising, the generated data matrix is extremely sparse and follows no distinct distribution. For this reason, we propose a new approach to feature selection for data-sets that do not satisfy the above assumptions. Our methodology also addresses the problem of skewness in the data. The efficiency and effectiveness of our methods are then demonstrated by comparison with well-known statistical techniques such as ANOVA, mutual information, KL divergence, the Fisher score, Bayes' error, and the chi-square test. The data-set used for validation is a real-world user-browsing-history data-set used for ad-campaign targeting; it is both very high-dimensional and highly sparse. Our approach reduces the number of features to a significant degree without compromising the accuracy of the final predictions.
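To make the filter-method setting concrete, the sketch below shows a standard chi-square filter baseline (one of the comparison techniques named above) applied to a sparse matrix. This is not the paper's proposed method; the data are synthetic, and scikit-learn's `SelectKBest` with the `chi2` scorer is used purely as an illustration of scoring features independently of any learner and keeping the top-k.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic stand-in for a user-browsing-history matrix:
# rows = users, columns = binary site-visit indicators, ~90% zeros.
rng = np.random.default_rng(0)
X_dense = (rng.random((200, 50)) > 0.9).astype(float)
y = rng.integers(0, 2, size=200)          # binary campaign label
X = csr_matrix(X_dense)                   # store sparsely, as in the real data

# Filter method: score each feature with the chi-square statistic
# (operates directly on non-negative sparse input, no learner involved)
# and keep only the 10 highest-scoring features.
selector = SelectKBest(chi2, k=10)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (200, 10)
```

Because `chi2` accepts sparse input natively, the matrix is never densified; this matters when the true feature space has very high dimensionality, as in the data-set described above.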

Feature Selection, Sparse Matrices, Filter Methods

Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Rahul Kumar, Vatsal Srivastava, Manish Pathak, "Feature Selection in Sparse Matrices," Computer Science and Information Technology, Vol. 7, No. 3, pp. 65-71, 2019. DOI: 10.13189/csit.2019.070302.

(b). APA Format:
Rahul Kumar, Vatsal Srivastava, Manish Pathak (2019). Feature Selection in Sparse Matrices. Computer Science and Information Technology, 7(3), 65-71. DOI: 10.13189/csit.2019.070302.