Mathematics and Statistics Vol. 9(2), pp. 135 - 143
DOI: 10.13189/ms.2021.090207


The Varying Threshold Values of Logistic Regression and Linear Discriminant for Classifying Fraudulent Firm


Samingun Handoyo 1,2,*, Ying-Ping Chen 3,4, Gugus Irianto 5, Agus Widodo 6
1 Department of Statistics, Faculty of Mathematics and Natural Science, Brawijaya University, Malang 65145, Indonesia
2 Department of EECS-International Graduate Program, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
3 Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
4 Department of Computer Science, National Chiao Tung University, Hsinchu 30010, Taiwan
5 Department of Accounting, Faculty of Economics and Business, Brawijaya University, Malang 65145, Indonesia
6 Department of Mathematics, Faculty of Mathematics and Natural Science, Brawijaya University, Malang 65145, Indonesia

ABSTRACT

This research seeks the best performance of logistic regression and linear discriminant classifiers when their decision thresholds are varied over a range of values. The classifiers are evaluated with the confusion matrix, precision-recall, the F1 score, and the receiver operating characteristic (ROC) curve. The Audit-risk data set is used to implement the proposed method. Data screening and dimension reduction by principal component analysis (PCA) are carried out first, before the data are divided into training and testing sets. After training has produced the classifier model parameters, the performance measures are calculated only on the testing set, with various constants added to the threshold value of both classifier models. The logistic regression classifier achieves its best performance of 94% on precision-recall, 91.7% on the F1 score, and 0.906 on the area under the curve (AUC) for threshold values in the interval between 0.002 and 0.018. The linear discriminant classifier achieves its best performance at a threshold value of 0.035, with precision-recall of 94%, an F1 score of 91.7%, and an AUC of 0.846.
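The threshold-varying evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy labels and scores, the base threshold of 0.5, and the offsets are all assumptions chosen for illustration, and the scores stand in for the outputs of any fitted classifier on a testing set.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn), labelling a case positive when score >= threshold."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def sweep_thresholds(y_true, scores, base_threshold, offsets):
    """Add each constant in `offsets` to the base threshold and re-evaluate."""
    results = []
    for offset in offsets:
        threshold = base_threshold + offset
        tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
        precision, recall, f1 = precision_recall_f1(tp, fp, fn)
        results.append(
            {"threshold": threshold, "precision": precision,
             "recall": recall, "f1": f1}
        )
    return results


# Hypothetical testing-set labels (1 = fraudulent) and classifier scores.
y_true = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.10, 0.40, 0.35, 0.80, 0.55, 0.65, 0.90, 0.20]

for row in sweep_thresholds(y_true, scores, 0.5, [-0.2, 0.0, 0.2]):
    print(row)
```

Lowering the threshold in this toy example trades precision for recall, and raising it does the opposite, which is why a sweep over offsets can locate a threshold with a better F1 score than the default.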

KEYWORDS
Classifier Model, Confusion Matrix, F1-Score, ROC Curve

Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Samingun Handoyo, Ying-Ping Chen, Gugus Irianto, Agus Widodo, "The Varying Threshold Values of Logistic Regression and Linear Discriminant for Classifying Fraudulent Firm," Mathematics and Statistics, Vol. 9, No. 2, pp. 135-143, 2021. DOI: 10.13189/ms.2021.090207.

(b). APA Format:
Samingun Handoyo, Ying-Ping Chen, Gugus Irianto, Agus Widodo (2021). The Varying Threshold Values of Logistic Regression and Linear Discriminant for Classifying Fraudulent Firm. Mathematics and Statistics, 9(2), 135-143. DOI: 10.13189/ms.2021.090207.