Improving the Diagnosis of Breast Cancer Using Regularized Logistic Regression with Adaptive Elastic Net

Early diagnosis of breast cancer improves the patient's chance of survival. Therefore, cancer classification and feature selection are important research topics in medicine and biology. Recently, the adaptive elastic net has been used effectively for feature-based cancer classification, allowing simultaneous feature selection and coefficient estimation. The adaptive elastic net typically employs elastic net estimates as the initial weights. However, the elastic net estimator is inconsistent and biased in selecting features. Therefore, regularized logistic regression with the adaptive elastic net (RLRAEN) is used to handle the inconsistency problem by employing the adjusted variances of features as weights within the $L_1$-regularization of the elastic net model. The proposed method was applied to the Wisconsin Breast Cancer dataset from the UCI repository and compared with other existing penalized methods applied to the same dataset. In the experimental study, RLRAEN was more efficient in terms of feature selection and classification accuracy than the competing methods. It can therefore be concluded that RLRAEN is a better method for breast cancer classification.


Introduction
Breast cancer is the world's second leading cause of cancer death among women and one of the deadliest diseases affecting women. It is also spreading faster worldwide than any other cancer. Unless it is detected in its early stages, breast cancer can be life-threatening [1]. Therefore, early diagnosis of breast cancer improves the patient's chance of survival. Many machine learning techniques can be used to diagnose breast cancer through classification, analysis, and prediction [2].
Over recent decades, researchers have developed a variety of feature selection techniques. These techniques are divided into three groups. The first group consists of filter approaches. It includes the most common feature selection techniques, in which each feature is evaluated individually, irrespective of how well it performs within a group of features. The second group consists of wrapper approaches, which evaluate candidate feature subsets using a variety of algorithms. Even though wrapper techniques, such as "forward feature selection" and "backward feature elimination", are more effective in feature selection than filter methods, they are computationally very expensive. The third group consists of embedded methods, which incorporate the benefits of both the filter and wrapper groups. It contains penalization techniques that can fit a model and select features simultaneously [3]-[5].
The "Wisconsin diagnostic breast cancer" (WDBC) dataset was obtained from digitized images of fine needle aspirates of breast masses [6]. In the machine learning discipline, numerous soft computing approaches have been used to analyze and classify WDBC. Kadam et al. [2] noted that many researchers have proposed methods for the early prediction of breast cancer. They then proposed a feature ensemble method based on Stacked Sparse Autoencoders and a Softmax Regression Model (the FE-SSAE-SM model) to classify breast cancer as benign or malignant. Their findings showed that the proposed method yields a useful classification model for breast cancer and that it is as efficient as other machine learning methods.
One of the most popular penalty-based regularization methods is penalized logistic regression, which is used for both classification and feature selection. Combined with logistic regression, embedded feature selection based on regularization is very successful, and logistic regression with a penalty has received much interest in recent years because it performs feature selection and classification simultaneously. Different penalties give rise to different penalized logistic regression models. Among these penalties is the "Least Absolute Shrinkage and Selection Operator" (LASSO), which is based on $L_1$-regularization [7]. The "Smoothly Clipped Absolute Deviation" (SCAD) [8] is another penalty. Other penalties include the elastic net [9], the adaptive $L_1$-regularization [10], and the adaptive elastic net [11], [12].
The $L_1$-regularization (LASSO) is capable of selecting variables. However, it has three flaws [13], [14]. The first flaw concerns the number of features that LASSO can select. In some datasets, the number of features is much larger than the number of observations, yet LASSO cannot select more features than there are observations. The second flaw concerns how features behave. Features often operate in clusters or groups, and the features within each group are strongly correlated. Ideally, a selection method should either pick the whole group of strongly correlated features (if they are truly associated with the disease) or discard the whole group (if they are unrelated). LASSO, however, tends to select only one feature from each group of highly correlated relevant features. Zou and Hastie [9] developed the elastic net regularization technique to address the first and second limitations. The elastic net uses a penalty that combines $L_1$-regularization and $L_2$-regularization. The third flaw is a bias in feature selection, which arises because LASSO penalizes all feature coefficients equally. Consequently, LASSO does not enjoy the oracle properties [8]. To address this issue, Zou [10] devised the adaptive LASSO, in which a different weight is used to penalize each coefficient within the $L_1$-regularization penalty.
The $L_1$-regularization is a popular technique among sparse approaches. One shortcoming of the $L_1$-regularization model is that it applies the same amount of penalty to all features, resulting in inconsistency in the feature selection process [8], [10]. In this research, a regularized logistic regression with adaptive elastic net (RLRAEN) is used to enhance the effectiveness of feature selection. This is done by using the adjusted variances of features as initial weights within the $L_1$-regularization of the elastic net, in order to classify individuals correctly as having benign or malignant tumours. Each weight reflects the importance of an individual feature. Experiments are carried out to compare the proposed feature selection technique with other competing methods.
The remainder of the paper is organized as follows. Section 2 gives a short overview of relevant research on regularized logistic regression techniques. Section 3 introduces the proposed method (RLRAEN). Section 4 presents and discusses the findings of the experimental research designed to assess the efficiency of RLRAEN in comparison with LASSO, elastic net, and adaptive elastic net. Finally, Section 5 draws conclusions.

Regularized Logistic Regression
Logistic regression is used to model binary outcome variables. In the cancer classification problem, for example, the outcome variable has just two values: 1 for malignant tumours and 0 for benign tumours. In logistic regression, the relationship between the outcome probability and the linear combination of the predictor variables is nonlinear.
To classify the outcome variable, $y$, we assume that $y = (y_1, \ldots, y_n)^T$ is the n-dimensional vector of outcomes with $y_i \in \{0, 1\}$, that $x_i = (x_{i1}, \ldots, x_{ip})^T$ is the vector of the $p$ features of observation $i$, and that $\beta = (\beta_1, \ldots, \beta_p)^T$ is the p-dimensional vector of unknown coefficients, with intercept $\beta_0$. In logistic regression, the outcome variable has a Bernoulli distribution,

$$y_i \sim \text{Bernoulli}(\pi_i), \tag{1}$$

and the probability that $y_i$ is equal to 1 given the value of $x_i$ is

$$\pi_i = P(y_i = 1 \mid x_i) = \frac{\exp(\beta_0 + x_i^T \beta)}{1 + \exp(\beta_0 + x_i^T \beta)}. \tag{2}$$

The likelihood function can be expressed as

$$L(\beta) = \prod_{i=1}^{n} \pi_i^{y_i} (1 - \pi_i)^{1 - y_i}. \tag{3}$$

Then, the log-likelihood function can be written as follows:

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \ln \pi_i + (1 - y_i) \ln (1 - \pi_i) \right]. \tag{4}$$

Logistic regression is a highly discriminative classifier. However, when logistic regression is applied to high-dimensional data, the estimates of the regression coefficients are unreliable because the design matrix is singular. In addition, overfitting occurs in high-dimensional datasets, such as gene datasets, because the number of features (genes) is greater than the number of observations. The estimates may also be negatively affected by multicollinearity [15].
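As an illustration of the log-likelihood used in logistic regression, the following minimal Python sketch evaluates the Bernoulli log-likelihood for a given coefficient vector. It assumes the standard logistic model with an intercept; the function names (`softplus`, `predicted_probability`, `log_likelihood`) and the toy data are hypothetical and are not taken from the paper, whose experiments were run in R.

```python
import math


def softplus(z):
    """Numerically stable evaluation of ln(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def predicted_probability(beta0, beta, x):
    """P(y = 1 | x) under the logistic model with intercept beta0."""
    z = beta0 + sum(b * v for b, v in zip(beta, x))
    # exp(z) / (1 + exp(z)), written to avoid overflow for large |z|
    return math.exp(z - softplus(z))


def log_likelihood(beta0, beta, X, y):
    """Sum over observations of y_i*ln(pi_i) + (1 - y_i)*ln(1 - pi_i).

    Uses the identity y*ln(pi) + (1 - y)*ln(1 - pi) = y*z - ln(1 + exp(z)),
    where z is the linear predictor.
    """
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = beta0 + sum(b * v for b, v in zip(beta, x_i))
        total += y_i * z - softplus(z)
    return total


# Made-up toy data: three observations, two features.
X_toy = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
y_toy = [1, 0, 1]
print(log_likelihood(0.2, [0.5, -1.0], X_toy, y_toy))
```

With all coefficients equal to zero, every predicted probability is 0.5, so the log-likelihood equals n times ln(0.5); this is a convenient sanity check.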
From a statistical perspective, additional features may introduce noise and degrade classification accuracy. For this reason, researchers developing classification models often use feature selection techniques to eliminate irrelevant and redundant information and thereby enhance classification accuracy. One strategy for classification in high dimensions is penalized logistic regression, which addresses the problem of high dimensionality while simultaneously improving classification accuracy [16]. Although penalization techniques are most often employed in high-dimensional settings, Doerken et al. [17] have shown that they can also be applied effectively to low-dimensional data.
When a positive penalty term is added to the negative log-likelihood function, it is possible to drive certain coefficients to zero, resulting in a sparse solution. This is called "penalized logistic regression" (PLR) or "regularized logistic regression" (RLR), because "regularization" is another term for this procedure. Thus, if there are too many features in the logistic model, the PLR technique adds a penalty term to its objective. Penalizing the coefficients in this manner shrinks their values toward zero, so that the coefficients of the less significant features become almost or exactly zero.
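One way to see why an $L_1$-type penalty yields coefficients that are exactly zero is the soft-thresholding operator, which appears as the coordinate-wise update in coordinate descent solvers for $L_1$-penalized problems. The short Python sketch below is only illustrative; the function name `soft_threshold` and the example values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Made-up unpenalized coordinate values and a penalty level of 1.0.
raw = [3.0, -0.5, 0.8, -2.5]
shrunk = [soft_threshold(z, 1.0) for z in raw]
print(shrunk)  # small entries are set to zero, large ones are shrunk by 1.0
```

Coefficients whose magnitude falls below the penalty level are removed from the model, while the remaining ones are shrunk, which is the source of both the sparsity and the bias discussed in the text.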
The regularized log-likelihood is written as

$$\text{PLR} = -\ell(\beta) + \lambda P(\beta), \tag{5}$$

where $\ell(\beta)$ indicates the log-likelihood as in (4), $P(\beta)$ indicates a regularization term, and $\lambda \ge 0$ is a control parameter. For a given value of the control parameter, (5) is minimized with respect to $\beta$ to obtain the estimates of the coefficients. The penalty reduces the variance of the estimates at the cost of introducing bias, which can lead to enhanced prediction accuracy [18]. In classification and feature selection applications, these penalizing (regularizing) techniques belong to the widely used family of embedded feature selection approaches [19].
Without loss of generality, it is assumed that the features are standardized, $\sum_{i=1}^{n} x_{ij} = 0$ and $n^{-1} \sum_{i=1}^{n} x_{ij}^2 = 1$ for all $j$, and that the outcome variable is centered. As a result, the intercept is not penalized. The estimation of $\beta$ using LASSO is done as follows:

$$\hat{\beta}_{\text{LASSO}} = \arg\min_{\beta} \left[ -\ell(\beta) + \lambda \sum_{j=1}^{p} |\beta_j| \right], \tag{6}$$

where $\lambda \ge 0$ is the control parameter. When $\lambda = 0$, (6) reduces to the maximum likelihood estimator (MLE). As $\lambda \to \infty$, the penalization forces all coefficients to zero.
The elastic net is another effective penalized technique for feature selection. It was proposed by Zou and Hastie [9] to address the first and second shortcomings of LASSO. The elastic net combines the $L_2$-norm and the $L_1$-norm in order to handle highly correlated features and perform feature selection at the same time. The RLR with the elastic net can be expressed as follows:

$$\hat{\beta}_{\text{Elastic}} = \arg\min_{\beta} \left[ -\ell(\beta) + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \right]. \tag{7}$$

It is easy to observe from (7) that the elastic net estimator relies on two non-negative control parameters, $\lambda_1$ and $\lambda_2$. Equation (7) thus defines the solution of a penalized logistic regression problem.

The adaptive LASSO (ALASSO) method was originally proposed by Zou [10] to tackle the third drawback of LASSO by replacing the $L_1$-regularization with a re-weighted version. In other words, Zou [10] assigned a separate weight to each coefficient within the $L_1$-regularization. For the weights, ridge, LASSO, or other shrinkage estimators may be used. In this work, the LASSO estimate obtained in a first stage is employed as the initial estimator of the coefficients. The ALASSO penalized logistic model is written as

$$\hat{\beta}_{\text{ALASSO}} = \arg\min_{\beta} \left[ -\ell(\beta) + \lambda \sum_{j=1}^{p} w_j |\beta_j| \right], \tag{8}$$

where $w_j = (|\hat{\beta}_j|)^{-\gamma}$, $\gamma > 0$, and $\hat{\beta}_j$ is an initial estimate of each coefficient obtained with the LASSO method. Here, we set $\gamma = 1$ for simplicity.

Other penalized regression techniques that are similar to the elastic net and can also achieve the grouping effect have been suggested, such as the adaptive elastic net (AElastic) methods [11], [12]. Both methods include an adaptive weight in the $L_1$-regularization of the elastic net, but they differ in how the adaptive weight is built. Zou and Zhang [11] build the adaptive weight from the elastic net estimator, whereas Ghosh [12] constructs it from the least squares estimator. For fixed $\lambda_2$, the regularized logistic regression using AElastic can be expressed as follows:

$$\hat{\beta}_{\text{AElastic}} = \arg\min_{\beta} \left[ -\ell(\beta) + \lambda_1 \sum_{j=1}^{p} w_j |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \right], \tag{9}$$

where $w_j = (|\hat{\beta}_j|)^{-\gamma}$ is the adaptive weight generated by the initial estimator $\hat{\beta}_j$ for some positive constant $\gamma$. The coordinate descent method is capable of providing a reliable solution to (6)-(9) [18].
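The LASSO, elastic net, adaptive LASSO, and adaptive elastic net penalties can all be viewed as special cases of one weighted penalty. The Python sketch below makes this nesting explicit. It is a minimal illustration under the assumption that the adaptive weight is the inverse of the absolute initial estimate raised to a positive power; the function names and the numbers are hypothetical.

```python
import math


def adaptive_weights(beta_init, gamma=1.0):
    """w_j = |beta_init_j| ** (-gamma).

    A zero initial estimate receives an infinite weight, which in effect
    excludes that feature from the model.
    """
    return [abs(b) ** (-gamma) if b != 0 else math.inf for b in beta_init]


def weighted_elastic_net_penalty(beta, weights, lam1, lam2):
    """lam1 * sum_j w_j * |beta_j| + lam2 * sum_j beta_j ** 2."""
    l1 = 0.0
    for b, w in zip(beta, weights):
        if b != 0.0:  # skip zeros so that inf * 0 does not produce nan
            l1 += w * abs(b)
    l2 = sum(b * b for b in beta)
    return lam1 * l1 + lam2 * l2


beta = [0.5, -2.0, 0.0]          # made-up coefficient vector
ones = [1.0] * len(beta)
w = adaptive_weights(beta)       # here the initial estimate is beta itself

lasso = weighted_elastic_net_penalty(beta, ones, lam1=1.0, lam2=0.0)
elastic = weighted_elastic_net_penalty(beta, ones, lam1=1.0, lam2=0.5)
alasso = weighted_elastic_net_penalty(beta, w, lam1=1.0, lam2=0.0)
aelastic = weighted_elastic_net_penalty(beta, w, lam1=1.0, lam2=0.5)
print(lasso, elastic, alasso, aelastic)
```

Setting all weights to one recovers the non-adaptive penalties, and setting the second control parameter to zero removes the ridge part, so a single routine covers all four estimators.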

The Proposed Method
It has been shown that the elastic net technique performs well when the correlations between each pair of variables are extremely strong. However, El Anbari and Mkhadri [20] observed that the reliability of the elastic net decreases when the absolute correlation between features is not high. The elastic net has another issue in that it ignores the correlation structure of features [21]. Zou and Zhang [11] also showed that the elastic net does not possess the oracle property and that the grouping effect issue continues to be a concern. These issues may be addressed by using the adaptive elastic net, which was proposed by Ghosh [12] and Zou and Zhang [11]. The adaptive elastic net combines the $L_2$-regularization with the adaptive LASSO. When using an adaptive elastic net, it is critical that the initial weight is selected correctly, so that features are selected more accurately and the resulting classifiers are accurate. The regularized logistic regression with AElastic net (RLRAEN) is our approach; it uses the adjusted variances of features as initial weights for each feature within the $L_1$-regularization of the elastic net model, as shown in [22].
The p-dimensional vector of feature weights is expressed in (10) in terms of the adjusted variance of each feature. The adjusted variance is defined in (11) from the variance of the feature in each class and the weight (prior probability) of each class. In this research, there are two classes (benign and malignant). The weight used in this research assigns a relatively large weight to a feature with a low adjusted variance and a small weight to a feature with a high adjusted variance. In this way, the $L_1$-regularization can suppress inconsistency in feature selection. Once each feature has a suitable weight, the RLRAEN can use it to find relevant features efficiently and correctly. The RLRAEN implementation is stated in the following Algorithm. The coordinate descent technique can be used to solve the RLRAEN problem efficiently.
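To make the idea of variance-based weights concrete, the following Python sketch computes one plausible version of them. The exact formulas of the paper are not reproduced here, so everything in the sketch is an assumption: the adjusted variance of a feature is taken to be the prior-weighted sum of its within-class sample variances, the class prior is taken to be the class proportion, and the weight is taken to be the reciprocal of the adjusted variance. The function names and the toy data are hypothetical.

```python
from statistics import variance


def adjusted_variances(X, y):
    """ASSUMED form: sum over classes of prior_k * var(feature j in class k).

    prior_k is estimated as the proportion of observations in class k, and
    the within-class variance is the sample variance (divisor n_k - 1).
    Each class must contain at least two observations.
    """
    n = len(y)
    p = len(X[0])
    result = [0.0] * p
    for k in sorted(set(y)):
        rows = [x_i for x_i, y_i in zip(X, y) if y_i == k]
        prior = len(rows) / n
        for j in range(p):
            result[j] += prior * variance([row[j] for row in rows])
    return result


def variance_weights(X, y):
    """ASSUMED form: weight = 1 / adjusted variance, so that a feature with a
    low adjusted variance receives a relatively large weight."""
    return [1.0 / v for v in adjusted_variances(X, y)]


# Made-up data: four observations, two features, two classes.
X_toy = [[1.0, 10.0], [3.0, 10.5], [2.0, 20.0], [4.0, 20.5]]
y_toy = [0, 0, 1, 1]
print(adjusted_variances(X_toy, y_toy))
print(variance_weights(X_toy, y_toy))
```

In this toy example the second feature varies little within each class, so it has the smaller adjusted variance and the larger weight, which matches the qualitative behaviour described in the text.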

Data Description
The proposed method has been applied to the publicly available Breast Cancer Wisconsin (Diagnostic) medical dataset (WDBC). This dataset was obtained from the UCI Machine Learning Repository [23]. The WDBC was created by Dr. William H. Wolberg. It includes the records of 569 cases, of which 357 are benign and 212 are malignant. In addition to the ID number of each record and the diagnosis (benign or malignant), the WDBC consists of 30 real-valued features whose values were measured from a digitized image of a fine needle aspirate of a breast mass. The values of these features describe the characteristics of the cell nuclei that appear in the image. The features are based on ten measurements computed for each cell nucleus: Radius, Texture, Perimeter, Area, Smoothness, Compactness, Concavity, Concave points, Symmetry, and Fractal dimension [24], [25].

Performance Evaluation
In this study, the performance of RLRAEN in terms of prediction is assessed and then compared with that of various sparse techniques. Three performance measures are included in the comparison, and they are assessed on the above dataset. These measures are classification accuracy (CA), sensitivity (SEN), and specificity (SPE) [26], [27]:

$$\text{CA} = \frac{TP + TN}{TP + FP + FN + TN}, \tag{13}$$

$$\text{SEN} = \frac{TP}{TP + FN}, \tag{14}$$

$$\text{SPE} = \frac{TN}{TN + FP}. \tag{15}$$

TP and FP indicate the number of true and false positives, respectively, while TN and FN denote the number of true and false negatives, respectively. The higher the values of these assessment criteria, the better the classification performance. The paired t-test is used to check whether the observed improvements are statistically significant.
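The three measures can be computed directly from the counts of true and false positives and negatives, as in the following Python sketch. It reports the measures as proportions and assumes that the malignant class is coded as 1 and that both classes occur in the true labels; the function name and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return classification accuracy, sensitivity, and specificity.

    Assumes both the positive and the negative class are present in y_true,
    so that neither denominator is zero.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return {
        "CA": (tp + tn) / (tp + fp + fn + tn),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
    }


# Made-up labels: 4 malignant (1) and 6 benign (0) cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(classification_metrics(y_true, y_pred))
```

In this example there are 3 true positives, 1 false negative, 4 true negatives, and 2 false positives, so sensitivity exceeds specificity even though the overall accuracy is 0.7.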

Experimental Setting
The effectiveness of the proposed technique (RLRAEN) is assessed through comparative experiments with three other techniques (LASSO, elastic net, and adaptive elastic net). These techniques, along with our method, are applied to the dataset presented above. The dataset is randomly partitioned into two subsets: the training subset, which contains 70% of the data, and the testing subset, which contains 30% of the data. 10-fold cross-validation (CV) was performed on the training subset to obtain the best values of $\lambda_1$ and $\lambda_2$. The experiment was repeated a hundred times, and the average of the results was used as the final value. All tuning parameters were searched over the range [0, 100]. All the techniques were implemented in the programming language R using the "glmnet" package.
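The partitioning scheme described above can be sketched as follows. The experiments in the paper were run in R with glmnet; this Python fragment only illustrates how a random 70/30 split and the 10 cross-validation folds of the training subset could be generated. The function names, the seed, and the way fold membership is assigned are assumptions.

```python
import random


def train_test_split_indices(n, train_fraction=0.7, seed=0):
    """Randomly split the indices 0..n-1 into training and testing parts."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(train_fraction * n)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, k=10):
    """Yield (training, validation) index lists for k-fold cross-validation.

    Fold i takes every k-th index starting at position i, so fold sizes
    differ by at most one.
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [ix for j, fold in enumerate(folds) if j != i for ix in fold]
        yield training, validation


# 569 is the number of cases in WDBC.
train_idx, test_idx = train_test_split_indices(569)
print(len(train_idx), len(test_idx))
for fold_train, fold_val in k_fold_indices(train_idx, k=10):
    # A model would be fitted on fold_train for each candidate pair of
    # control parameters and scored on fold_val.
    pass
```

Repeating the split with a different seed for each of the hundred runs and averaging the resulting measures would mirror the repetition described in the text.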

Experimental Results
To evaluate the proposed method (RLRAEN), we compare it with LASSO, elastic net, and AElastic net. All of these methods were applied to the WDBC dataset. The average number of features selected by each method (# features), classification accuracy, sensitivity, and specificity were calculated for both the training and testing subsets of the WDBC dataset, as shown in Tables 1 and 2. The standard deviation of each value is given in parentheses.
A paired t-test is used to compare the proposed method with each of the other methods. The null hypothesis is that "the mean of average accuracy of the proposed method equals the mean of average accuracy of another method". The alternative hypothesis is that "the mean of average accuracy of the proposed method does not equal the mean of average accuracy of another method". The column labeled "improvement" provides the relative improvement in the mean of average accuracy achieved by the proposed method compared with the other methods. The summary of the results is presented in Table 3, which shows that RLRAEN performs significantly better than the rest of the methods.
The outcomes of the study show that the proposed method, RLRAEN, outperforms some of the contemporary classifiers. In general, when compared with the competing techniques, the proposed adaptive regularized technique offers the best overall classification performance in terms of classification accuracy, sensitivity, and specificity. This indicates that RLRAEN takes the relative importance of the features into consideration.
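The paired t-test compares two methods on the same sequence of random splits by testing whether the mean of the per-split accuracy differences is zero. The Python sketch below computes the test statistic and its degrees of freedom. It is only an illustration: the function name is hypothetical, the numbers are made up and are not results from the paper, and the p-value would still have to be read from a t distribution table or computed with a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length.

    t = mean(d) / (sd(d) / sqrt(n)), where d holds the pairwise differences
    and sd is the sample standard deviation.
    """
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(differences)
    t = mean(differences) / (stdev(differences) / math.sqrt(n))
    return t, n - 1


# Made-up scores of two methods on the same four splits.
method_a = [5.0, 7.0, 9.0, 10.0]
method_b = [4.0, 5.0, 8.0, 6.0]
t_value, dof = paired_t_statistic(method_a, method_b)
print(t_value, dof)
```

Pairing matters because both methods are evaluated on identical training and testing subsets, so the split-to-split variation cancels in the differences.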

Conclusions
In this study, the findings obtained by applying the RLRAEN method to the WDBC dataset were compared with the findings obtained by applying three other techniques (LASSO, elastic net, and adaptive elastic net) to the same dataset. The comparison indicates that the proposed method achieved better and more efficient results for classification and feature selection than the other methods. Therefore, RLRAEN appears to be an appropriate feature selection and classification technique, and it may be used on other cancer-related datasets.