Application of Fuzzy Linear Regression with Symmetric Parameter for Predicting Tumor Size of Colorectal Cancer

Abstract
The colon and rectum form the final portion of the digestive tube in the human body. Colorectal cancer (CRC) occurs due to bacteria produced from undigested food in the body. However, the factors and symptoms needed to predict the tumor size of colorectal cancer remain ambiguous. Ordinary linear regression runs into problems when the data are uncertain and imprecise. Since fuzzy set theory can handle data that are not precise point values (uncertain data), this study applied a recent fuzzy linear regression method to predict the tumor size of CRC. The parameters, errors and an explanation of both models are included. Secondary data on 180 colorectal cancer patients who received treatment in a general hospital, with twenty-five independent variables of different types, were used to find the best model for predicting the tumor size of CRC. Two models, fuzzy linear regression (FLR) and fuzzy linear regression with symmetric parameter (FLRWSP), were compared using two statistical error measures. FLRWSP was found to be the better model, with the lower mean square error (MSE) and root mean square error (RMSE) under the stated methodology.

Introduction
Recently, applied linear statistical models have been used in many fields such as medicine, economics, social science and many more [1-12]. The main objective of linear statistical analysis is to describe a response variable in terms of predictor variables through a linear function or multiple linear functions. The linear regression approach rests on several assumptions that researchers need to satisfy [2]: a linear relationship, multivariate normality, no autocorrelation and homoscedasticity. A statistical linear regression model can be applied only if the dependent variable is continuous and distributed according to a statistical model. For fuzzy data, the fuzzy membership function must be in line with fuzzy set theory [3,4].
In constructing fuzzy models, three key characteristics of every system model, namely complexity, credibility and uncertainty (vagueness), are traded off to maximize the model's usefulness. The relationship among these three characteristics is not yet fully understood. On the upside, when vagueness is the main characteristic in modeling, allowing it tends to reduce the complexity and increase the credibility of the resulting model. Vagueness can be handled and estimated by developing methods suited to each modeling problem [5].
Vagueness in modeling became generally accepted following the publication of a seminal paper by Lotfi A. Zadeh in 1965 [6,14]. In that paper, Zadeh introduced a way of dealing with vague information using fuzzy set theory. The paper was significant because it challenged not only vagueness modeling theory but also probability theory. After Zadeh's paper, modeling in the fuzzy area gained more interest, especially for predicting vague phenomena [1]. Tanaka et al. (1982) explained the fuzziness of respondents, or the fuzzy uncertainty of dependent variables, in a fuzzy regression model. There are three categories of data for a fuzzy regression model: (i) non-fuzzy input and output; (ii) non-fuzzy input but fuzzy output; (iii) fuzzy input and output.
One fuzzy model commonly used by researchers is FLR. This study aims to improve prediction by comparing the FLR and FLRWSP models. Both models were applied to colorectal cancer data, and the comparison between them was based on two error measures, MSE and RMSE [7-10].

Fuzzy Linear Regression Model (Tanaka, 1982)
Statistical analysis is versatile and can be used in many fields, especially through the method of linear regression. Fuzzy linear regression is a fuzzy type of regression analysis in which some elements of the model are represented by fuzzy numbers. The fuzzy linear regression model (FLRM) is an approach explored by Hideo Tanaka in 1982. In his research, the main objective was to obtain fuzzy sets that represent the fuzziness of the system structure from estimated values, whereas the conventional confidence interval is related to observation errors. No assumptions are compulsory in the fuzzy model.
The vagueness of the input and output data is attributed to the fuzzy parameters. In the model, the deviations among the data are explained as the vagueness of the system structure, expressed by fuzzy parameters [14].
The fuzzy output is denoted Y_i = (y_i, e_i), where y_i is the center and e_i is the width of a triangular fuzzy number. The linear function of fuzzy linear regression is defined in terms of X = [α_i, c_i], a vector of independent variables, and A = [A_0, A_1, …, A_g], a vector of fuzzy coefficients expressed as triangular fuzzy numbers. In FLR, the model is fitted to the given data by solving a linear programming problem, and the fuzzy parameters are obtained as the solution of that problem.
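For orientation, the linear programming problem usually associated with Tanaka-type fuzzy linear regression can be sketched as below. This is a commonly cited textbook form, not necessarily the exact formulation used in this study. Several symbols are assumptions made for the sketch: (\alpha_j, c_j) are taken to be the center and spread of the fuzzy coefficient A_j (an assumed reading of the symbols α and c in the text), x_{ij} is the value of variable j for patient i with x_{i0} = 1, n is the number of observations, and h is the degree of fitting.

```latex
% Sketch of a Tanaka-type fuzzy linear regression problem (assumed standard form)
\tilde{Y}_i = A_0 + A_1 x_{i1} + \dots + A_g x_{ig}, \qquad A_j = (\alpha_j, c_j)

\min_{\alpha,\,c} \; J = \sum_{j=0}^{g} c_j \sum_{i=1}^{n} |x_{ij}|

\text{subject to, for } i = 1, \dots, n:

\sum_{j=0}^{g} \alpha_j x_{ij} + (1-h) \sum_{j=0}^{g} c_j |x_{ij}| \;\ge\; y_i + (1-h)\, e_i

\sum_{j=0}^{g} \alpha_j x_{ij} - (1-h) \sum_{j=0}^{g} c_j |x_{ij}| \;\le\; y_i - (1-h)\, e_i

c_j \ge 0, \qquad j = 0, \dots, g
```

In this form the objective minimizes the total spread of the fitted fuzzy outputs, while the two constraint families require each observed fuzzy output to be contained in the fitted one at level h. For crisp (non-fuzzy) observations, e_i = 0.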

Fuzzy Linear Regression with Symmetric Parameter (Zolfaghari, 2014)
Fuzzy linear regression with symmetric parameter (FLRWSP) is one of the models most used by researchers working on fuzzy phenomena. It represents conditions that are vague and unclear. In that study, the researcher applied fuzzy linear regression to determine the quality of food products, in particular fried donuts. From a science and engineering point of view, the theory behind the model provides a conceptual framework and results that can be applied directly to models of systems using a fuzzy approach and recent developments in fuzzy logic [15]. The target function is defined under a symmetric condition on the triangular fuzzy numbers.
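To make the notion of a symmetric triangular fuzzy number concrete, the following minimal Python sketch evaluates its membership function and the interval obtained at a given degree of fitting h. It illustrates the general definition only, not the FLRWSP target function itself. The function names and the example values (center 5, spread 2) are hypothetical and do not come from the study.

```python
def membership(a, center, spread):
    """Membership grade of a in the symmetric triangular fuzzy number (center, spread)."""
    if spread <= 0:
        # Degenerate (crisp) number: only the center itself belongs to the set.
        return 1.0 if a == center else 0.0
    # Grade is 1 at the center and falls linearly to 0 at center +/- spread.
    return max(0.0, 1.0 - abs(a - center) / spread)


def h_level_interval(center, spread, h):
    """Interval of values whose membership grade is at least h, for 0 < h <= 1."""
    half_width = (1.0 - h) * spread
    return (center - half_width, center + half_width)


if __name__ == "__main__":
    # Hypothetical fuzzy coefficient, not taken from the study.
    print(membership(4.0, 5.0, 2.0))
    print(h_level_interval(5.0, 2.0, 0.5))
```

Because the number is symmetric, a single spread describes both sides of the center, so each fuzzy coefficient needs only two values.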

Structure and Procedure of a Hybrid Model
The structural framework is based on the fuzzy linear regression with symmetric parameter model proposed by Zolfaghari (2014). The steps to produce the model are illustrated in Figure 1.

Results
This study used secondary data consisting of 180 respondents to estimate the models. The dependent variable, or outcome of the model, is tumor size. There were twenty-five independent variables with binary and continuous values. The software used to obtain the results was Microsoft Excel, MATLAB and the Statistical Package for the Social Sciences (SPSS). The two models were compared using two statistical error measures, MSE and RMSE, to find the better model for predicting the outcome.
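The two error measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the prediction compared with each observed tumor size is a single crisp value (for a fuzzy model, for example the center of the fuzzy output), and that the squared errors are averaged over the number of observations n. The function names and sample values are hypothetical.

```python
import math


def mse(observed, predicted):
    """Mean square error: average of squared differences over n observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error: square root of the MSE, in the units of the data."""
    return math.sqrt(mse(observed, predicted))


if __name__ == "__main__":
    # Hypothetical tumor sizes, not taken from the study's data set.
    observed = [3.0, 5.0, 7.0]
    predicted = [2.0, 5.0, 9.0]
    print(mse(observed, predicted), rmse(observed, predicted))
```

Since RMSE is a monotone function of MSE, the two measures always rank the models in the same order; RMSE is simply easier to read because it is in the same units as tumor size.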

Fuzzy Linear Regression with Symmetric Parameter
The FLRWSP model was proposed by Zolfaghari (2014). The model was evaluated using two performance measures, MSE and RMSE. The performance of the two methods could also be evaluated by measuring the degree of fitting (H-value = 0.5). The model with the smallest error was taken as the best model for predicting the tumor size of colorectal cancer. This model was applied to the colorectal cancer data of the study. The results for the parameters and the errors are given in Table 3 and Table 4.

Conclusions
In statistics, the MSE and RMSE of an estimator are among the many ways to quantify the difference between the values implied by the estimator and the true values of the quantity being estimated. The MSE and RMSE results for the two models are shown in Table 5. The observed Y is taken from all 180 patients. The MSE is 277.952 for fuzzy linear regression (Tanaka) and 275.071 for fuzzy linear regression with symmetric parameter. Since the MSE of the fuzzy linear regression with symmetric parameter model (275.071) is smaller than that of fuzzy linear regression, the FLRWSP model is found to be the more appropriate and efficient model for predicting the tumor size of colorectal cancer in patients in 2012.
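As a quick check on the size of the difference, the RMSE values implied by the two reported MSE values can be worked out directly. The figures below are derived here from the MSE values quoted in the text, assuming RMSE is the square root of MSE; they are rounded to three decimals and may differ slightly in rounding from the values the authors report in Table 5.

```latex
\mathrm{RMSE}_{\mathrm{FLR}} = \sqrt{277.952} \approx 16.672

\mathrm{RMSE}_{\mathrm{FLRWSP}} = \sqrt{275.071} \approx 16.585

\frac{277.952 - 275.071}{277.952} = \frac{2.881}{277.952} \approx 0.0104
```

On this arithmetic, FLRWSP lowers the MSE by about 1 percent relative to FLR, so the ranking of the two models is clear but the margin between them is small.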