Reducing Approximation Error with Rapid Convergence Rate for Non-Negative Matrix Factorization (NMF)

Non-Negative Matrix Factorization (NMF) is used in many important applications. This paper presents the development of an efficient low-rank approximate NMF algorithm for feature extraction in text mining and spectral data analysis. NMF can also be used for clustering. NMF factorizes a non-negative matrix A into two non-negative matrices W and H such that A ≈ WH. The proposed method uses the k-means clustering algorithm to determine the centroid of each cluster and assigns the centroid coordinates of each cluster to one column of the W matrix, so the initial choice of W is non-negative. The H matrix is then determined with a gradient descent algorithm based on thin QR optimization. The performance of the proposed NMF algorithm is illustrated with comparative results. The accurate initial choice of the W matrix reduces the approximation error, and the use of the thin QR algorithm in combination with the gradient descent approach provides a rapid convergence rate for NMF. The proposed algorithm is implemented on randomly generated matrices in the MATLAB environment, with the number of significant singular values of the generated matrix selected as the number of clusters. The error and convergence rate of the proposed algorithm are compared with those of current algorithms. Because accurate measurement of the execution time of an individual program run is not possible in MATLAB, the average execution time over 200 runs is calculated for increasing iteration counts of the proposed algorithm, and the comparative results are presented.


Introduction
Clustering a heterogeneous data set [1] while retaining the original scale is a very important aspect of analyzing such data. NMF is described in this context in the survey paper [2] and in the book chapter reported in [3]. NMF factorizes a non-negative matrix A ≈ WH, where W and H are also non-negative matrices. Because NMF forces W and H to be non-negative, it allows only additive combinations, and the NMF basis images are localized features of the original image [4]. In contrast, the basis images for vector quantization (VQ) [5] and principal component analysis (PCA) [6] are distorted representations of the complete image. A non-negative factorization can be used for clustering: the data vector a_j is assigned to cluster i if h_ij is the largest element in column j of H. Each column of A represents a point in m-dimensional space, where A ∈ R^(m×n), W ∈ R^(m×k), H ∈ R^(k×n). The constraint is given by

min_{W≥0, H≥0} ||A − WH||_F^2,

which is equivalent to

min_{h_j≥0} ||a_j − W h_j||^2, where j = 1, 2, ..., n.

The authors in [4] use Singular Value Decomposition (SVD) to detect the most significant input basis vector, for which all elements are positive. However, approximate methods are used to suppress the negative values when generating the remaining basis vectors: negative values are replaced with zeros, and SVD is applied repeatedly to generate all non-negative basis vectors, which together form W, each basis vector representing a column of W. The initial choice of W determined by the SVD algorithm is not very close to the actual solution because the negative values are replaced with zeros while computing the initial basis vectors. A multiplicative algorithm is used in conjunction with the gradient descent approach to determine H. Alternating least squares in [7] and the multiplicative algorithm in [4] are the best reported solutions for the problem. NMF is a cutting-edge feature extraction technique that is particularly useful when there are many attributes whose individual meaning is weak or ambiguous.
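The cluster-assignment rule above (assign a_j to cluster i when h_ij is the largest element of column j of H) can be sketched in Python/NumPy; the matrix values below are an invented toy example:

```python
import numpy as np

def assign_clusters(H):
    """Assign data vector a_j to cluster i when h_ij is the
    largest element in column j of H."""
    return np.argmax(H, axis=0)

# Toy example: 3 clusters (rows of H), 4 data vectors (columns of A).
H = np.array([[0.9, 0.1, 0.2, 0.0],
              [0.0, 0.8, 0.1, 0.3],
              [0.1, 0.1, 0.7, 0.7]])
labels = assign_clusters(H)  # one cluster index per column of A
```

Each entry of `labels` is the row index of the dominant coefficient, so the factorization itself yields the clustering with no separate assignment step.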
NMF may uncover meaningful patterns, topics, or themes by combining attributes [8] [9]. NMF is often used in text mining, where the same word appears in multiple places in a document, each time with a different meaning [10]. NMF decomposes multivariate data by producing a number of features that the user determines. Each feature is a linear combination of the original attribute set, and the coefficients of these linear combinations are non-negative [11] [12]. NMF decomposes a data matrix A into the product of two lower-rank matrices W and H, yielding a result that is approximately equal to W times H. NMF modifies the initial values of W and H through an iterative process until the product approaches A; the process ends [13][14] [15] when the approximation error converges or the required number of iterations is reached. An NMF model maps the original data into the new set of attributes (features) discovered by the model during model apply. The algorithm initializes W and H with non-negative values, and then updates them by computing the following, with n as the iteration index:

H^(n+1) = H^(n) .* ((W^(n))^T A) ./ ((W^(n))^T W^(n) H^(n))

W^(n+1) = W^(n) .* (A (H^(n+1))^T) ./ (W^(n) H^(n+1) (H^(n+1))^T)

where .* and ./ denote element-wise multiplication and division [18]. Sparse numerical data are replaced with zeros, and sparse categorical data are replaced with zero vectors. NMF is a commonly used method for dimension reduction and feature extraction [19]. The key distinction between NMF and other factorization approaches, such as SVD, is that NMF allows only additive combinations of intrinsic parts, i.e. hidden features. For example, when applied to face images, NMF learns face parts, and a face is naturally depicted as an additive linear combination of those parts; negative combinations, on the other hand, are not as intuitive or natural as positive ones [20][21] [22]. NMF is often used in bioinformatics to find 'metagenes' from expression profiles that are linked to biological pathways. NMF was used to derive trinucleotide mutational signatures from mutations present in cancer genomic sequences, and it was proposed that each cancer type's trinucleotide profile is a positive linear combination of these signatures [23] [24]. A variety of algorithms are available for NMF decomposition, including multiplicative algorithms, gradient descent, and alternating non-negative least squares (ANLS) [25] [26]. ANLS is gaining popularity because it guarantees convergence to a stationary point and uses a fast non-negative least squares (NNLS) solver. The resulting decomposed matrices have fewer entries than the original matrix, as NMF is a dimension reduction process. This means that a decomposition does not include all of the entries in the original matrix, and NMF should be able to accommodate missing entries in the target matrix [27][28] [29]. The authors in [4] have demonstrated that the performance of the final solution depends on the initial choice. The multiplicative algorithm reported in [4] uses component-wise division. If any element of (W^T W H) or (W H H^T) becomes zero, the multiplicative algorithm replaces the zero by ε to overcome the divide-by-zero problem, where ε is a very small positive number.
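A minimal Python/NumPy sketch of multiplicative updates with the small-ε safeguard in the denominators; this is an illustration of the general scheme, not the exact implementation of [4]:

```python
import numpy as np

def multiplicative_nmf(A, k, iters=200, eps=1e-9, seed=0):
    """Multiplicative update rules for A ~= W H with W, H >= 0.
    A small eps is added to each denominator to avoid the
    divide-by-zero problem described in the text."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))          # random non-negative init
    H = rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # element-wise
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # element-wise
    return W, H
```

With the eps guard a zero denominator no longer aborts the run, but, as noted in the text, dividing by a very small number can still inflate individual entries of W and H.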
However, depending on the value of the corresponding element in (W^T A) or (A H^T), the division by ε may generate a very large value, which may suppress the other values of W and H during normalization. This issue is addressed in this work by using thin QR decomposition [30][31] [32].
This paper aims to reduce the approximation error and achieve a rapid convergence rate for the NMF algorithm by addressing the shortcomings of the work reported in [4]. The structure of the paper is as follows. The details of the proposed algorithm are discussed in Section 2. The results are presented in Section 3, followed by discussion in Section 4. Section 5 concludes the paper.

Proposed NMF Algorithm
The number of significant clusters is denoted by k. The value of k is determined by the SVD algorithm from the initial set of points, i.e. the input matrix A: the number of significant distinct singular values is taken as the number of clusters. The k clusters are then determined by the k-means clustering algorithm, and the centroid of each cluster is taken as one basis vector. Thus, k basis vectors are found to form the W matrix, and W is non-negative. After W is determined, the matrix H is determined by thin QR decomposition: W = QR, where Q is an orthogonal matrix and R is an upper triangular matrix, and H is expressed by (5) based on WH = A, i.e. QRH = A.
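These two steps can be sketched in Python/NumPy as follows. The significance threshold in choose_k and the minimal Lloyd-style k-means are assumptions for illustration, since the exact criteria are not specified here:

```python
import numpy as np

def choose_k(A, tol=1e-6):
    """Number of significant singular values of A, used as the
    number of clusters k (significance cut-off is assumed)."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

def kmeans_centroids(A, k, iters=50, seed=0):
    """Minimal Lloyd's k-means on the columns of A; returns an
    m x k matrix whose columns are the centroids (initial W)."""
    rng = np.random.default_rng(seed)
    X = A.T                                       # points as rows
    C = X[rng.choice(len(X), k, replace=False)]   # initial centroids
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        lab = d.argmin(1)                         # nearest centroid
        for i in range(k):
            if np.any(lab == i):
                C[i] = X[lab == i].mean(0)
    return C.T

def solve_H(W, A):
    """H from the thin QR of W: W = QR, so R H = Q^T A and H
    follows from a triangular solve."""
    Q, R = np.linalg.qr(W)                        # thin QR (reduced)
    return np.linalg.solve(R, Q.T @ A)
```

Because the centroids are averages of non-negative data points, the initial W is non-negative by construction, with no sign-flipping step needed.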
The matrix H so determined may contain negative elements. The negative elements are replaced with zeros and the resulting matrix is termed H1. Thin QR decomposition is then applied to H1 as below.
At this point W1 is normalised and the value of the error constraint given by equation (1) is computed. The process is repeated until the error is within the acceptable limit. The flowchart of the proposed algorithm is presented in Figure 1.
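Putting the steps together, the alternating refinement can be sketched as follows (a Python/NumPy sketch starting from a given non-negative W; the column-normalisation convention for W is an assumption):

```python
import numpy as np

def qr_nmf(A, W, iters=50, tol=1e-6):
    """Alternate thin-QR least-squares solves for H and W,
    clipping negatives to zero after each solve.  W's columns
    are normalised with a compensating rescale of H so the
    product W H is unchanged."""
    normA = np.linalg.norm(A)
    for _ in range(iters):
        Q, R = np.linalg.qr(W)                            # W = QR (thin)
        H = np.maximum(np.linalg.solve(R, Q.T @ A), 0)    # H1: clip < 0
        Q2, R2 = np.linalg.qr(H.T)                        # H1^T = Q2 R2
        W = np.maximum(np.linalg.solve(R2, Q2.T @ A.T).T, 0)
        s = np.maximum(W.sum(axis=0), 1e-12)              # column sums
        W, H = W / s, H * s[:, None]                      # normalise W
        if np.linalg.norm(A - W @ H) / normA < tol:       # error check
            break
    return W, H
```

Each pass solves one least-squares subproblem exactly via a triangular system, which avoids the component-wise divisions of the multiplicative scheme.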

Results
The proposed algorithm is implemented in the MATLAB/Simulink environment for a randomly generated matrix. The number of significant singular values of the generated matrix is chosen as the number of clusters. The comparison of the error and the convergence rate of the proposed algorithm with the existing algorithms is depicted in Figure 2. It is observed that the Multiplicative algorithm presented in [4], with the initial choice of W computed by SVD, outperforms random initialization of W with respect to approximation error and convergence rate. This is because the W matrix obtained with SVD initialization is closer to the actual solution than a randomly chosen W matrix. Finally, it is observed that the proposed algorithm outperforms all other variations of NMF reported in the literature with respect to relative approximation error and convergence rate.
Accurate measurement of the execution time of an individual program run is not possible in MATLAB. Hence, the average execution time over 200 runs is measured for increasing iteration counts of the proposed algorithm. A comparison of the average execution time of the proposed algorithm is illustrated in Figure 3. It is observed that the proposed algorithm consumes about 10% more execution time than the existing algorithms; all algorithms are of the same order in terms of execution time.
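The averaging strategy used here can be reproduced with a simple pattern (shown in Python for illustration; the timed workload is a stand-in):

```python
import time

def average_runtime(fn, repeats=200):
    """Average wall-clock time of fn over many repeats, since a
    single execution is too short to time reliably."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Example with a stand-in workload:
avg = average_runtime(lambda: sum(i * i for i in range(1000)))
```

Timing the whole batch once and dividing by the repeat count keeps the per-call timer overhead out of the measurement.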

Discussion
In the proposed algorithm, W is better approximated with k-means clustering because no approximation takes place to suppress negative numbers. The value of k is chosen by the SVD algorithm in the proposal, which is one of the major contributions to reducing the error and increasing the convergence rate.

Conclusion
In this work an efficient NMF algorithm is proposed to reduce the approximation error and achieve a fast convergence rate. The major contribution is the selection of the number of clusters by the SVD algorithm, which determines the number of columns in the W matrix. The accurate initial choice of the W matrix corresponding to the significant singular values provides a lower error and a rapid convergence rate in combination with the gradient descent based thin QR optimization. Simulation results are presented to validate the performance of the proposed NMF algorithm and to compare it with other NMF variants available in the literature.