Supervised Machine Learning Classifiers for Diversity Combined Signals in 6G Massive MIMO Receivers

Support Vector Machine (SVM) is a statistical learning tool that was initially developed by Vapnik in 1979 and later extended to the more general concept of structural risk minimization (SRM). As a supervised machine learning tool, SVM is playing an increasing role in detection problems across engineering, notably in statistical signal processing, pattern recognition, image analysis, and 6G wireless communication networks. In this paper, SVM is applied to signal detection in 6G communication systems in the presence of channel noise in the form of fully developed Rayleigh multipath fading and receiver noise generalized as additive colour Gaussian noise (ACGN). The structure and performance of SVM in terms of the bit error rate (BER) metric are derived and simulated for these advanced stochastic noise models, and the computational complexity of the implementation, in terms of average computational time per bit, is also presented. The performance of SVM is then compared to that of the conventional optimal model-based detector for M-ary signaling driven by M-ary phase shift keying (MPSK) modulation. We show that the SVM performance is superior to that of conventional detectors, which require as much as 7-bit coding (M ≥ 128) to produce results comparable to those of SVM. Finally, the SVM-based detector is implemented in an uplink SIMO system using both the Equal Gain Combiner (EGC) technique and the Root Mean Square Gain Combiner (RMSGC) technique, and the latter is shown to be superior to the former.


Introduction
Support Vector Machine (SVM) is a recent class of statistical classification and regression techniques that is receiving increased attention for classification problems in various engineering areas. SVM is based on the statistical learning theory initially developed by Vapnik [1] in 1979 and later extended to the concept of structural risk minimization (SRM). The SRM principle minimizes an upper bound on the generalization error, as opposed to the classical empirical risk minimization (ERM) approach, which minimizes the error on the training data and underlies classical statistical learning.
In a broad sense, two classes of classifiers are widely used in the literature: (1) model-based classifiers such as the maximum likelihood (ML) and maximum a posteriori (MAP) detectors and (2) boundary-based classifiers such as support vector machines, neural networks and fuzzy logic.
SVM claims to guarantee generalization, i.e., the decision rules reflect the regularities of the training data rather than the incapability of the learning machine.
SVM has been widely used in solving classification and function estimation problems due to its many attractive features and promising empirical performance, with successful applications in synthetic aperture radar image classification [2] and pattern recognition. Recently, SVM has been introduced to digital communication systems as a new method for channel equalization [3]-[5] and has proved to be very effective in overcoming intersymbol interference (ISI) and co-channel interference (CCI). SVM has also been applied to the equalization of burst time division multiple access (TDMA) transmission [6]. To the best of our knowledge, SVM has not yet been implemented for receiver detection in digital communication systems in the presence of advanced additive colour receiver noise and multiplicative channel fading noise. Notable exceptions are the initial work of Dubois and Abdel-Latif [7], who applied SVM to OOK-infrared channels in a local fading environment with partially developed multipath fading and additive white Gaussian receiver noise (AWGN), and the work of Mokbel and Hashem [8], who applied SVM to a BNRZ detector (sampler and comparator) using multiple samples per binary period in the presence of AWGN in wire-line communication systems.

A. Modulation Scheme
When choosing between model-based and boundary-based classifiers, we must take into consideration the driving modulation scheme. M-ary phase shift keying (MPSK) is increasingly being adopted as a physical layer modulation technique for cellular systems, most notably 3G UMTS (3rd Generation Universal Mobile Telecommunications System), which uses QPSK (quadrature phase shift keying), and 2.5G EDGE (Enhanced Data Rates for Global Evolution), which uses 8-PSK for high data rates (replacing the classical GMSK (Gaussian minimum shift keying)) [9].
MPSK has a higher bit rate than BPSK, yet it needs a more complex receiver, its carrier recovery process is much more complicated, and most importantly, it has a worse bit error rate (BER) performance (but better symbol error rate). Since SVM is essentially a binary classifier, it is only logical to apply SVM to MPSK to improve its BER performance.

B. Stochastic Noise Model
The Rayleigh fading channel is widely assumed in the literature for wireless systems, especially for mega- and macro-cells in the absence of a dominant line of sight [9]. The fading envelope obeys the scattering stochastic model
$$\gamma_L = \left| \sum_{k=1}^{L} A_k e^{j\varphi_k} \right|,$$
where $A_k$ is the random amplitude of the $k$th scatterer, $\varphi_k$ is the random phase of the $k$th scatterer, assumed to be uniformly distributed on $[0, 2\pi)$, and $L$ is the random number of scatterers in the channel. When $L$ is sufficiently large so that the central limit theorem (CLT) holds and the scattered field is approximately a circular complex Gaussian random variable, we say that the fading is fully developed. Even for statistically dependent scatterers, the scattered field would still be asymptotically circular Gaussian, provided that the sequence of scattering random variables satisfies the α-mixing property or the conditions of the Lindeberg-Feller CLT. A simple random variable transformation results in the fading envelope $\gamma_L$ being asymptotically Rayleigh distributed (with driving parameter $P_{dif}$).
In general, the received wireless QPSK signal is corrupted by two types of noise: (1) channel fading multiplicative noise $\gamma_L$ and (2) receiver additive colour Gaussian noise (ACGN), which characterizes more complicated channel and receiver interference. This stochastic model is widely used in wireless optical communication systems [10] in the presence of interfering ambient or incandescent/fluorescent light and non-ideal photo-detectors. Technically, these noise models respectively correspond to shot noise and "random telegraph signal" or laser-phase noise. Most commonly, ambient light induces shot noise in the photo-detector of optical receivers.
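The fully developed fading regime described above can be checked numerically. The following is a minimal sketch, not the simulation used in this paper: the amplitude distribution (uniform on [0, 1)), the fixed number of scatterers, and the unit-power normalization are all assumptions made for illustration. It sums many random phasors and compares the sample mean of the envelope with the Rayleigh value $\sqrt{\pi}/2$ that holds when $E[\gamma_L^2] = 1$.

```python
import numpy as np

def fading_envelope(num_scatterers, num_samples, rng):
    """Envelope |sum_k A_k exp(j*phi_k)| for num_samples independent channel draws."""
    # Assumption: i.i.d. amplitudes, uniform on [0, 1), scaled so that E[gamma^2] = 1.
    amplitudes = rng.uniform(0.0, 1.0, size=(num_samples, num_scatterers))
    amplitudes /= np.sqrt(num_scatterers / 3.0)   # E[A^2] = 1/3 for U(0, 1)
    # Phases uniform on [0, 2*pi), as in the scattering model.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(num_samples, num_scatterers))
    field = np.sum(amplitudes * np.exp(1j * phases), axis=1)
    return np.abs(field)

rng = np.random.default_rng(0)
gamma = fading_envelope(num_scatterers=64, num_samples=20000, rng=rng)
mean_power = np.mean(gamma ** 2)        # close to 1 by construction
mean_envelope = np.mean(gamma)          # close to sqrt(pi)/2 for a Rayleigh envelope
print(mean_power, mean_envelope, np.sqrt(np.pi) / 2.0)
```

With a small number of scatterers the sample statistics drift away from the Rayleigh values, which is one way to visualize partially developed fading.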
By studying colour noise, we are assuming a severe noise environment. Without loss of generality, we consider the autocorrelation function of a random telegraph signal,
$$R_n(\tau) = \varepsilon^2 \exp(-2\lambda|\tau|),$$
where $2\varepsilon$ is the peak-to-peak noise amplitude and $\lambda$ is the rate of the underlying Poisson point process [10]. The power spectral density (PSD) of the noise is given by
$$S_n(f) = \frac{\varepsilon^2 \lambda}{\lambda^2 + \pi^2 f^2},$$
which has a de-emphasis filter-like characteristic.
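As an illustration of the colour noise model, the sketch below generates a sampled random telegraph signal with levels $\pm\varepsilon$ and Poisson switching rate $\lambda$, and estimates its autocorrelation at one lag for comparison with $\varepsilon^2 \exp(-2\lambda|\tau|)$. The sampling step, record length, and function names are assumptions chosen for the example and are not taken from the simulations reported here.

```python
import numpy as np

def random_telegraph(num_samples, dt, lam, eps, rng):
    """Sampled random telegraph signal: levels +/-eps, Poisson switching rate lam."""
    flips = rng.poisson(lam * dt, size=num_samples)   # switches within each sampling interval
    parity = np.cumsum(flips) % 2                     # 0 after an even number of switches
    start = rng.choice([-1.0, 1.0])                   # random initial level
    return eps * start * (1.0 - 2.0 * parity)

def autocorr(x, lag):
    """Time-average estimate of E[x(t) x(t + lag*dt)]."""
    if lag == 0:
        return np.mean(x * x)
    return np.mean(x[:-lag] * x[lag:])

rng = np.random.default_rng(0)
lam, eps, dt = 1.0, 1.0, 0.05                         # lam = eps = 1 as in the simulations
noise = random_telegraph(400000, dt, lam, eps, rng)
lag = 10                                              # tau = lag * dt = 0.5
estimate = autocorr(noise, lag)
theory = eps ** 2 * np.exp(-2.0 * lam * lag * dt)
print(estimate, theory)
```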

Support Vector Machine
In this section, we provide a succinct introduction to the SVM approach. The reader is referred to the initial work of Vapnik [1] and the book of Cristianini [11] for a more in-depth treatment of SVM theory.
The relation between the capacity of a learning machine and its performance is governed by a set of bounds, referred to as the bounds on the generalization performance. Statistical pattern recognition techniques face two problems: the identification problem and the parameter estimation problem. The identification problem is the determination of the degrees of freedom or complexity of the model and is generally the more complex problem [12]. The estimation problem is how to obtain an optimal estimate of the model parameters given the training data set.
Let us consider a mapping $\Phi: \Re^n \mapsto H$, which maps the training data from $\Re^n$ to a higher-dimensional Euclidean space $H$ that may have infinite dimension. In this high-dimensional space the data can become linearly separable, hence the linear SVM formulation can be applied to any type of data [2]. In the SVM formulations, the training data only appear in the form of dot products $x_i \cdot x_j$. These can be replaced by dot products in the Euclidean space $H$, i.e., $\Phi(x_i) \cdot \Phi(x_j)$.
The dot product in the high-dimensional space can also be replaced by a kernel function. By computing the dot product directly using a kernel function, one avoids the mapping $\Phi(x)$. This is desirable because $H$ possibly has infinite dimension and $\Phi(x)$ can be difficult or impossible to compute. Using a kernel function, an SVM that operates in an infinite-dimensional space can be constructed [11].
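The replacement of the dot product in $H$ by a kernel evaluation can be illustrated with a small example. The sketch below uses a homogeneous polynomial kernel of degree 2 in two dimensions, chosen only because its feature map is finite and easy to write out; it is not the kernel adopted in this work, and the function names are illustrative.

```python
import numpy as np

def poly2_kernel(x, z):
    """Kernel evaluated directly in the input space: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def poly2_feature_map(x):
    """Explicit map Phi for the same kernel in 2-D: (x1^2, sqrt(2) x1 x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
via_kernel = poly2_kernel(x, z)                                      # no mapping needed
via_mapping = float(np.dot(poly2_feature_map(x), poly2_feature_map(z)))
print(via_kernel, via_mapping)
```

Both routes give the same number, but the kernel route never forms $\Phi(x)$, which is what makes infinite-dimensional feature spaces usable.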
Given a training set of $N$ data points $\{y_k, x_k\}_{k=1}^{N}$, where $x_k$ denotes the $k$th input pattern and $y_k$ the $k$th output pattern, the SVM aims at constructing a decision function or classifier $y(x) = \operatorname{sign}[f(x)]$ with
$$f(x) = w^T \Phi(x) + b = \sum_{k=1}^{N} \alpha_k K(x, x_k) + b,$$
where $w$ is the weight vector in the reproducing kernel Hilbert space (RKHS), $\alpha_k$ are the support values (Lagrangian multipliers), $b$ is the bias term, and the kernel function is
$$K(x, x_k) = \Phi(x)^T \Phi(x_k). \quad (5)$$
For every new test datum, the kernel function for each support vector (SV) needs to be recomputed.
For any kernel function suitable for SVM, there must exist at least one pair $\{H, \Phi\}$ such that (5) is satisfied. A kernel with this property is said to obey Mercer's condition, i.e., for any $g(x)$ with finite $L_2$ norm,
$$\iint K(x, y)\, g(x)\, g(y)\, dx\, dy \ge 0.$$
By choosing different kernel functions, the SVM can emulate some well-known classifiers [13], as shown in Table 1.
While standard SVM solutions involve solving quadratic or linear programming problems, the least squares version of SVM (LS-SVM), which has been adopted for this research, corresponds to solving a set of linear equations. In LS-SVM, Mercer's condition is still applicable. Hence several types of kernels can be used, yet the RBF kernel is the one adopted here since it gives a Gaussian distribution for the errors in the feature space, yielding an optimal estimate of the support values [14]. Many reasons could be stated for preferring LS-SVM over other models of SVM, yet the most important one is that LS-SVM is an iterative method that can be used to solve large-scale problems and is robust with respect to the choice of the regularization and smoothing parameters. Moreover, it offers a fast method for obtaining classifiers with good generalization performance in many real-life applications [15].
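A finite-sample consequence of Mercer's condition is that the Gram matrix of the kernel on any set of points is positive semidefinite. The sketch below checks this for an RBF kernel on random points; the kernel width, the data, and the function name are assumptions for illustration only.

```python
import numpy as np

def rbf_gram(X, sigma):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))            # 50 arbitrary points in the plane
K = rbf_gram(X, sigma=1.0)
eigenvalues = np.linalg.eigvalsh(K)     # K is symmetric, so eigvalsh applies
print(eigenvalues.min())                # non-negative up to rounding error
```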
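So far, the formulation of SVM has been based on a two-class problem (SVM is essentially a binary classifier). Various schemes can be applied to the basic SVM algorithm to handle the M-class pattern classification problem. These schemes [2], [16] include using M one-to-rest classifiers.
In the LS-SVM formulation, the classifier is obtained by solving
$$\min_{w, b, \xi} J(w, \xi) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{N} \xi_k^2 \quad (8)$$
subject to the equality constraints
$$y_k = w^T \Phi(x_k) + b + \xi_k, \quad k = 1, \ldots, N, \quad (9)$$
where $\gamma$ is the regularization factor and $\xi_k$ is the difference between the output $y_k$ and the discriminant function $f(x_k) = w^T \Phi(x_k) + b$. Using standard techniques, the Lagrangian for (8) and (9) is
$$\mathcal{L}(w, b, \xi; \alpha) = J(w, \xi) - \sum_{k=1}^{N} \alpha_k \left[ w^T \Phi(x_k) + b + \xi_k - y_k \right],$$
where $\alpha_k$ are the Lagrangian multipliers corresponding to (9). The saddle point is obtained from $\max_{\alpha} \min_{w, b, \xi} \mathcal{L}(w, b, \xi; \alpha)$.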
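To make the "set of linear equations" concrete, the following sketch trains a binary LS-SVM with an RBF kernel. It assumes the constraint takes the function-estimation form $y_k = w^T \Phi(x_k) + b + \xi_k$ with labels $y_k = \pm 1$; setting the derivatives of the Lagrangian to zero then gives $\sum_k \alpha_k = 0$ and $(K + I/\gamma)\alpha + b\mathbf{1} = y$. The toy data, parameter values, and function names are hypothetical and are not those of the Matlab simulations reported in this paper.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                    # enforces sum(alpha) = 0
    A[1:, 0] = 1.0                                    # bias column
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]                  # b, alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma):
    """Classifier sign[sum_k alpha_k K(x, x_k) + b]."""
    return np.sign(rbf_kernel(X_new, X_train, sigma) @ alpha + b)

# Toy two-class problem (hypothetical data): two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (40, 2)), rng.normal(-2.0, 0.5, (40, 2))])
y = np.concatenate([np.ones(40), -np.ones(40)])
b, alpha = lssvm_train(X, y, gamma=10.0, sigma=1.0)
accuracy = np.mean(lssvm_predict(X, b, alpha, X, sigma=1.0) == y)
print(accuracy)
```

A one-to-rest multi-class extension would train M such classifiers, one per class against all others, and pick the class with the largest discriminant value.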

SVM Based QPSK Detector
Since QPSK modulation is widely used in existing wireless communication systems [9], we will describe the SVM-based QPSK detector. As shown in Fig. 1, the QPSK receiver locally generates sine and cosine carriers $S_I[n]$ and $S_Q[n]$, referred to as the in-phase and quadrature signals, respectively. Provided that a coherent receiver system is employed, both the in-phase and quadrature signals can be recovered exactly, allowing the system to transmit twice the amount of information at the same carrier frequency compared with what could be achieved with a single oscillator. $S_I[n]$ and $S_Q[n]$ are then fed into the SVM classifier in order to classify the received signal into a parallel bit stream, which is then converted into a serial bit stream representing the received data.
A similar diagram can be adopted for higher orders of MPSK, where the only change is in the number of local oscillators ($\log_2 M$).
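For reference, the conventional coherent QPSK decision that the SVM classifier replaces can be sketched as follows. The Gray mapping, unit symbol energy, and function names are assumptions made for this example; in the SVM-based receiver, the in-phase and quadrature samples would instead be passed as features to the trained classifier rather than to the sign decisions shown here.

```python
import numpy as np

def qpsk_modulate(bits):
    """Map bit pairs to unit-energy Gray-coded QPSK symbols (assumed mapping)."""
    pairs = np.asarray(bits).reshape(-1, 2)
    in_phase = 1.0 - 2.0 * pairs[:, 0]        # bit 0 -> +1, bit 1 -> -1
    quadrature = 1.0 - 2.0 * pairs[:, 1]
    return (in_phase + 1j * quadrature) / np.sqrt(2.0)

def qpsk_detect(symbols):
    """Sign decision on each branch, then parallel-to-serial conversion."""
    bits_i = (symbols.real < 0).astype(int)
    bits_q = (symbols.imag < 0).astype(int)
    return np.column_stack([bits_i, bits_q]).reshape(-1)

bits = np.array([0, 0, 0, 1, 1, 0, 1, 1])
rng = np.random.default_rng(0)
symbols = qpsk_modulate(bits)
received = symbols + 0.05 * (rng.normal(size=4) + 1j * rng.normal(size=4))  # mild noise
recovered = qpsk_detect(received)
print(recovered)
```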

Massive Multiple Input Multiple Output (m-MIMO)
Massive MIMO systems can be defined simply as follows: given an arbitrary wireless communication system, we consider a link for which the transmitting end, as well as the receiving end, is equipped with a large-scale antenna array. The idea behind MIMO is that the signals on the transmit (TX) antennas at one end and the receive (RX) antennas at the other end are combined in such a way that the quality (bit error rate, or BER) of the communication for each m-MIMO user is improved [17]. Such an advantage can be used to significantly increase both the network's quality of service and the operator's revenues. The channel model adopted for the m-MIMO system is multipath Rayleigh flat fading in the presence of additive white Gaussian noise (AWGN) and a co-channel interferer (CCI), in which the code word encompasses M fading blocks. The complex fading gains are constant over one fading block but are independent from block to block. We model the CCI as a white Gaussian process whose variance is constant across one block and changes independently from block to block, and we further adopt a simple two-state model for the CCI noise variance. The additive noise samples are, therefore, independent samples of a zero-mean complex Gaussian random variable whose variance is constant across the fading block but changes independently from block to block. While this model may not be very realistic, it is widely used in the literature to model the CCI, as in [18] and [19].
There are many different schemes for implementing the MIMO receiver, yet in this paper only two are simulated and compared: the Equal Gain Combiner (EGC) and the Root-Mean-Square Gain Combiner (RMSGC). The first scheme (EGC) receives the signal on N antennas and amplifies it by a gain G (equal for all N antennas), then combines the received signals from the N antennas and passes the result to the detector, as shown in figure 2(a). The second scheme (RMSGC) also receives the signal on N antennas and squares each of the received signals; the squared signals are then combined and averaged, passed through a square-root device, and submitted to the detector, as shown in figure 2(b).
The proposed SVM-based detector proves to be a better detector than the classical ones from a QoS point of view. The outcome presented in figure 3 is the result of simulating each of the EGC and RMSGC schemes with the Maximum Likelihood (ML) detector and the SVM-based detector.
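The two combining rules described above can be written compactly as follows. This is a sketch of the operations as stated in the text (sum with a common gain for EGC; square, average, square root for RMSGC); the array layout with one row per antenna and the function names are assumptions.

```python
import numpy as np

def equal_gain_combine(received, gain=1.0):
    """EGC: apply the same gain to every antenna branch and sum the branches.

    received has shape (num_antennas, num_samples).
    """
    return gain * np.sum(received, axis=0)

def rms_gain_combine(received):
    """RMSGC: square each branch, average over antennas, then take the square root."""
    return np.sqrt(np.mean(np.abs(received) ** 2, axis=0))

# Two antennas, two time samples (toy values).
received = np.array([[3.0, 0.0],
                     [4.0, 0.0]])
egc_out = equal_gain_combine(received)    # [7.0, 0.0]
rms_out = rms_gain_combine(received)      # [sqrt(12.5), 0.0]
print(egc_out, rms_out)
```

As written, the RMS output is non-negative, so this sketch covers only the magnitude path of the combiner.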

Simulation Results and Discussions
For simulation purposes, Matlab is used due to its enhanced mathematical capabilities and engineering-based structure. The LS-SVM model was simulated using Matlab code on a 1.7 GHz Pentium IV computer with 256 MB RAM, so that the comparison with classical detectors, which is the main scope of this simulation, is fair. Without loss of generality and for the purpose of simulation, we assumed $P_{dif} = 1$ (Rayleigh driving parameter) and $\lambda = \varepsilon = 1$ (colour noise parameters). In order to take full advantage of the SVM scheme, we consider several samples of the QPSK signal in the bit period. This offers a generalization since SVM is applied in a wider space. The results of the simulated LS-SVM-based QPSK system are shown in Fig. 3. We observe that the SVM-based detector outperforms the classical ML-based detector for low SNR, while for high SNR, both systems produce similar results and converge at 17.35 dB.
Yet this superior performance comes at the cost of processing time, as shown in Table 2. This drawback is expected because the SVM method is a block-data-based method. After QPSK, we simulated higher levels of M-ary PSK in order to assess the performance of the SVM-based detector and compare it with the classical detectors for M = 8, 32, 64 and 128. The results of the simulation are shown in Fig. 4. Again, the SVM outperformed the ML detector for low SNR and achieved the ML performance asymptotically. For large SNR, the differences between the BER curves diminish. For high SNR, the BER is very small and cannot be measured with sufficient precision for either method, so a much larger training data block must be used. The converging dB levels of the SNR for both SVM and ML detectors are displayed in Table 3. We also observe from Fig. 4 that for M > 64 (M = 128, or 7-bit coding) the SVM-based system does not give much better results than the classical systems.
For the MIMO system, we first need to establish the better combining technique for the simulation. The two techniques, EGC and RMSGC, have been simulated in a SIMO system using different numbers of receiving antennas. The outcome presented in figure 5 shows a clear advantage of the RMSGC technique over the EGC technique for low SNR; however, for relatively high SNR the performance of the two techniques seems to converge. At first glance, it appears that the larger the number of antennas, the better the quality of service that can be achieved.
However, the simulation results show that this is not exactly the case. This behavior is due to the assumption made to suppress the additive noise n(t): the noise has a limited effect on the transmitted signal until the number of simultaneously transmitted signals exceeds a certain limit, after which the effect of the "added additive noise" on the transmitted signals becomes completely destructive and cancels the benefit of multiple antennas. Figure 6 shows clearly that, from a QoS viewpoint, the optimum number of receiving antennas is eight: even though the system with ten receiving antennas gives a relatively better performance, it is not cost-efficient to pay for two more antennas for that relatively small enhancement in the behavior of the system.
To overcome the influence of the "added additive noise", we can adopt Orthogonal Frequency Division Multiplexing (OFDM). OFDM inherently provides frequency diversity over sub-channels (or tones), which offers an opportunity for both interference averaging and interference avoidance in the frequency domain [20]. The main motivation for using OFDM in a MIMO channel is that OFDM modulation turns a frequency-selective MIMO channel into a set of parallel frequency-flat MIMO channels. This renders multi-channel equalization particularly simple, since for each OFDM tone only a constant matrix has to be inverted. The user signals pass through the IFFT, parallel-to-serial (P/S) conversion, and cyclic prefix insertion (+CP) at the transmitter side, and the corresponding inverse processing at the receiver. The operations of CP insertion and removal turn the effective channel responses into circulant matrices, as indicated in Figure 7, which can be diagonalized by normalized (unitary) IFFT and FFT matrices.
To study the behavior of MIMO-OFDM, we compare its performance with that of the MIMO system described earlier in the paper. Figure 8 shows the performance of a SIMO system without OFDM for 8 and 16 receiving antennas, and figure 9 shows the performance of the MIMO-OFDM system for the 2x8 and 2x16 configurations. The results in figure 9 show that MIMO-OFDM combats the deterioration caused by the accumulating additive noise, especially the co-channel interference (CCI).
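The diagonalization of a circulant channel matrix by unitary FFT and IFFT matrices, which underlies the per-tone equalization mentioned above, can be verified numerically. The sketch below uses a single-antenna link with a hypothetical 3-tap channel and 8 tones; these values and the function name are chosen for illustration only.

```python
import numpy as np

def circulant_channel(taps, num_tones):
    """Circulant matrix whose first column is the channel taps zero-padded to num_tones."""
    first_column = np.zeros(num_tones, dtype=complex)
    first_column[: len(taps)] = taps
    # Column k is the first column cyclically shifted down by k positions.
    return np.column_stack([np.roll(first_column, k) for k in range(num_tones)])

num_tones = 8
taps = np.array([1.0, 0.5, 0.25])                          # hypothetical channel impulse response
H = circulant_channel(taps, num_tones)
F = np.fft.fft(np.eye(num_tones)) / np.sqrt(num_tones)     # unitary DFT matrix
D = F @ H @ F.conj().T                                     # effective frequency-domain channel
per_tone_gains = np.fft.fft(taps, num_tones)               # expected diagonal entries
off_diagonal = D - np.diag(np.diag(D))
print(np.max(np.abs(off_diagonal)))                        # zero up to rounding error
```

Because the frequency-domain channel is diagonal, each tone sees a single complex gain, so equalization reduces to one division per tone in the single-antenna case and one small matrix inversion per tone in the MIMO case.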

Conclusions
In this paper, we first applied SVM to single-input-single-output MPSK detection in wireless systems with severe noise conditions modeled by Rayleigh fading channel noise and colour receiver noise. SVM is found to be a learning machine suitable for wireless communication, able to handle data coming out of a relatively hostile wireless channel at low SNR with considerable superiority over the classical ML detector, at the cost of relatively longer processing time, especially for higher classes. For large SNR, the performance of the SVM detector was similar to that of the ML detector.
Then we simulated the SVM for a MIMO system. The main aim was to introduce the SVM-based detector and demonstrate its superiority over the classical detectors adopted in the general design and implementation of MIMO systems. To make the most of this advantage, we first determined the best scheme for implementing the MIMO system: we compared the well-known EGC with the RMSGC and showed that the latter outperforms the former. We then determined the optimum number of antennas to be used at the receiver; after simulating the system for different numbers of antennas, the proper choice was found to be eight. Finally, the system faced the problem of additive noise, which restricts the achievable enhancement, and the solution was found in OFDM, which markedly improved the results.
One major weakness of SVM that can be seen from Fig. 4 is that it does not give much better results than the ML detector for classes above 64. This is expected since SVM is essentially a binary classifier. However, the approaches mentioned in section III to extend the SVM to multi-class detection did give reasonable results for classes up to 64.
We also showed that conventional detectors require as much as 7-bit coding (M ≥ 128) to produce results comparable to those of SVM. Therefore, SVM can be successfully applied to M-ary signaling with 1- to 6-bit coding. Since QPSK and 8-PSK are widely used in existing wireless communication systems, notably in cellular UMTS and land mobile satellite, we expect this research to trigger a newer generation of SVM-based wireless systems.
Although the simulations were conducted for a medium-scale antenna array system due to computational limitations, the same results and conclusions can be extended to large-scale antenna systems.
As future work, we propose to adopt one of the many pre-designed SVM chips [21] and implement a real-time system to compare results with the simulation outputs. As processor technology becomes faster, SVM will be able to meet the real-time computational requirements of today's high-speed data communications.
We also propose to apply relevance vector machine (RVM) [22] to MPSK detection for Giga-bits Ultra Wide Band (UWB) wireless systems [23,24]. RVM has an identical functional form to SVM and has been demonstrated to have a comparable generalization performance to SVM while requiring dramatically fewer kernel functions, thus enjoying faster computation.