Can the Threshold Performance of Maximum Likelihood DOA Estimation be Improved by Tools from Random Matrix Theory?

For direction of arrival (DOA) estimation in the threshold region, it has been shown that use of Random Matrix Theory (RMT) eigensubspace estimates provides significant improvement in MUSIC performance. Here we investigate whether these RMT methods can also improve the threshold performance of unconditional (stochastic) maximum likelihood DOA estimation (MLE).


Introduction
The "pre-asymptotic" performance of traditional stochastic (unconditional) MLE of DOAs for m closely spaced Gaussian signals immersed in white Gaussian noise, using an M-sensor array and a finite number T of independent identically distributed (i.i.d.) training samples, continues to be the subject of investigations [3]-[5], [15], despite being a relatively "old" problem [16]. The main reason, as formulated in [4], is that in the pre-asymptotic domain, "no general non-asymptotic results are [currently] available for the performance evaluation of the ML method, and each problem requires a special investigation".
As an alternative, instead of focusing on accurate non-asymptotic analysis, one may consider the asymptotic regime where both M and T grow without bound, while their ratio converges to a fixed finite quantity:

M → ∞, T → ∞, M/T → c, 0 < c < ∞. (1)

This condition is known as the Kolmogorov asymptotic condition [6], and underpins a field of analysis referred to as Generalized Statistical Analysis (GSA) or Random Matrix Theory (RMT). Of course, in any practical situation one deals with finite M and T values, and therefore generalized "G-asymptotic" results (i.e. results asymptotic per the conditions in (1)) may still not be sufficiently accurate. However, numerous studies have demonstrated that estimators that are G-asymptotically consistent are more robust for finite sample size T than estimators which are only consistent for T → ∞ with M constant [6,10,11,18].
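As a numerical illustration of the regime (1) (this sketch is not part of the original analysis; all parameters are illustrative), the following Python snippet shows that when M and T are comparable, the sample eigenvalues of an identity covariance matrix spread over the Marchenko-Pastur support [(1−√c)², (1+√c)²] instead of concentrating at the true value 1:

```python
import numpy as np

rng = np.random.default_rng(0)

M, T = 100, 400            # c = M/T = 0.25 stays fixed as both grow
c = M / T

# i.i.d. complex circular Gaussian samples with true covariance R = I_M
X = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
R_hat = X @ X.conj().T / T

eigs = np.linalg.eigvalsh(R_hat)

# Marchenko-Pastur support for the identity covariance: [(1-sqrt(c))^2, (1+sqrt(c))^2]
lo, hi = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
print(f"sample eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"Marchenko-Pastur support: [{lo:.3f}, {hi:.3f}]")
```

The plug-in eigenvalue estimates are therefore heavily biased in this regime, which is exactly what motivates the G-estimation machinery below.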
Specifically, suppose that i.i.d. observations x_1, …, x_T of a random vector ξ of dimension M are given, and we wish to estimate some value φ(R_M), where φ is a continuous function of the entries of the (true) covariance matrix R_M of the vector ξ. If T is large while M is fixed and does not change as T grows (i.e. the standard asymptotic case), then as an estimator of φ(R_M) we may take

φ(R̂_M), (2)

where R̂_M = (1/T) ∑_{j=1}^{T} x_j x_j^H is the standard sample covariance matrix for zero-mean data. Moreover, if x_j, j = 1, …, T is a set of i.i.d. complex (circular) Gaussian samples (i.e. x_j ∼ CN(0, R_M)), then R̂_M and φ(R̂_M) are not only consistent (in the standard asymptotic framework), but are also the maximum likelihood estimates of R_M and φ(R_M) respectively [17]. However, this familiar assertion is in general no longer true under the G-asymptotic condition (1). There, for a wide range of functions φ(R_M), one can find a measurable function ψ of the entries of R_M for which

plim_{(1)} [φ(R̂_M) − ψ(R_M)] = 0. (3)

In general, the functions φ(R_M) and ψ(R_M) do not coincide, in which case φ(R̂_M) is not necessarily a G-consistent estimate of φ(R_M). However, with the help of the function ψ(R_M), one may try to find a measurable function g(R̂_M) such that

plim_{(1)} [g(R̂_M) − φ(R_M)] = 0, (4)

and such that the distribution of the normalized difference g(R̂_M) − φ(R_M) is asymptotically normal. The function g(R̂_M) is then called a G-estimator [6], and is consistent under condition (1). In [10,11], this RMT methodology has been used to find G-estimates of the eigenvalues of a covariance matrix with known eigenvalue multiplicities. Specifically, for an M-variate covariance matrix R_M, let γ_1 < γ_2 < … < γ_M̄ be the set of M̄ distinct eigenvalues (1 ≤ M̄ ≤ M) after accounting for the individual multiplicities K_m of the true eigenvalues (i.e. ∑_{m=1}^{M̄} K_m = M). Associated with each eigenvalue γ_m, there is a complex subspace of dimension K_m.
This subspace is determined by an M × K_m matrix of corresponding eigenvectors, denoted by E_m, such that E_m^H E_m = I_{K_m}. Note that this specification is unique only up to right multiplication by a unitary matrix, and therefore the eigendecomposition problem for the matrix R_M is more conveniently formulated as the problem of estimating the orthogonal projection matrices

P_m = E_m E_m^H, m = 1, …, M̄, (6)

given the sample covariance matrix R̂ with the eigendecomposition R̂ = ∑_{k=1}^{M} λ̂_k ê_k ê_k^H, λ̂_1 ≤ λ̂_2 ≤ … ≤ λ̂_M. To use this eigendecomposition to estimate the orthogonal projection matrices P_m, let K_m be a set of indices with cardinality equal to the multiplicity K_m of the eigenvalue γ_m. The classical (and, for multivariate Gaussian observations, indeed maximum likelihood) estimators of the m-th eigenvalue and of the orthogonal projection matrix P_m in (6) are given by

γ̂_m^{ML} = (1/K_m) ∑_{k∈K_m} λ̂_k, (7)

P̂_m^{ML} = ∑_{k∈K_m} ê_k ê_k^H, η̂_m^{ML}(θ) = a^H(θ) P̂_m^{ML} a(θ), (8)

where a(θ) is the array steering vector. The traditional ML estimators γ̂_m^{ML} and η̂_m^{ML} are not G-consistent, and therefore could potentially be improved within the RMT methodology. "Improved" G-consistent estimates of γ_m and P_m under the conditions (As1-As4) have been derived by X. Mestre, Theorem 3 in [10]:

γ̂_m^G = (T/K_m) ∑_{k∈K_m} (λ̂_k − μ̂_k), P̂_m^G = ∑_{k=1}^{M} ρ_m(k) ê_k ê_k^H, η̂_m^G(θ) = a^H(θ) P̂_m^G a(θ), (15)

where the weights have the form

ρ_m(k) = 1 + φ_m(k) for k ∈ K_m, ρ_m(k) = ϕ_m(k) for k ∉ K_m, (16)

with the correction terms φ_m(k) and ϕ_m(k) specified in [10], and μ̂_1 ≤ μ̂_2 ≤ … ≤ μ̂_M are the real-valued solutions to the following equation in μ:

(1/T) ∑_{k=1}^{M} λ̂_k / (λ̂_k − μ) = 1. (17)

One can prove (see [10]) that for any fixed M, as T → ∞, we have T(λ̂_k − μ̂_k) → λ̂_k, and therefore γ̂_m^G → γ̂_m^{ML}. On the other hand, φ_m(k) → 0 and ϕ_m(k) → 0, which implies that η̂_m^G → η̂_m^{ML} as T → ∞. Therefore, the RMT methodology provides a unique set of G-consistent estimates γ̂_m^G and η̂_m^G that tend to the classical ML estimates under the traditional asymptotic assumptions (M = constant, T → ∞). Specifically, the G-consistent MUSIC (or G-MUSIC) pseudo-spectrum estimate

η̂_1^G(θ) = a^H(θ) P̂_1^G a(θ) (20)

was suggested in [9] and demonstrated significant performance improvement compared with the traditional MUSIC function η̂_1^{ML}(θ).
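The classical cluster estimators (7)-(8) can be sketched numerically as follows (a hypothetical scenario with illustrative eigenvalues and a randomly drawn eigenbasis, not the paper's array model): for a known multiplicity pattern, the ML eigenvalue estimate averages the sample eigenvalues within each index cluster, and the projector estimate sums the corresponding eigenvector outer products.

```python
import numpy as np

rng = np.random.default_rng(1)
M, T = 8, 2000

# True covariance: eigenvalue 1 with multiplicity 6 ("noise") and simple eigenvalues 5, 10
gammas = np.array([1.0] * 6 + [5.0, 10.0])
Q, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
R0 = (Q * gammas) @ Q.conj().T

# i.i.d. complex Gaussian training data and the sample covariance matrix
X = np.linalg.cholesky(R0) @ (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
R_hat = X @ X.conj().T / T

lam, E = np.linalg.eigh(R_hat)            # ascending sample eigenvalues / eigenvectors

clusters = [list(range(0, 6)), [6], [7]]  # index sets K_m matching the known multiplicities
gamma_ml = [lam[K].mean() for K in clusters]                    # cluster averages, eq. (7)
P_ml = [E[:, K] @ E[:, K].conj().T for K in clusters]           # projector sums, eq. (8)

print("gamma_ml:", np.round(gamma_ml, 2))
```

With T ≫ M the cluster averages approach the true eigenvalues (1, 5, 10); as T shrinks toward M they acquire exactly the G-asymptotic bias discussed above.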
Recently, in [8], we showed that for closely spaced sources, despite the substantial improvements demonstrated by G-MUSIC (20) with respect to conventional MUSIC, the "threshold" performance of G-MUSIC (the region where the estimator's mean squared error departs rapidly from the Cramér-Rao lower bound as SNR and/or sample support is reduced) remains significantly worse than the performance of rigorously implemented (via global search) ML DOA estimation. Since the RMT methodology, and specifically the G-consistent estimates in (15), is able to improve conventional MUSIC performance, and since these estimates differ from the traditional ML eigenvalue and subspace estimates, it is quite legitimate to ask whether they may also be used to improve the threshold performance of ML DOA estimation itself. In this correspondence, we address this question.

G-asymptotic ML DOA Estimation (G-MLE) Formulation
For the stochastic (unconditional) ML model under consideration, the set of T i.i.d. data samples x_j ∼ CN(0, R_0), j = 1, …, T is described by the true/exact covariance matrix R_0, comprised of m independent plane-wave sources with directions of arrival θ_1, …, θ_m, embedded in white noise. Under this model, the traditional ML DOA estimates are found as the global maximum of the stochastic (unconditional) likelihood function (LF), computed for the model covariance matrix R(Ω_m) that is uniquely specified by the 2m + 1 parameters in Ω_m (the m source DOAs and powers, as well as the noise power) and denoted L[R(Ω_m)]:

Ω̂_m = arg max_{Ω_m} L[R(Ω_m)], (21)

or, equivalently, as the minimum of the (scaled) negative log-likelihood function (LLF)

LLF[R(Ω_m)] = (1/M){ln det R(Ω_m) + tr[R^{-1}(Ω_m) R̂]}. (22)

Note that for any given R(Ω_m), the LLF (22) may be viewed as the ML estimate of the function

φ[R(Ω_m), R_0] = (1/M){ln det R(Ω_m) + tr[R^{-1}(Ω_m) R_0]} (23)

with unknown R_0, here represented by its generic (positive definite Hermitian, for T ≥ M) ML estimate, the sample covariance matrix R̂ formed from the training data x_j. Since the likelihood function (22) is just φ(R̂_M(Ω_m)) (per the notation in (2)), under the RMT methodology we need to construct the G-consistent estimate g(R̂_M(Ω_m)) (per the notation in (4)) of the function φ_0, the portion of φ in (23) that depends on R_0, which simplifies to (1/M) tr[R^{-1}(Ω_m) R_0]. Yet, according to Lemma 3.1 in [6] (see eqn. 3.10, p. 182), for a very broad class of empirical covariance matrices that embraces the complex Wishart case

R̂ = (1/T) R_0^{1/2} S R_0^{1/2}, S ∼ CW(M, T, I_M), (24)

under the Kolmogorov asymptotic condition (1), the following property is proven:

plim_{(1)} {(1/M) tr[R^{-1}(Ω_m) R̂] − (1/M) E tr[R^{-1}(Ω_m) R̂]} = 0, (1/M) E tr[R^{-1}(Ω_m) R̂] = (1/M) tr[R^{-1}(Ω_m) R_0], (25)

where E(·) is the expectation operator and CW(M, T, I_M) denotes the complex Wishart distribution [7].
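As a concrete sketch of evaluating the LLF (22) (a minimal Python illustration, assuming a half-wavelength ULA and hypothetical source parameters; the 1/M scaling is dropped since it does not affect the minimizer):

```python
import numpy as np

def steering(M, theta_deg):
    # Half-wavelength ULA steering vector (assumed array geometry)
    k = np.pi * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * k * np.arange(M))

def model_cov(M, thetas, powers, noise_pow):
    # R(Omega) = sum_m p_m a(theta_m) a(theta_m)^H + sigma^2 I
    A = np.column_stack([steering(M, t) for t in thetas])
    return (A * powers) @ A.conj().T + noise_pow * np.eye(M)

def neg_loglik(R_model, R_hat):
    # Stochastic (unconditional) negative LLF: ln det R + tr(R^{-1} R_hat)
    sign, logdet = np.linalg.slogdet(R_model)
    return logdet + np.trace(np.linalg.solve(R_model, R_hat)).real

# quick demo on simulated data (two sources at -2 and +2 degrees, power 10, unit noise)
rng = np.random.default_rng(2)
M, T = 8, 200
R0 = model_cov(M, [-2.0, 2.0], np.array([10.0, 10.0]), 1.0)
X = np.linalg.cholesky(R0) @ (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
R_hat = X @ X.conj().T / T

llf_true = neg_loglik(model_cov(M, [-2.0, 2.0], np.array([10.0, 10.0]), 1.0), R_hat)
llf_wrong = neg_loglik(model_cov(M, [-10.0, 15.0], np.array([10.0, 10.0]), 1.0), R_hat)
print(llf_true, llf_wrong)
```

As expected, the LLF evaluated at the true parameters is smaller than at a badly mismatched DOA pair.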
This property means that for the function φ_0 we have a special case in which its G-consistent estimate g(R̂_M(Ω_m)) coincides with its ML estimate φ_0(R̂):

g(R̂_M(Ω_m)) = φ_0(R̂) = (1/M) tr[R^{-1}(Ω_m) R̂].

In essence, the property (25) means that for the multivariate complex Gaussian data case, the (scaled) unconditional log-likelihood function (22) may be treated as the ML estimate and, at the same time, the G-consistent estimate of the function φ[R(Ω_m), R_0] given in (23).
Lemma 3.1 from [6], and more importantly its interpretation as G-consistency of the standard log-likelihood function, leaves an apparent contradiction which needs to be resolved. On the one hand, according to Theorem 2 from [10], the individual ML estimates of the eigenvalues are not G-consistent and can be improved, per (15). On the other hand, the sum of all eigenvalues (i.e. the trace of the sample covariance matrix) is G-consistent, and therefore cannot be improved within the RMT methodology. This contradiction can be resolved by a direct proof that the sum of the G-consistent individual eigenvalue estimates γ̂_m^G in (15) is strictly equal to the sum of the sample eigenvalues λ̂_k of R̂_M(Ω_m), i.e.

∑_{m=1}^{M̄} K_m γ̂_m^G = ∑_{k=1}^{M} λ̂_k. (27)
According to (15), we therefore have to prove that

∑_{k=1}^{M} μ̂_k = ((T − 1)/T) ∑_{k=1}^{M} λ̂_k. (28)

Theorem 1. Under conditions As1-As3 in [10], the estimate ∑_{k=1}^{M} λ̂_k = tr R̂ is the G-consistent estimate of the trace of R_M, regardless of the "eigenvalue splitting condition" As4.
The proof of this equivalence is given in Appendix A; it shows that the trace of the sample covariance matrix R̂ is always a G-consistent estimate of the trace of the original (true) covariance matrix R_M, even when, for some individual eigenvalues of this matrix, G-consistent estimates may not exist because As4 is not satisfied.
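A quick Monte-Carlo check of Theorem 1 (an illustrative sketch with a hypothetical diagonal spectrum, not the paper's scenario): even at the extreme ratio c = M/T = 1, where individual sample eigenvalues are badly biased, the normalized trace of R̂ remains on top of the true value.

```python
import numpy as np

rng = np.random.default_rng(3)
M = T = 200                          # c = M/T = 1: As4 can fail, eigenvalues strongly biased
gammas = np.linspace(1.0, 3.0, M)    # illustrative true spectrum, mean trace = 2.0

X = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
Y = np.sqrt(gammas)[:, None] * X     # data with true covariance diag(gammas)
R_hat = Y @ Y.conj().T / T

lam = np.linalg.eigvalsh(R_hat)
print("normalized trace: true", gammas.mean(), "sample", lam.mean())
print("largest eigenvalue: true", gammas.max(), "sample", lam.max())  # strongly biased
```

The trace (a G-consistent functional) is recovered accurately, while the extreme eigenvalues are far from their true counterparts, mirroring the contradiction resolved above.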
As an aside for the interested reader, Mestre's assumptions As1 to As3 (the details of which are provided in [10]) impose general conditions on the covariance matrix and its eigenvalues which are always met for our problem with the sample Wishart matrix R̂_M(Ω_m), but assumption As4 may or may not be met. The physical meaning of the As4 condition, also referred to as the eigenvalue splitting condition, is that the asymptotic eigenvalue distribution of R̂ can be considered as a collection of clusters, each one centered around a different eigenvalue of R_M. The size of the support of these clusters depends on the ratio of the number of sensors/antennas to the number of samples (i.e. the ratio c = M/T) from which R̂ is constructed. If the ratio M/T is high, clusters associated with close eigenvalues γ_j of R_M may merge and be represented by a single estimate with a certain "multiplicity" K. If, on the contrary, the ratio M/T is low, each sample eigenvalue λ̂_k will be associated with a distinct cluster, well separated from the rest of the sample eigenvalue distribution.
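The splitting behavior can be sketched numerically (illustrative multiplicities and eigenvalues, not the paper's scenario): at small c the noise and signal sample-eigenvalue clusters are cleanly separated, while at large c the gap between them shrinks as the cluster supports widen toward each other.

```python
import numpy as np

def sample_eigs(M, T, gammas, seed):
    rng = np.random.default_rng(seed)
    X = (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
    Y = np.sqrt(gammas)[:, None] * X          # data with true covariance diag(gammas)
    return np.linalg.eigvalsh(Y @ Y.conj().T / T)

M = 60
gammas = np.array([1.0] * 50 + [4.0] * 10)    # noise eigenvalue (mult. 50), signal eigenvalue (mult. 10)

gaps = {}
for T in (6000, 120):                         # c = 0.01 (well split) vs c = 0.5 (clusters widen/merge)
    lam = sample_eigs(M, T, gammas, seed=4)
    gaps[T] = lam[50] - lam[49]               # gap between the two sample-eigenvalue clusters
    print(f"c = {M/T:.2f}: cluster gap = {gaps[T]:.2f}")
```

When the clusters are split, exactly 10 sample eigenvalues sit near the signal eigenvalue, so the splitting condition can be verified directly from the sorted sample spectrum.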
This analysis therefore provides an answer to the main question posed in the title of this correspondence: within the existing RMT framework, the "improved" G-consistent estimates of the individual eigenvalues γ̂_m^G and bilinear forms η̂_m^G given in (15) cannot make the traditional log-likelihood function "more G-consistent", and therefore cannot provide any improvement in the threshold performance of ML DOA estimation, in full correspondence with Lemma 3.1 from [6], shown in (25) above.
To confirm this further, and to support numerical evaluation, we may still consider an alternative reconstruction of the G-consistent log-likelihood function using the individual estimates (15), in a manner similar to the G-MUSIC derivation in [9]. Assuming that the "eigenvalue splitting condition" As4 from [10] is satisfied for all (m + 1) distinct eigenvalues, we may construct the G-LLF (G-consistent log-likelihood function) as

L^G[R(Ω_m)] = (1/M){ln det R(Ω_m) + tr[R^{-1}(Ω_m) R̂_M^G]}, R̂_M^G ≡ ∑_{j=1}^{m+1} γ̂_j^G P̂_j^G, (29)

where, based on (15), the G-consistent estimates of the eigensubspaces P_j in R_0 (R_0 = λ_1 P_1 + ∑_{j=2}^{m+1} λ_j P_j) are given by

P̂_j^G = ∑_{k=1}^{M} ρ_j(k) ê_k ê_k^H. (30)

Recall that G-MUSIC was calculated as per (20); thus (29)-(30) resemble that construction, but with some differences. First of all, in contrast to a standard eigendecomposition, the individual eigensubspace estimates P̂_j^G in (30) are no longer mutually orthogonal, although they can be shown to sum to the identity matrix, since for every k the weights ρ_j(k) in (16) sum to unity over j. Therefore, introducing the diagonal matrices D(j) ≡ diag[ρ_j(1), …, ρ_j(M)], we immediately discover from (30) that

R̂_M^G = Ê [∑_j γ̂_j^G D(j)] Ê^H = Ê diag[λ̂_1^R, …, λ̂_M^R] Ê^H, λ̂_k^R = ∑_j γ̂_j^G ρ_j(k), (31)

i.e. R̂_M^G shares the sample eigenvectors ê_k and differs from R̂ only in its eigenvalues λ̂_k^R. Using (15)-(16), the reconstructed noise subspace eigenvalues (λ̂_1^R, …, λ̂_{M−m}^R) can be evaluated explicitly. For the signal subspace eigenvalues (r ≥ M − m), (15) with unit multiplicity gives λ̂_r − μ̂_r = (1/T) γ̂_r^G, and therefore λ̂_r^R is dominated by the term γ̂_r^G ρ_r(r). Note that λ̂_{r−1} ≪ μ̂_r < λ̂_r, and μ̂_r → ((T − 1)/T) λ̂_r when As4 is strongly met. Therefore the difference between the reconstructed eigenvalue λ̂_k^R and the expression that strictly tends to the ML estimate γ̂_k^{ML} under the asymptotic condition (1) is only of second order in 1/T.
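The structural claims about (30)-(31) can be checked with illustrative weights (a sketch: the weights below are random and normalized to sum to unity over j, standing in for Mestre's actual ρ_j(k), and the unitary basis is randomly drawn): the weighted "projectors" sum to the identity yet are neither idempotent nor mutually orthogonal.

```python
import numpy as np

rng = np.random.default_rng(5)
M = 6
Q, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))

# Illustrative weights rho_j(k) for 3 clusters: for each k they sum to unity over j
rho = rng.random((3, M))
rho /= rho.sum(axis=0)

# P_j^G = E diag(rho_j) E^H, per (30)-(31)
P = [Q @ np.diag(rho[j]) @ Q.conj().T for j in range(3)]

total = sum(P)
print("sum of P_j == I:", np.allclose(total, np.eye(M)))
print("P_0 idempotent:", np.allclose(P[0] @ P[0], P[0]))   # False: not a true projector
print("P_0 P_1 == 0:", np.allclose(P[0] @ P[1], 0))        # False: not mutually orthogonal
```

This is exactly why the reconstructed matrix R̂_M^G keeps the sample eigenvectors and changes only the eigenvalues.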
Therefore, the reconstructed noise subspace eigenvalues in R̂_M^G (31) are G-asymptotically not strictly identical to their ML counterparts, but the fluctuations around the asymptotic value γ̂_1^{ML} are quite negligible. For the signal subspace eigenvalues, these fluctuations are even less significant, which may be demonstrated by derivations similar to those above.
While it would be more satisfying if λ̂_k^R were found to be strictly equal to the ML version γ̂_k^{ML}, this is not the case, as observed by other researchers in the field [12]. Practically speaking, however, it is not significant that the non-G-consistent estimates γ̂_k^{ML} are replaced in (29) by (slightly) different non-G-consistent estimates λ̂_k^R. The main point is that the reconstruction in (29), using the G-consistent estimates of the eigenvalues and eigenvectors (15), should be at best as efficient as the conventional, accurately computed MLE DOA estimate.
Therefore, the conducted analysis has demonstrated how and why the G-consistent individual estimates of eigenvalues and eigenvectors improve MUSIC threshold performance and yet still lead back to the traditional ML performance. This property is proven rigorously for the direct G-consistent estimation of (1/M) tr[R^{-1}(Ω_m) R_0], and to first order in 1/T for the reconstruction approach in (29). Let us now support these theoretical findings by empirical results.

Numerical Evaluation
Here, we seek to demonstrate by simulation, using direct exhaustive search for the global maximum of the likelihood function and of its reconstructed G-modification, that their DOA estimation performance in the threshold region is identical. For this analysis, we have selected the same scenario as in [1].

Figure 1: Sample MUSIC breakdown for closely spaced sources. Note the SNR gap between reliable detection using ITC and reliable estimation using MUSIC.

Fig. 1 plots the probability of underestimating the correct number of sources as a function of SNR for the AIC, MDL, and MAP information theoretic criteria (ITC). The same figure shows the sample probability of conventional MUSIC breakdown, which illustrates the well-documented fact that MUSIC "breaks" at SNR values significantly larger than the actual MLE threshold [8]. In this paper, we examine the performance at SNR = 30 dB, where Fig. 1 shows reliable detection of the correct number of sources m = 2, while the ML DOA estimate itself is questionable.
It can be separately confirmed, following the methodology in [10], that for this scenario the eigenvalue splitting condition As4, required for G-consistency of the eigenvalue estimation, is met in the 30 dB case. The fact that this condition is met here comes as no surprise. Indeed, the reliable detection of the correct number of sources m = 2 at this SNR already implies that the cluster of noise subspace sample eigenvalues (λ̂_1 to λ̂_6) is reliably separated from the cluster of the smallest signal eigenvalue λ̂_7. Since λ_8 ≫ λ_7 for our scenario with closely spaced sources, it is really the separation between the λ_7 cluster and the noise subspace one (λ_1 = · · · = λ_6 = 1) which is of concern when evaluating the As4 condition. We therefore examine the eigendecomposition of the G-reconstructed covariance matrix R̂^G. As predicted, the eigenvectors of R̂_m and R̂_m^G are exactly the same, and only the eigenvalues differ slightly (and then only in the noise subspace). The two signal subspace eigenvalues are also well separated, both from each other and from the noise subspace eigenvalues. While in R̂_m the noise subspace is described by a single eigenvalue estimate, in R̂_m^G the six smallest eigenvalues vary slightly. This is demonstrated in Fig. 2, where we introduce sample p.d.f.s for eig_7(R̂_m^G), eig_8(R̂_m^G), and the noise eigenvalues eig_n(R̂_m^G) = (eig_1(R̂_m^G), …, eig_6(R̂_m^G)), where we treat as a noise subspace eigenvalue of R̂_m^G any one of its six smallest eigenvalues. The p.d.f.s for λ̂_1 and eig_n(R̂_m^G) practically coincide. While Fig. 2 does demonstrate minor variations of the noise subspace eigenvalues in R̂_m^G around the single noise subspace eigenvalue γ̂_1^G in R̂_m, caused by the finite T and M values, these variations are very unlikely to result in any noticeable changes in DOA estimation performance when R̂_m^G is used instead of R̂_m or R̂ in the likelihood function, simply because even larger variations of the noise subspace eigenvalues in R̂ around the single ML estimate ((1/(M − m)) ∑_{j=1}^{M−m} λ̂_j) in R̂_m are also not associated with any DOA ML estimation performance degradation. Yet, since all the conditions (As1-As4) of Theorem 3 in [10] are met, we can expect these minor variations to result in improved individual eigenvalue γ̂_j^G and eigensubspace P̂_j^G estimation accuracy. To confirm this, we repeat the analysis conducted in [10], where, to assess the quality of the eigensubspace estimates, Mestre introduced the orthogonality factor

O(j) = tr(P_j P̂_j) / tr[(I_M − P_j) P̂_j]. (37)

Here P_j, j = 1, 2, 3 are the eigenprojectors of the true covariance matrix R_0, and P̂_j are their estimates (either the standard ML ones or the G-consistent P̂_j^G). This orthogonality factor is the ratio of the total power of the estimated subspace P̂_j that resides in the true subspace P_j to the power of its "leakage" into the true orthogonal subspace [I_M − P_j]. The estimate P̂_j^G (as defined in (30)) is "better" than the MLE P̂_j^{ML}, formed from the corresponding sample eigenvectors, if, with high probability, the respective orthogonality factor is greater. In Fig. 3, we introduce sample p.d.f.s for the orthogonality factors O(j) and O^G(j), estimated over the same 500 Monte-Carlo trials for the SNR = 30 dB case. Similarly to [10], we observe that the G-consistent subspace estimate achieves a much higher orthogonality factor than the traditional ML sample estimator, indicating a better capacity to estimate subspaces, and therefore better accuracy for subspace-based DOA estimation techniques such as MUSIC.
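The orthogonality factor (37) can be sketched as follows (a minimal toy example with hypothetical vectors, not the paper's 500-trial experiment): an estimated 1-D subspace tilted slightly out of the true one yields a large but finite factor.

```python
import numpy as np

def orthogonality_factor(P_true, P_est):
    # Ratio of estimated-subspace power inside the true subspace
    # to its leakage into the true orthogonal complement [I - P_true], per (37)
    M = P_true.shape[0]
    inside = np.trace(P_true @ P_est).real
    leak = np.trace((np.eye(M) - P_true) @ P_est).real
    return inside / leak

# toy check: estimate tilted by a 0.1 component out of the true 1-D subspace
e1 = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 0.1, 0.0]); v /= np.linalg.norm(v)
P_true = np.outer(e1, e1)
P_est = np.outer(v, v)
print(orthogonality_factor(P_true, P_est))   # ≈ 1 / 0.1^2 = 100
```

A perfectly aligned estimate drives the leakage term to zero, so larger factors mean better subspace estimates, which is the sense in which P̂_j^G outperforms the ML projector.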
To demonstrate, we compare the results of an exhaustive global maximum search for the conventional (normalized) likelihood ratio

LR[R(Ω_m)] = det[R^{-1}(Ω_m) R̂] exp{M − tr[R^{-1}(Ω_m) R̂]} (38)

and for its G-asymptotic counterpart GLR[R(Ω_m)], constructed using R̂_m^G instead of R̂. This global optimization is performed in two steps. In step 1, we find the global extremum over a fine grid on the half-plane (θ_1 < θ_2), and in step 2, we conduct local optimization over the parameters θ_1 < θ_2, as well as the two source powers, using the MATLAB fmincon routine. The white noise power is assumed known a priori and set to unity. More details on the optimization routine and MLE results can be found in [1,2,14]. Due to our convention that θ̂_1 < θ̂_2, the DOA sample distributions illustrated in Fig. 4 overlap, since for scenarios with very closely spaced sources, over the 500 trials, the minimum θ̂_2 was sometimes less than the maximum θ̂_1, even though on each individual trial θ̂_1 was less than θ̂_2. One can see that the distributions of the DOAs derived using the standard LR (38) and its G-asymptotic counterpart are practically indistinguishable, as are any moments of these distributions. Moreover, in addition to the identical statistical performance of the DOA estimates, identical individual estimates were registered on a trial-by-trial basis.
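The two-step search can be sketched in Python for a simplified single-source case (a hedged illustration, not the paper's two-source experiment: half-wavelength ULA, unit noise power known as in the paper, all other parameters hypothetical). The fmincon refinement of step 2 is replaced here by a fine local grid, and the source power is concentrated out of the LLF in closed form.

```python
import numpy as np

rng = np.random.default_rng(6)
M, T = 8, 200
theta_true, p_true = 10.0, 10.0      # single source (simplified from the two-source case)

def a(theta_deg):
    return np.exp(1j * np.pi * np.sin(np.deg2rad(theta_deg)) * np.arange(M))

R0 = p_true * np.outer(a(theta_true), a(theta_true).conj()) + np.eye(M)
X = np.linalg.cholesky(R0) @ (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))) / np.sqrt(2)
R_hat = X @ X.conj().T / T

def neg_llf(theta):
    # Concentrated negative LLF for one source in unit-power noise: the power
    # maximizing the likelihood has the closed form p = (a^H R_hat a / M - 1) / M
    q = (a(theta).conj() @ R_hat @ a(theta)).real / M
    p = max((q - 1.0) / M, 0.0)
    return np.log(1 + p * M) + np.trace(R_hat).real - p / (1 + p * M) * q * M

# step 1: coarse global grid; step 2: fine local grid around the best cell
coarse = np.arange(-60.0, 60.0, 1.0)
t0 = coarse[np.argmin([neg_llf(t) for t in coarse])]
fine = np.arange(t0 - 1.0, t0 + 1.0, 0.01)
theta_hat = fine[np.argmin([neg_llf(t) for t in fine])]
print("theta_hat =", round(theta_hat, 2))
```

The closed-form concentration follows from the Sherman-Morrison identity for R = p a a^H + I; in the paper's two-source setting the same two-step structure applies, with the local step handled by a constrained optimizer.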
In addition to the illustrated results, the same behavior was observed at several higher SNRs, and also when the ML covariance matrix estimate R̂_m = ∑_m γ̂_m^{ML} P̂_m^{ML} was used. Finally, in a completely ad hoc "mix and match" approach, we assigned the G-consistent eigenvalue estimates γ̂_1^G (the G-consistent noise power estimate), γ̂_2^G, and γ̂_3^G from (15) to the maximum likelihood (mutually orthogonal) eigensubspaces P̂_1, P̂_2, and P̂_3, and once again did not record any DOA estimation improvement, showing that the minor variations of the noise subspace eigenvalues in R̂_m^G with respect to R̂_m are not significant for MLE.


Concluding Remarks
The theoretical results from RMT and their examination by direct Monte-Carlo simulations have confirmed that, for Gaussian sources in Gaussian noise, ML DOA estimation in the so-called "threshold" region is not improved by the use of recent results from Random Matrix Theory. More specifically, we demonstrated that Girko's Lemma 3.1 from [6] implies that, for the multivariate (complex) Gaussian data case, the stochastic (unconditional) likelihood function is both a T-consistent and a G-consistent estimate of the function (1/M) tr[R^{-1}(Ω_m) R_0], where R_0 is represented by the set of T i.i.d. training vectors x_j.
We demonstrated that, in full accordance with this Lemma, an alternative reconstruction of the log-likelihood function using individually G-consistent estimates of the eigenvalues and eigenvectors does not improve the DOA estimation threshold performance, despite the fact that these individual G-consistent estimates are more accurate than their corresponding ML estimates. Thus, the improvement in MUSIC performance delivered by the G-consistent estimates of the individual noise eigensubspace, demonstrated in [9]-[11], is not at odds with the lack of improvement in MLE performance.
While applicable to an arbitrary parametric description of the covariance matrix R_0, these results should not be extended beyond the multivariate (complex) i.i.d. Gaussian case without detailed examination. Finally, it is important to stress that our results are confined to the currently unique (in [10]) set of G-consistent eigenvalue and eigenvector estimates, and therefore the broader question of whether the threshold performance of ML DOA estimation can be improved by some other technique remains open.