Tensor Multivariate Trace Inequalities and Their Applications

In linear algebra, the trace of a square matrix is defined as the sum of the elements on its main diagonal. The trace of a matrix is also the sum of its eigenvalues (counted with multiplicities), and it is invariant under a change of basis. This characterization can be used to define the trace of a tensor in general. Trace inequalities are mathematical relations between different multivariate trace functionals involving linear operators. These relations reduce to straightforward equalities if the involved linear operators commute; however, they can be difficult to prove when non-commuting linear operators are involved. Suppose we are given two Hermitian tensors H1 and H2 that do not commute. Is there a method to transform one of the two tensors so that they commute without completely destroying the structure of the original tensor? The spectral pinching method is a tool to resolve this problem. In this work, we apply the spectral pinching method to prove several trace inequalities that extend the Araki–Lieb–Thirring (ALT) inequality, the Golden–Thompson (GT) inequality, and the logarithmic trace inequality to arbitrarily many tensors. Our approaches rely on complex interpolation theory as well as asymptotic spectral pinching, providing a transparent mechanism to treat generic multivariate tensor trace inequalities. As an example application of our tensor extension of the Golden–Thompson inequality, we give a tail bound for sums of independent random tensors. We expect such bounds to play a fundamental role in high-dimensional probability and statistical data analysis.


Introduction
Trace inequalities are mathematical relations between different multivariate trace functionals involving linear operators. These relations reduce to straightforward equalities if the involved linear operators commute; however, they can be difficult to prove when non-commuting linear operators are involved [4].
One of the most important trace inequalities is the famous Golden–Thompson inequality [8]. For any two Hermitian matrices $H_1$ and $H_2$, we have
$$\operatorname{Tr} e^{H_1 + H_2} \le \operatorname{Tr}\left(e^{H_1} e^{H_2}\right). \tag{1}$$
It is easy to see that Eq. (1) becomes an identity if the two Hermitian matrices $H_1$ and $H_2$ commute. The inequality in Eq. (1) has been generalized in several directions. For example, it remains valid when the trace is replaced by any unitarily invariant norm [14,24]. The Golden–Thompson inequality has been applied in fields ranging from quantum information processing [16,17] to statistical physics [26,29] and random matrix theory [1,27].
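As a quick numerical illustration of Eq. (1), the following Python sketch draws random Hermitian matrices and compares the two sides of the Golden–Thompson inequality. It assumes NumPy; the helper names (`random_hermitian`, `expm_hermitian`, `golden_thompson_gap`) are ours, and matrix exponentials are computed through the eigendecomposition, which is adequate for Hermitian inputs.

```python
import numpy as np


def random_hermitian(n, rng):
    """Draw a random n x n Hermitian matrix (GUE-like, unnormalized)."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (z + z.conj().T) / 2


def expm_hermitian(h):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(w)) @ v.conj().T


def golden_thompson_gap(h1, h2):
    """Return Tr(e^{H1} e^{H2}) - Tr(e^{H1 + H2}); Golden-Thompson says this is >= 0."""
    lhs = np.trace(expm_hermitian(h1 + h2)).real
    rhs = np.trace(expm_hermitian(h1) @ expm_hermitian(h2)).real
    return rhs - lhs


rng = np.random.default_rng(0)
h1, h2 = random_hermitian(4, rng), random_hermitian(4, rng)
print(golden_thompson_gap(h1, h2))        # > 0 for non-commuting H1, H2
print(golden_thompson_gap(h1, 0.5 * h1))  # zero up to rounding: H1 and H1/2 commute
```

The commuting case gives a gap at the level of floating-point rounding, matching the remark that Eq. (1) is an identity for commuting matrices.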
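The Golden–Thompson inequality can be seen as a limiting case of the more general Araki–Lieb–Thirring (ALT) inequality [3,18]. For any two positive semi-definite matrices $A_1$ and $A_2$, $r \in (0, 1]$, and $q > 0$, the ALT inequality states that
$$\operatorname{Tr}\left(A_2^{r/2} A_1^{r} A_2^{r/2}\right)^{q/r} \le \operatorname{Tr}\left(A_2^{1/2} A_1 A_2^{1/2}\right)^{q}. \tag{2}$$
The Golden–Thompson inequality for Schatten $p$-norms is obtained from Eq. (2) by setting $A_i = e^{H_i}$, taking the limit $r \to 0$, and applying the Lie–Trotter product formula. The ALT inequality has also been extended in various directions [2,13,28].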
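The two-matrix ALT inequality, taken here in the form $\operatorname{Tr}(A_2^{r/2}A_1^{r}A_2^{r/2})^{q/r} \le \operatorname{Tr}(A_2^{1/2}A_1A_2^{1/2})^{q}$ for $r \in (0,1]$ and $q > 0$, can be spot-checked in the same way. This is a minimal NumPy sketch under that assumption; fractional powers are taken through `numpy.linalg.eigh`, and the function names are illustrative.

```python
import numpy as np


def random_psd(n, rng):
    """Random positive definite matrix G G^H + 0.1 I."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T + 0.1 * np.eye(n)


def psd_power(a, t):
    """A^t for a positive semi-definite (Hermitian) matrix A, via eigh."""
    a = (a + a.conj().T) / 2          # remove rounding asymmetry
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)         # clip tiny negative eigenvalues
    return (v * w**t) @ v.conj().T


def alt_sides(a1, a2, r, q):
    """Return (lhs, rhs) = (Tr(A2^{r/2} A1^r A2^{r/2})^{q/r}, Tr(A2^{1/2} A1 A2^{1/2})^q)."""
    half_r = psd_power(a2, r / 2)
    lhs = np.trace(psd_power(half_r @ psd_power(a1, r) @ half_r, q / r)).real
    half = psd_power(a2, 0.5)
    rhs = np.trace(psd_power(half @ a1 @ half, q)).real
    return lhs, rhs


rng = np.random.default_rng(0)
a1, a2 = random_psd(4, rng), random_psd(4, rng)
for r in (0.25, 0.5, 1.0):
    print(r, *alt_sides(a1, a2, r, q=1.5))   # lhs <= rhs, with equality at r = 1
```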
The logarithmic trace inequality can be used to bound quantum information divergences [2,9]. For any two positive semi-definite matrices $A_1$ and $A_2$ and $p > 0$, the logarithmic trace inequality for matrices reads
$$\frac{1}{p}\operatorname{Tr} A_1 \log\left(A_2^{p/2} A_1^{p} A_2^{p/2}\right) \le \operatorname{Tr} A_1\left(\log A_1 + \log A_2\right) \le \frac{1}{p}\operatorname{Tr} A_1 \log\left(A_1^{p/2} A_2^{p} A_1^{p/2}\right).$$
The paper is organized as follows. Preliminaries on tensors are given in Section 2. In Section 3, the pinching method and complex interpolation theory are introduced. Three useful matrix trace inequalities are extended to multivariate tensors in Section 4. The new Golden–Thompson inequality is applied to derive tail bounds for sums of random tensors in Section 5. Finally, conclusions are given in Section 6.
In analogy to matrix analysis, we define some typical tensors and elementary tensor-operations as follows.
Definition 1. A tensor whose entries are all zero is called a zero tensor, denoted by O.

In order to define the Hermitian tensor, the conjugate transpose operation (or Hermitian adjoint) of a tensor is specified as follows.

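Definition 4. Given a tensor $\mathcal{A} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, if
$$\mathcal{A}^H \star_M \mathcal{A} = \mathcal{A} \star_M \mathcal{A}^H = \mathcal{I},$$
then $\mathcal{A}$ is a unitary tensor.
Definition 5. Given a square tensor $\mathcal{A} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, if there exists a tensor $\mathcal{X}$ of the same size such that
$$\mathcal{A} \star_M \mathcal{X} = \mathcal{X} \star_M \mathcal{A} = \mathcal{I},$$
then $\mathcal{X}$ is the inverse of $\mathcal{A}$, and we write $\mathcal{X} \stackrel{\text{def}}{=} \mathcal{A}^{-1}$.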
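We also list other necessary tensor operations here. The trace of a tensor is the sum of all its diagonal entries:
$$\operatorname{Tr}(\mathcal{A}) \stackrel{\text{def}}{=} \sum_{i_1,\ldots,i_M} a_{i_1,\ldots,i_M,i_1,\ldots,i_M}.$$
The inner product of two tensors $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1\times\cdots\times I_M\times J_1\times\cdots\times J_N}$ is given by
$$\langle\mathcal{A},\mathcal{B}\rangle \stackrel{\text{def}}{=} \sum_{i_1,\ldots,i_M,j_1,\ldots,j_N} \overline{a_{i_1,\ldots,i_M,j_1,\ldots,j_N}}\; b_{i_1,\ldots,i_M,j_1,\ldots,j_N}. \tag{11}$$
According to Eq. (11), the Frobenius norm of a tensor $\mathcal{A}$ is defined by $\|\mathcal{A}\| \stackrel{\text{def}}{=} \sqrt{\langle\mathcal{A},\mathcal{A}\rangle}$.
Definition 6. Given a square tensor $\mathcal{A} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, the tensor exponential of $\mathcal{A}$ is defined as
$$e^{\mathcal{A}} \stackrel{\text{def}}{=} \sum_{k=0}^{\infty} \frac{\mathcal{A}^k}{k!},$$
where $\mathcal{A}^0$ is defined as the identity tensor $\mathcal{I} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$ and $\mathcal{A}^k \stackrel{\text{def}}{=} \mathcal{A} \star_M \mathcal{A}^{k-1}$ for $k \ge 1$. Given a tensor $\mathcal{B}$, the tensor $\mathcal{A}$ is said to be a tensor logarithm of $\mathcal{B}$ if $e^{\mathcal{A}} = \mathcal{B}$.
The following definitions concern the Kronecker product and the Kronecker sum of two tensors.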
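The trace, inner product, Frobenius norm, and exponential above can be illustrated for $M = 2$ with NumPy's `einsum`. This sketch assumes that the Einstein product $\mathcal{A} \star_2 \mathcal{B}$ contracts the last two modes of $\mathcal{A}$ with the first two modes of $\mathcal{B}$, and it uses hypothetical mode sizes $I_1 = 2$, $I_2 = 3$; it checks that each operation agrees with its counterpart on the $(I_1 I_2) \times (I_1 I_2)$ matrix unfolding.

```python
import numpy as np

# Hypothetical mode sizes for a square tensor in C^{I1 x I2 x I1 x I2} (M = 2).
I1, I2 = 2, 3
N = I1 * I2


def einstein(a, b):
    """Einstein product A *_2 B: contract the last two modes of A with the first two of B."""
    return np.einsum("abij,ijcd->abcd", a, b)


def unfold(a):
    """Matrix unfolding of a square tensor into an (I1*I2) x (I1*I2) matrix."""
    return a.reshape(N, N)


def fold(m):
    """Inverse of `unfold`."""
    return m.reshape(I1, I2, I1, I2)


def tensor_trace(a):
    """Sum of the diagonal entries a_{i1 i2 i1 i2}."""
    return np.einsum("abab->", a)


def inner(a, b):
    """<A, B> = sum over all entries of conj(a) * b (np.vdot conjugates its first argument)."""
    return np.vdot(a, b)


def tensor_exp(a, terms=40):
    """Truncated series sum_{k < terms} A^k / k!, with A^k the k-fold Einstein product."""
    result = fold(np.eye(N, dtype=complex))   # A^0 = identity tensor
    term = result.copy()
    for k in range(1, terms):
        term = einstein(term, a) / k          # now equals A^k / k!
        result = result + term
    return result


rng = np.random.default_rng(0)
g = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
h = (g + g.conj().T) / 2
h = fold(h / np.linalg.norm(h, 2))            # Hermitian tensor with spectral norm 1
print(tensor_trace(h), np.sqrt(inner(h, h).real))
```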
We require the following two lemmas about the Kronecker product, which will be used later in the proof of Theorem 1.
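Lemma 1. Given tensors $\mathcal{A}_1$ and $\mathcal{A}_2$ acting on spaces $S_1$ and $S_2$, respectively, we have the following identities:
$$\operatorname{Tr}(\mathcal{A}_1 \otimes \mathcal{A}_2) = \operatorname{Tr}(\mathcal{A}_1)\operatorname{Tr}(\mathcal{A}_2) \tag{16}$$
and
$$e^{\mathcal{A}_1} \otimes e^{\mathcal{A}_2} = e^{\mathcal{A}_1 \otimes \mathcal{I}_{S_2} + \mathcal{I}_{S_1} \otimes \mathcal{A}_2}, \tag{17}$$
where $\mathcal{I}_{S_1}$ and $\mathcal{I}_{S_2}$ are the identity tensors acting on $S_1$ and $S_2$, respectively.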

Proof:
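We prove Eq. (16) first. Suppose the tensor $\mathcal{A}_1 \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$ has entries $a_{i_1,\ldots,i_M,j_1,\ldots,j_M}$ and $\mathcal{A}_2$ has entries $b_{k_1,\ldots,k_N,l_1,\ldots,l_N}$. By the definition of the Kronecker product, the diagonal entries of $\mathcal{A}_1 \otimes \mathcal{A}_2$ are the products $a_{i_1,\ldots,i_M,i_1,\ldots,i_M}\, b_{k_1,\ldots,k_N,k_1,\ldots,k_N}$, so
$$\operatorname{Tr}(\mathcal{A}_1 \otimes \mathcal{A}_2) = \sum_{i_1,\ldots,i_M}\sum_{k_1,\ldots,k_N} a_{i_1,\ldots,i_M,i_1,\ldots,i_M}\, b_{k_1,\ldots,k_N,k_1,\ldots,k_N} = \operatorname{Tr}(\mathcal{A}_1)\operatorname{Tr}(\mathcal{A}_2).$$
Next, we verify the relation in Eq. (17). Because $\mathcal{A}_1 \otimes \mathcal{I}_{S_2}$ and $\mathcal{I}_{S_1} \otimes \mathcal{A}_2$ commute, we have
$$e^{\mathcal{A}_1} \otimes e^{\mathcal{A}_2} = e^{\mathcal{A}_1 \otimes \mathcal{I}_{S_2}} \star_{M+N} e^{\mathcal{I}_{S_1} \otimes \mathcal{A}_2} \stackrel{1}{=} e^{\mathcal{A}_1 \otimes \mathcal{I}_{S_2} + \mathcal{I}_{S_1} \otimes \mathcal{A}_2} = e^{\mathcal{A}_1 \oplus \mathcal{A}_2},$$
where the equality $\stackrel{1}{=}$ comes from Theorems 2 and 3 in [19], and the last equality is provided by Definition 8. □
Lemma 2. Given positive tensors $\mathcal{A}_1$ and $\mathcal{A}_2$ acting on spaces $S_1$ and $S_2$, respectively, we have the following identity:
$$\log(\mathcal{A}_1 \otimes \mathcal{A}_2) = \log(\mathcal{A}_1) \otimes \mathcal{I}_{S_2} + \mathcal{I}_{S_1} \otimes \log(\mathcal{A}_2),$$
where $\mathcal{I}_{S_1}$ and $\mathcal{I}_{S_2}$ are the identity tensors acting on $S_1$ and $S_2$, respectively.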

Proof:
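From relation (17) with $\mathcal{B}_1 = \log(\mathcal{A}_1)$ and $\mathcal{B}_2 = \log(\mathcal{A}_2)$, we have
$$\mathcal{A}_1 \otimes \mathcal{A}_2 = e^{\mathcal{B}_1} \otimes e^{\mathcal{B}_2} = e^{\mathcal{B}_1 \otimes \mathcal{I}_{S_2} + \mathcal{I}_{S_1} \otimes \mathcal{B}_2}.$$
Taking the logarithm of both sides yields the desired result.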
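Lemmas 1 and 2 can be spot-checked on matrix unfoldings. This NumPy sketch assumes that, after unfolding, the tensor Kronecker product of Definition 7 coincides with `numpy.kron` up to a fixed reordering of indices; such a reordering is a permutation similarity and leaves traces, exponentials, and logarithms unchanged. The helper names are illustrative.

```python
import numpy as np


def herm_fun(a, f):
    """Apply a scalar function f to a Hermitian matrix through its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def random_pd(n, rng):
    """Random positive definite matrix G G^H / n + 0.5 I."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T / n + 0.5 * np.eye(n)


rng = np.random.default_rng(0)
a1, a2 = random_pd(2, rng), random_pd(3, rng)
i1, i2 = np.eye(2), np.eye(3)

# Eq. (16): Tr(A1 (x) A2) = Tr(A1) Tr(A2)
trace_gap = np.trace(np.kron(a1, a2)) - np.trace(a1) * np.trace(a2)

# Eq. (17): e^{A1} (x) e^{A2} = exp(A1 (x) I + I (x) A2)
kron_sum = np.kron(a1, i2) + np.kron(i1, a2)
exp_gap = np.kron(herm_fun(a1, np.exp), herm_fun(a2, np.exp)) - herm_fun(kron_sum, np.exp)

# Lemma 2: log(A1 (x) A2) = log(A1) (x) I + I (x) log(A2)
log_gap = herm_fun(np.kron(a1, a2), np.log) - (
    np.kron(herm_fun(a1, np.log), i2) + np.kron(i1, herm_fun(a2, np.log)))

print(abs(trace_gap), np.abs(exp_gap).max(), np.abs(log_gap).max())   # all at rounding level
```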

Tools for Hermitian Tensors
In this section, we introduce the two main techniques used to prove multivariate trace inequalities for tensors. The spectral pinching method is discussed in Section 3.1, and complex interpolation theory is presented in Section 3.2.

Pinching Map
The motivation for studying the pinching method arises from the following problem. Suppose we are given two Hermitian tensors H1 and H2 that do not commute. Is there a method to transform one of the two tensors so that they commute without completely destroying the structure of the original tensor? The spectral pinching method is a tool to resolve this problem. Before discussing this method in detail, we introduce the pinching map.
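Given a Hermitian tensor $\mathcal{H} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, we have the spectral decomposition [20]
$$\mathcal{H} = \sum_{\lambda \in \operatorname{sp}(\mathcal{H})} \lambda\, \mathcal{U}_\lambda, \tag{23}$$
where $\operatorname{sp}(\mathcal{H}) \subset \mathbb{R}$ is the set of distinct eigenvalues of $\mathcal{H}$ and the $\mathcal{U}_\lambda \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$ are mutually orthogonal projection tensors. The pinching map with respect to $\mathcal{H}$ is defined as
$$\mathcal{P}_{\mathcal{H}}(\mathcal{X}) \stackrel{\text{def}}{=} \sum_{\lambda \in \operatorname{sp}(\mathcal{H})} \mathcal{U}_\lambda \star_M \mathcal{X} \star_M \mathcal{U}_\lambda,$$
where $\mathcal{X} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$ is a Hermitian tensor. The pinching map possesses various nice properties that will be discussed in this section. For example, $\mathcal{P}_{\mathcal{H}}(\mathcal{X})$ always commutes with $\mathcal{H}$ for any nonnegative tensor $\mathcal{X}$. We first introduce two lemmas that will be used to prove several useful properties of pinching maps.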
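A minimal sketch of the pinching map acting on a matrix unfolding follows. Eigenvalues that agree to within a tolerance `tol` are grouped into one eigenspace (an assumption needed for floating-point input), and the script checks that $\mathcal{P}_{\mathcal{H}}(\mathcal{X})$ commutes with $\mathcal{H}$ and that $\operatorname{Tr}(\mathcal{P}_{\mathcal{H}}(\mathcal{X})\mathcal{H}) = \operatorname{Tr}(\mathcal{X}\mathcal{H})$. The function names are illustrative.

```python
import numpy as np


def spectral_projectors(h, tol=1e-9):
    """Orthogonal projectors onto the eigenspaces of a Hermitian matrix H.

    Eigenvalues closer than `tol` to the first eigenvalue of the current group
    are treated as equal (an assumption for floating-point input)."""
    w, v = np.linalg.eigh(h)          # eigenvalues in ascending order
    projectors, start = [], 0
    for end in range(1, len(w) + 1):
        if end == len(w) or w[end] - w[start] > tol:
            block = v[:, start:end]   # orthonormal basis of one eigenspace
            projectors.append(block @ block.conj().T)
            start = end
    return projectors


def pinch(h, x):
    """Pinching map P_H(X) = sum over eigenvalues of U_lambda X U_lambda."""
    return sum(u @ x @ u for u in spectral_projectors(h))


rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
h = q @ np.diag([1.0, 1.0, 2.0, 2.0, 3.0]) @ q.conj().T   # three distinct eigenvalues
g = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
x = (g + g.conj().T) / 2
px = pinch(h, x)
print(len(spectral_projectors(h)))          # 3
print(np.abs(px @ h - h @ px).max())        # ~0: P_H(X) commutes with H
print(np.trace(px @ h) - np.trace(x @ h))   # ~0: the trace against H is preserved
```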
Lemma 3. Let $\mathcal{A}$ be a Hermitian tensor. The number of distinct eigenvalues of $\mathcal{A}^{\otimes m}$, where $\otimes$ is the Kronecker product defined in Definition 7, grows polynomially with $m$.
Proof: We use the symbol $\operatorname{sp}(\mathcal{A}^{\otimes m})$ to denote the spectrum, i.e., the set of distinct eigenvalues of $\mathcal{A}^{\otimes m}$. The number of distinct eigenvalues $|\operatorname{sp}(\mathcal{A}^{\otimes m})|$ is bounded by the number of different types of sequences of length $m$ over an alphabet of $N(M-1)^{N-1} \stackrel{\text{def}}{=} e_{\mathcal{A}}$ symbols; hence, by the method of types [5],
$$|\operatorname{sp}(\mathcal{A}^{\otimes m})| \le \binom{m + e_{\mathcal{A}} - 1}{e_{\mathcal{A}} - 1} \le (m+1)^{e_{\mathcal{A}} - 1} = O(\operatorname{poly}(m)),$$
where $O(\operatorname{poly}(m))$ represents a function that grows polynomially with $m$. When $m = 1$, $|\operatorname{sp}(\mathcal{A})|$ is upper bounded by $e_{\mathcal{A}}$, the number of eigenvalues of $\mathcal{A}$; see Theorem 1.1 in [22]. □
Let $\mu$ be a probability measure on a measurable space $(X, \Sigma)$ and consider a sequence of nonnegative tensors $\{\mathcal{A}_x\}_{x\in X}$. For $p \ge 1$, we have the triangle inequality
$$\left\|\int \mu(dx)\, \mathcal{A}_x\right\|_p \le \int \mu(dx)\, \|\mathcal{A}_x\|_p,$$
due to the convexity of the $p$-norm for $p \ge 1$. Quasi-norms with $p \in (0, 1)$ are no longer convex. However, we demonstrate in Lemma 4 that these quasi-norms still satisfy an asymptotic convexity property for Kronecker powers of tensors, in the sense of allowing an extra term that depends on the number of tensors involved in the Kronecker product.
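The polynomial growth in Lemma 3 can be illustrated by counting the distinct eigenvalues of $\mathcal{A}^{\otimes m}$ directly: they are products of $m$ eigenvalues of $\mathcal{A}$, so only the multiset (type) of factors matters. In this plain-Python sketch, $\mathcal{A}$ is represented only by its eigenvalue list, and $d$ denotes the number of its distinct eigenvalues (an illustrative stand-in for $e_{\mathcal{A}}$); integer eigenvalues keep the products exact.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def distinct_kron_eigenvalues(eigs, m):
    """Distinct eigenvalues of A^{(x)m}, i.e. products of m eigenvalues of A.

    The product does not depend on the order of the factors, so it suffices
    to enumerate multisets (types) of size m."""
    return {prod(c) for c in combinations_with_replacement(eigs, m)}


def type_count(d, m):
    """Number of types of length-m sequences over d symbols: C(m + d - 1, d - 1)."""
    return comb(m + d - 1, d - 1)


for m in range(1, 7):
    generic = len(distinct_kron_eigenvalues([2, 3, 5], m))   # no coincidences among products
    special = len(distinct_kron_eigenvalues([1, 2, 4], m))   # products collide (powers of 2)
    print(m, generic, special, type_count(3, m), (m + 1) ** 2)
```

For generic eigenvalues the count equals the number of types, $\binom{m+2}{2}$, which is quadratic in $m$; coincidences among eigenvalues only reduce it.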
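Lemma 4. Let $p \in (0, 1)$, let $\mu$ be a probability measure on a measurable space $(X, \Sigma)$, and consider a sequence of nonnegative tensors $\{\mathcal{A}_x\}_{x\in X}$.
Proof: Let $\mathcal{H}$ be the Hilbert space on which the tensors $\mathcal{A}_x$ act. For any $x \in X$, consider the CP decomposition $\mathcal{A}_x = \sum_{k_x} \lambda_{k_x}\, a_{1,k_x} \otimes a_{2,k_x} \otimes \cdots \otimes a_{M,k_x}$. By introducing a space $\mathcal{H}'$ isometric to $\mathcal{H}$, we define the vectors $v_{1,k_x}, \ldots, v_{M,k_x}$. Note that the projectors $\left(\sum_{k_x} \lambda_{k_x} v_{1,k_x} \otimes v_{2,k_x} \otimes \cdots \otimes v_{M,k_x}\right)^{\otimes m}$ lie in the symmetric subspace of $(\mathcal{H} \otimes \mathcal{H}')^{\otimes m}$, whose dimension grows polynomially with $m$ by Lemma 3. From Carathéodory's theorem (see Theorem 18 in [7]), there exists a discrete probability measure $\Pr(x)$, where $x \in X_d$ and $X_d \subset X$ is a discrete set with cardinality $\operatorname{poly}(m)$, that reproduces the average over $\mu$; this leads to Eq. (29). When $p \in (0, 1)$, the Schatten $p$-quasi-norm satisfies the following triangle inequality for tensors (see [11]):
$$\|\mathcal{X} + \mathcal{Y}\|_p^p \le \|\mathcal{X}\|_p^p + \|\mathcal{Y}\|_p^p,$$
which we apply to Eq. (29). Since the map $s \mapsto s^{1/p}$ is convex for $p \in (0, 1)$, the claim follows, where $|X_d| = \operatorname{poly}(m)$ is applied at the last step.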
for all p > 0.
We need the following definition of a family of probability distributions in order to represent a pinching map as an integral.
The following lemma provides an integral representation of the pinching map.

Lemma 5. [Integral Representation of Pinching Map]
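Let $\mathcal{H}, \mathcal{X} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$ be Hermitian tensors of the same dimensions, and let $\mu_{\Delta_{\mathcal{H}}}$ be a probability measure with the properties given in Definition 9, where the spectral gap $\Delta_{\mathcal{H}}$ is defined as $\Delta_{\mathcal{H}} \stackrel{\text{def}}{=} \min\{|\lambda_j - \lambda_k| : \lambda_j \ne \lambda_k\}$, with $\lambda_j, \lambda_k$ eigenvalues in the spectral decomposition of $\mathcal{H}$ given by Eq. (23). Then the pinching map has the integral representation
$$\mathcal{P}_{\mathcal{H}}(\mathcal{X}) = \int \mu_{\Delta_{\mathcal{H}}}(ds)\; e^{\iota s\mathcal{H}} \star_M \mathcal{X} \star_M e^{-\iota s\mathcal{H}},$$
where $\iota = \sqrt{-1}$.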

Proof:
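Because the spectral decomposition of the tensor $\mathcal{H}$ is $\mathcal{H} = \sum_{\lambda} \lambda\, \mathcal{U}_\lambda$, where $\operatorname{sp}(\mathcal{H}) \subset \mathbb{R}$ and the $\mathcal{U}_\lambda$ are mutually orthogonal projection tensors, for any $s \in \mathbb{R}$ we have
$$e^{\iota s\mathcal{H}} = \sum_{\lambda \in \operatorname{sp}(\mathcal{H})} e^{\iota s\lambda}\, \mathcal{U}_\lambda \tag{34}$$
and
$$e^{\iota s\mathcal{H}} \star_M \mathcal{X} \star_M e^{-\iota s\mathcal{H}} = \sum_{\lambda, \lambda' \in \operatorname{sp}(\mathcal{H})} e^{\iota s(\lambda - \lambda')}\, \mathcal{U}_\lambda \star_M \mathcal{X} \star_M \mathcal{U}_{\lambda'}. \tag{35}$$
If we integrate both sides of Eq. (35) with respect to the measure $\mu_{\Delta_{\mathcal{H}}}$, we obtain
$$\int \mu_{\Delta_{\mathcal{H}}}(ds)\; e^{\iota s\mathcal{H}} \star_M \mathcal{X} \star_M e^{-\iota s\mathcal{H}} = \sum_{\lambda, \lambda'} \left(\int \mu_{\Delta_{\mathcal{H}}}(ds)\, e^{\iota s(\lambda - \lambda')}\right) \mathcal{U}_\lambda \star_M \mathcal{X} \star_M \mathcal{U}_{\lambda'}.$$
By the properties in Definition 9, the inner integral equals 1 when $\lambda = \lambda'$ and vanishes when $|\lambda - \lambda'| \ge \Delta_{\mathcal{H}}$, which, by the definition of the spectral gap $\Delta_{\mathcal{H}}$, covers every pair of distinct eigenvalues. We finally obtain
$$\int \mu_{\Delta_{\mathcal{H}}}(ds)\; e^{\iota s\mathcal{H}} \star_M \mathcal{X} \star_M e^{-\iota s\mathcal{H}} = \sum_{\lambda} \mathcal{U}_\lambda \star_M \mathcal{X} \star_M \mathcal{U}_\lambda = \mathcal{P}_{\mathcal{H}}(\mathcal{X}),$$
which proves the lemma.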
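Lemma 6. Let $\mathcal{H}$ and $\mathcal{X}$ be Hermitian tensors of the same dimensions. Then $\mathcal{P}_{\mathcal{H}}(\mathcal{X}) \star_M \mathcal{H} = \mathcal{H} \star_M \mathcal{P}_{\mathcal{H}}(\mathcal{X})$.
Proof: Because $\mathcal{U}_\lambda \star_M \mathcal{H} = \mathcal{H} \star_M \mathcal{U}_\lambda = \lambda\, \mathcal{U}_\lambda$ for every $\lambda \in \operatorname{sp}(\mathcal{H})$, we have
$$\mathcal{P}_{\mathcal{H}}(\mathcal{X}) \star_M \mathcal{H} = \sum_{\lambda} \lambda\, \mathcal{U}_\lambda \star_M \mathcal{X} \star_M \mathcal{U}_\lambda = \mathcal{H} \star_M \mathcal{P}_{\mathcal{H}}(\mathcal{X}). \quad □$$
Lemma 7. Under the assumptions of Lemma 5, $\operatorname{Tr}(\mathcal{P}_{\mathcal{H}}(\mathcal{X}) \star_M \mathcal{H}) = \operatorname{Tr}(\mathcal{X} \star_M \mathcal{H})$.
Proof: From the linearity and cyclic property of the trace, Lemma 5, and the fact that the tensor $e^{-\iota s\mathcal{H}}$ commutes with the tensor $\mathcal{H}$ for all $s \in \mathbb{R}$, we have
$$\operatorname{Tr}(\mathcal{P}_{\mathcal{H}}(\mathcal{X}) \star_M \mathcal{H}) = \int \mu_{\Delta_{\mathcal{H}}}(ds) \operatorname{Tr}\left(e^{\iota s\mathcal{H}} \star_M \mathcal{X} \star_M e^{-\iota s\mathcal{H}} \star_M \mathcal{H}\right) = \int \mu_{\Delta_{\mathcal{H}}}(ds) \operatorname{Tr}(\mathcal{X} \star_M \mathcal{H}) = \operatorname{Tr}(\mathcal{X} \star_M \mathcal{H}). \quad □$$
Lemma 8 (Pinching inequality). Let $\mathcal{H}$ be a Hermitian tensor and $\mathcal{X}$ a positive semi-definite tensor of the same dimensions. Then
$$\mathcal{P}_{\mathcal{H}}(\mathcal{X}) \succeq \frac{1}{|\operatorname{sp}(\mathcal{H})|}\, \mathcal{X},$$
where $|\operatorname{sp}(\mathcal{H})|$ is the number of distinct eigenvalues of $\mathcal{H}$.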

Proof:
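We first define the tensors
$$\mathcal{V}_k \stackrel{\text{def}}{=} \sum_{j=1}^{n} e^{\iota 2\pi jk/n}\, \mathcal{U}_{\lambda_j}, \qquad k = 1, \ldots, n,$$
where $n = |\operatorname{sp}(\mathcal{H})|$ and $\lambda_1, \ldots, \lambda_n$ are the distinct eigenvalues of $\mathcal{H}$. Then the pinching map $\mathcal{P}_{\mathcal{H}}(\mathcal{X})$ can be expressed as
$$\mathcal{P}_{\mathcal{H}}(\mathcal{X}) \stackrel{1}{=} \frac{1}{n} \sum_{k=1}^{n} \mathcal{V}_k \star_M \mathcal{X} \star_M \mathcal{V}_k^H \succeq \frac{1}{n}\, \mathcal{X},$$
where we use the following fact in the equality $\stackrel{1}{=}$:
$$\frac{1}{n} \sum_{k=1}^{n} e^{\iota 2\pi (j - j')k/n} = \begin{cases} 1, & j = j', \\ 0, & j \ne j'. \end{cases}$$
For the inequality $\succeq$, we use the relations
$$\mathcal{V}_k \star_M \mathcal{X} \star_M \mathcal{V}_k^H \succeq \mathcal{O} \quad \text{for all } k$$
and
$$\mathcal{V}_n = \sum_{j=1}^{n} \mathcal{U}_{\lambda_j} = \mathcal{I},$$
where the zero tensor $\mathcal{O}$ and the identity tensor $\mathcal{I}$ both have the same dimensions as $\mathcal{X}$ (or $\mathcal{H}$). □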
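The construction of $\mathcal{V}_k$ in this proof can be checked numerically on a matrix unfolding. The sketch below builds the eigenprojectors of a hypothetical $\mathcal{H}$ with three distinct eigenvalues from a random unitary, verifies the representation $\mathcal{P}_{\mathcal{H}}(\mathcal{X}) = \frac{1}{n}\sum_k \mathcal{V}_k \mathcal{X} \mathcal{V}_k^H$, and checks that $\mathcal{P}_{\mathcal{H}}(\mathcal{X}) - \mathcal{X}/n$ is positive semi-definite, as the pinching inequality asserts.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 5
q, _ = np.linalg.qr(rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim)))
groups = [[0, 1], [2, 3], [4]]             # eigenspaces of a hypothetical H (3 distinct eigenvalues)
projs = [q[:, g] @ q[:, g].conj().T for g in groups]
n = len(projs)                             # n = |sp(H)|

g = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
x = g @ g.conj().T                         # positive semi-definite X

pinched = sum(u @ x @ u for u in projs)    # P_H(X) from the definition

# V_k = sum_j exp(2*pi*i*j*k/n) U_j for k = 1..n; V_n is the identity
vs = [sum(np.exp(2j * np.pi * j * k / n) * projs[j - 1] for j in range(1, n + 1))
      for k in range(1, n + 1)]
averaged = sum(v @ x @ v.conj().T for v in vs) / n

gap = pinched - x / n                      # the pinching inequality says this is PSD
print(np.abs(pinched - averaged).max())    # ~0: the V_k representation
print(np.linalg.eigvalsh((gap + gap.conj().T) / 2).min())   # >= 0 up to rounding
```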
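Theorem 1 (Golden–Thompson for tensors). Given two positive definite tensors $\mathcal{A}_1, \mathcal{A}_2 \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, we have
$$\log \operatorname{Tr} \exp(\log \mathcal{A}_1 + \log \mathcal{A}_2) \le \log \operatorname{Tr}(\mathcal{A}_1 \star_M \mathcal{A}_2).$$
Proof: For any $m \in \mathbb{N}$, we have
$$\begin{aligned}
\log \operatorname{Tr} \exp(\log \mathcal{A}_1 + \log \mathcal{A}_2) &\stackrel{1}{=} \frac{1}{m} \log \operatorname{Tr} \exp\left(\log \mathcal{A}_1^{\otimes m} + \log \mathcal{A}_2^{\otimes m}\right) \\
&\stackrel{2}{\le} \frac{1}{m} \log \operatorname{Tr} \exp\left(\log \mathcal{P}_{\mathcal{A}_2^{\otimes m}}(\mathcal{A}_1^{\otimes m}) + \log \mathcal{A}_2^{\otimes m}\right) + \frac{\log \operatorname{poly}(m)}{m} \\
&\stackrel{3}{=} \frac{1}{m} \log \operatorname{Tr}\left(\mathcal{P}_{\mathcal{A}_2^{\otimes m}}(\mathcal{A}_1^{\otimes m}) \star_{mM} \mathcal{A}_2^{\otimes m}\right) + \frac{\log \operatorname{poly}(m)}{m} \\
&\stackrel{4}{=} \log \operatorname{Tr}(\mathcal{A}_1 \star_M \mathcal{A}_2) + \frac{\log \operatorname{poly}(m)}{m}.
\end{aligned}$$
The equality $\stackrel{1}{=}$ comes from Lemmas 1 and 2. The inequality $\stackrel{2}{\le}$ follows from the pinching inequality (Lemma 8), the monotonicity of the $\log$ and $\operatorname{Tr}\exp(\cdot)$ functions, and the fact that the number of eigenvalues of $\mathcal{A}_2^{\otimes m}$ grows polynomially with $m$ (Lemma 3). The equality $\stackrel{3}{=}$ uses the commutativity of the tensors $\mathcal{P}_{\mathcal{A}_2^{\otimes m}}(\mathcal{A}_1^{\otimes m})$ and $\mathcal{A}_2^{\otimes m}$ (Lemma 6). Finally, the equality $\stackrel{4}{=}$ applies the trace properties from Lemmas 1, 2, and 7. Letting $m \to \infty$ establishes the theorem. □
Theorem 2 (Araki–Lieb–Thirring for tensors). Given two positive semi-definite tensors $\mathcal{A}_1, \mathcal{A}_2 \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, $q > 0$, and $r \in (0, 1]$, we have
$$\operatorname{Tr}\left(\mathcal{A}_2^{r/2} \star_M \mathcal{A}_1^{r} \star_M \mathcal{A}_2^{r/2}\right)^{q/r} \le \operatorname{Tr}\left(\mathcal{A}_2^{1/2} \star_M \mathcal{A}_1 \star_M \mathcal{A}_2^{1/2}\right)^{q},$$
with equality if and only if $\mathcal{A}_1 \star_M \mathcal{A}_2 = \mathcal{A}_2 \star_M \mathcal{A}_1$. This inequality holds in the opposite direction for $r \ge 1$.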

Proof:
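The proof follows the same asymptotic pinching strategy as Theorem 1. The equality $\stackrel{1}{=}$ comes from Lemmas 1 and 2. The inequality $\stackrel{2}{\le}$ follows from the pinching inequality (Lemma 8), the monotonicity of the function $\mathcal{X} \mapsto \operatorname{Tr}(\mathcal{X}^\alpha)$ for $\alpha \ge 0$, and the fact that the number of eigenvalues of $\mathcal{A}_2^{\otimes m}$ grows polynomially with $m$ (Lemma 3). For the case $r \ge 1$, if we perform the replacements $\mathcal{A}_1^r \leftarrow \mathcal{A}_1$, $\mathcal{A}_2^r \leftarrow \mathcal{A}_2$, $q/r \leftarrow q$, and $1/r \leftarrow r$, the inequality in this theorem is reversed.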

Complex Interpolation Theory
In this section, we present the definitions and theorems from complex interpolation theory that will be used to prove multivariate tensor trace inequalities in Section 4. Complex interpolation theory enables us to control the behavior of a complex function defined on the strip $S \stackrel{\text{def}}{=} \{z \in \mathbb{C} : 0 \le \operatorname{Re}(z) \le 1\}$ by its values on the boundary lines $\operatorname{Re}(z) = 0$ and $\operatorname{Re}(z) = 1$. We define a family of probability densities on $\mathbb{R}$ as
$$\rho_\theta(s) \stackrel{\text{def}}{=} \frac{\sin(\pi\theta)}{2\theta\left(\cosh(\pi s) + \cos(\pi\theta)\right)} \quad \text{for } \theta \in (0, 1). \tag{54}$$
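Moreover, $\rho_\theta$ has the following limiting behaviors:
$$\lim_{\theta \to 0} \rho_\theta(s) = \rho_0(s) \stackrel{\text{def}}{=} \frac{\pi}{2\left(\cosh(\pi s) + 1\right)} \tag{55}$$
and
$$\lim_{\theta \to 1} \rho_\theta(s) = \delta(s), \tag{56}$$
where $\delta$ is the Dirac $\delta$-distribution. We now introduce the Stein–Hirschman theorem [10,25] from complex interpolation theory.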
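Theorem 3 (Stein–Hirschman). Let $F$ be a map from $S$ to bounded linear operators on a separable Hilbert space that is holomorphic in the interior of $S$ and continuous on the boundary. Let $p_0, p_1 \in [1, \infty]$ and $\theta \in (0, 1)$, and define $p_\theta$ by $\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$. If $z \mapsto \|F(z)\|_{p_{\operatorname{Re}(z)}}$ is uniformly bounded on $S$, we have
$$\log \|F(\theta)\|_{p_\theta} \le \int_{-\infty}^{\infty} ds \left(\rho_{1-\theta}(s) \log \|F(\iota s)\|_{p_0}^{1-\theta} + \rho_\theta(s) \log \|F(1 + \iota s)\|_{p_1}^{\theta}\right). \tag{57}$$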
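The weights in the Stein–Hirschman bound are the densities $\rho_\theta(s) = \sin(\pi\theta)/(2\theta(\cosh(\pi s) + \cos(\pi\theta)))$. As a small numerical sketch (NumPy and a composite trapezoidal rule on a truncated grid, both our choices), one can check that each $\rho_\theta$ integrates to one, that $\rho_\theta$ approaches $\rho_0(s) = \pi/(2(\cosh(\pi s) + 1))$ as $\theta \to 0$, and that its mass concentrates at $s = 0$ as $\theta \to 1$.

```python
import numpy as np


def rho(theta, s):
    """rho_theta(s) = sin(pi theta) / (2 theta (cosh(pi s) + cos(pi theta))), theta in (0, 1)."""
    return np.sin(np.pi * theta) / (2 * theta * (np.cosh(np.pi * s) + np.cos(np.pi * theta)))


def rho0(s):
    """Limit density rho_0(s) = pi / (2 (cosh(pi s) + 1))."""
    return np.pi / (2 * (np.cosh(np.pi * s) + 1))


def trapezoid(y, s):
    """Composite trapezoidal rule on a uniform grid s."""
    h = s[1] - s[0]
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))


s = np.linspace(-40.0, 40.0, 400_001)   # the densities decay like exp(-pi |s|)
for theta in (0.1, 0.5, 0.9):
    print(theta, trapezoid(rho(theta, s), s))   # each close to 1
print(trapezoid(rho0(s), s))                    # close to 1
print(np.abs(rho(1e-6, s) - rho0(s)).max())     # small: rho_theta -> rho_0
```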

Multivariate Tensor Trace Inequalities
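In order to extend Theorems 1 and 2 from two tensors to multiple tensors, we require the following Lie product formula for tensors.
Lemma 9 (Lie product formula for tensors). For bounded tensors $\mathcal{L}_1, \ldots, \mathcal{L}_m$ acting on the same Hilbert space,
$$\lim_{n \to \infty} \left(e^{\mathcal{L}_1/n} \star_M \cdots \star_M e^{\mathcal{L}_m/n}\right)^n = e^{\mathcal{L}_1 + \cdots + \mathcal{L}_m}. \tag{58}$$
Proof: We prove the case $m = 2$; the general case follows by mathematical induction. Let $\mathcal{L}_1, \mathcal{L}_2$ be bounded tensors acting on some Hilbert space, and set $x \stackrel{\text{def}}{=} \|\mathcal{L}_1\| + \|\mathcal{L}_2\|$. Define $\mathcal{C} \stackrel{\text{def}}{=} \exp((\mathcal{L}_1 + \mathcal{L}_2)/n)$ and $\mathcal{D} \stackrel{\text{def}}{=} \exp(\mathcal{L}_1/n) \star_M \exp(\mathcal{L}_2/n)$. We have the following estimates for the norms of $\mathcal{C}$ and $\mathcal{D}$:
$$\|\mathcal{C}\| \le e^{x/n}, \qquad \|\mathcal{D}\| \le e^{\|\mathcal{L}_1\|/n} e^{\|\mathcal{L}_2\|/n} = e^{x/n}. \tag{59}$$
From the Cauchy product formula, the tensor $\mathcal{D}$ can be expressed as
$$\mathcal{D} = \sum_{k=0}^{\infty} \frac{1}{n^k} \sum_{j=0}^{k} \frac{\mathcal{L}_1^{j} \star_M \mathcal{L}_2^{k-j}}{j!\,(k-j)!}. \tag{60}$$
Since $\mathcal{C} = \sum_{k} (\mathcal{L}_1 + \mathcal{L}_2)^k / (n^k k!)$ agrees with $\mathcal{D}$ in the terms of order $k = 0$ and $k = 1$, and the $k$-th order terms of both series have norm at most $x^k/(n^k k!)$, we can bound the norm of $\mathcal{C} - \mathcal{D}$ as
$$\|\mathcal{C} - \mathcal{D}\| \le 2 \sum_{k=2}^{\infty} \frac{x^k}{n^k k!} \le \frac{2x^2}{n^2} e^{x/n}. \tag{61}$$
For the difference between the $n$-th powers of $\mathcal{C}$ and $\mathcal{D}$, we have
$$\|\mathcal{C}^n - \mathcal{D}^n\| = \left\|\sum_{k=0}^{n-1} \mathcal{C}^k \star_M (\mathcal{C} - \mathcal{D}) \star_M \mathcal{D}^{n-1-k}\right\| \stackrel{1}{\le} n\, e^{x(n-1)/n}\, \|\mathcal{C} - \mathcal{D}\|, \tag{62}$$
where the inequality $\stackrel{1}{\le}$ uses the following fact based on Eq. (59):
$$\|\mathcal{C}\|^k \|\mathcal{D}\|^{n-1-k} \le e^{x(n-1)/n}, \qquad 0 \le k \le n-1. \tag{63}$$
By combining with Eq. (61), we have the following bound:
$$\left\|e^{\mathcal{L}_1 + \mathcal{L}_2} - \left(e^{\mathcal{L}_1/n} \star_M e^{\mathcal{L}_2/n}\right)^n\right\| = \|\mathcal{C}^n - \mathcal{D}^n\| \le \frac{2x^2 e^{x}}{n}. \tag{64}$$
The lemma follows by letting $n$ go to infinity.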
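The $O(1/n)$ convergence in this proof can be observed numerically. The sketch restricts to Hermitian $\mathcal{L}_1, \mathcal{L}_2$ acting as matrices (so exponentials can be taken via `eigh`) and compares the spectral-norm error with the explicit estimate $2x^2 e^{x}/n$, $x = \|\mathcal{L}_1\| + \|\mathcal{L}_2\|$, which follows from the norm estimates and the telescoping identity $\mathcal{C}^n - \mathcal{D}^n = \sum_{k=0}^{n-1} \mathcal{C}^k (\mathcal{C} - \mathcal{D}) \mathcal{D}^{n-1-k}$. Function names are illustrative.

```python
import numpy as np


def expm_hermitian(h):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(w)) @ v.conj().T


def lie_error(l1, l2, n):
    """Spectral-norm error || e^{L1+L2} - (e^{L1/n} e^{L2/n})^n ||."""
    step = expm_hermitian(l1 / n) @ expm_hermitian(l2 / n)
    return np.linalg.norm(expm_hermitian(l1 + l2) - np.linalg.matrix_power(step, n), 2)


def lie_bound(l1, l2, n):
    """Explicit estimate 2 x^2 e^x / n with x = ||L1|| + ||L2|| (spectral norms)."""
    x = np.linalg.norm(l1, 2) + np.linalg.norm(l2, 2)
    return 2 * x**2 * np.exp(x) / n


rng = np.random.default_rng(0)
g1, g2 = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)) for _ in range(2))
l1, l2 = (g1 + g1.conj().T) / 4, (g2 + g2.conj().T) / 4   # non-commuting Hermitian matrices
for n in (1, 10, 100, 1000):
    print(n, lie_error(l1, l2, n), lie_bound(l1, l2, n))  # error shrinks roughly like 1/n
```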

Multivariate Araki-Lieb-Thirring Inequality
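In this section, we present a multivariate Araki–Lieb–Thirring (ALT) inequality for tensors.
Theorem 4. Let $p \ge 1$, $\theta \in (0, 1]$, $\rho_\theta$ as defined by (54) (with $\rho_1 = \delta$, cf. (56)), $n \in \mathbb{N}$, and consider a finite sequence $(\mathcal{A}_k)_{k=1}^{n}$ of positive semi-definite tensors. Then, with products taken in the order $k = 1, \ldots, n$ under the Einstein product,
$$\log \left\|\,\left|\prod_{k=1}^{n} \mathcal{A}_k^{\theta}\right|^{1/\theta}\right\|_p \le \int_{-\infty}^{\infty} ds\, \rho_\theta(s) \log \left\|\prod_{k=1}^{n} \mathcal{A}_k^{1 + \iota s}\right\|_p. \tag{65}$$
Proof: For $\theta = 1$, both sides of Eq. (65) are equal to $\log \|\prod_{k=1}^{n} \mathcal{A}_k\|_p$. We now prove the cases $\theta \in (0, 1)$. We prove the result for strictly positive definite tensors; the generalization to positive semi-definite tensors follows by continuity. We define the function $F(z) \stackrel{\text{def}}{=} \prod_{k=1}^{n} \mathcal{A}_k^{z} = \prod_{k=1}^{n} \exp(z \log \mathcal{A}_k)$, which satisfies the conditions of Theorem 3. By selecting $p_0 = \infty$, $p_1 = p$, and $p_\theta = p/\theta$ in Theorem 3, one obtains
$$\log \|F(\theta)\|_{p/\theta} = \log \left\|\prod_{k=1}^{n} \mathcal{A}_k^{\theta}\right\|_{p/\theta} = \theta \log \left\|\,\left|\prod_{k=1}^{n} \mathcal{A}_k^{\theta}\right|^{1/\theta}\right\|_p \tag{66}$$
and
$$\log \|F(\iota s)\|_\infty = 0, \tag{67}$$
since the tensors $\mathcal{A}_k^{\iota s}$ are unitary. We also have
$$\log \|F(1 + \iota s)\|_p = \log \left\|\prod_{k=1}^{n} \mathcal{A}_k^{1 + \iota s}\right\|_p, \tag{68}$$
and the theorem follows by substituting Eqs. (66), (67), and (68) into Eq. (57) and dividing both sides by $\theta$.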
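The following sketch spot-checks the multivariate ALT inequality for three random positive definite matrices, assuming it has the form $\log \|\,|\prod_k \mathcal{A}_k^{\theta}|^{1/\theta}\|_p \le \int \rho_\theta(s) \log \|\prod_k \mathcal{A}_k^{1+\iota s}\|_p\, ds$ with the product taken from left to right. The integral is approximated with a trapezoidal rule on $[-20, 20]$ (our choice), complex powers are formed through the eigendecomposition, and all names are illustrative.

```python
from functools import reduce

import numpy as np


def random_pd(n, rng):
    """Random positive definite matrix G G^H / n + 0.2 I."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T / n + 0.2 * np.eye(n)


def complex_power(a, z):
    """A^z = V diag(w^z) V^H for positive definite A and a (complex) exponent z."""
    w, v = np.linalg.eigh(a)
    return (v * w.astype(complex) ** z) @ v.conj().T


def schatten(x, p):
    """Schatten p-norm computed from the singular values."""
    return (np.linalg.svd(x, compute_uv=False) ** p).sum() ** (1 / p)


def rho(theta, s):
    return np.sin(np.pi * theta) / (2 * theta * (np.cosh(np.pi * s) + np.cos(np.pi * theta)))


def multivariate_alt_sides(mats, theta, p, s=None):
    """(lhs, rhs) of the multivariate ALT bound; products are taken left to right."""
    s = np.linspace(-20.0, 20.0, 4001) if s is None else s
    sv = np.linalg.svd(reduce(np.matmul, [complex_power(a, theta) for a in mats]),
                       compute_uv=False)
    lhs = np.log((sv ** (p / theta)).sum()) / p   # log || |prod A^theta|^{1/theta} ||_p
    vals = np.array([np.log(schatten(reduce(np.matmul,
                                            [complex_power(a, 1 + 1j * t) for a in mats]), p))
                     for t in s])
    y = rho(theta, s) * vals
    rhs = (s[1] - s[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))   # trapezoidal rule
    return lhs, rhs


rng = np.random.default_rng(0)
mats = [random_pd(3, rng) for _ in range(3)]
for theta in (0.25, 0.5, 0.75):
    print(theta, *multivariate_alt_sides(mats, theta, p=2.0))   # expect lhs <= rhs
```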

Multivariate Golden-Thompson Inequality
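In this section, we present a multivariate Golden–Thompson (GT) inequality for tensors.
Theorem 5. Let $p \ge 1$, $\rho_0$ as defined by (55), $n \in \mathbb{N}$, and consider a finite sequence $(\mathcal{H}_k)_{k=1}^{n}$ of Hermitian tensors. Then
$$\log \left\|\exp\left(\sum_{k=1}^{n} \mathcal{H}_k\right)\right\|_p \le \int_{-\infty}^{\infty} ds\, \rho_0(s) \log \left\|\prod_{k=1}^{n} \exp\left((1 + \iota s)\mathcal{H}_k\right)\right\|_p. \tag{69}$$
Proof: From Theorem 4 with $\mathcal{A}_k = \exp(\mathcal{H}_k)$ and the Lie product formula given by Lemma 9, this theorem is proved by taking $\theta \to 0$ in Eq. (65). □
The multivariate Golden–Thompson inequality provided by Theorem 5 holds only for Hermitian tensors. The following theorem generalizes Theorem 5 to general tensors.
Theorem 6. Let $p \ge 1$, $\rho_0$ as defined by (55), $n \in \mathbb{N}$, and consider a finite sequence $(\mathcal{A}_k)_{k=1}^{n}$ of tensors. Then inequality (70) holds, where $\operatorname{Re}(\mathcal{A}_k)$ is the real part of the tensor $\mathcal{A}_k$, defined as $\operatorname{Re}(\mathcal{A}_k) \stackrel{\text{def}}{=} \frac{1}{2}(\mathcal{A}_k + \mathcal{A}_k^H)$.
Proof: We also define the imaginary part of the tensor $\mathcal{A}_k$ as $\operatorname{Im}(\mathcal{A}_k) \stackrel{\text{def}}{=} \frac{1}{2\iota}(\mathcal{A}_k - \mathcal{A}_k^H)$ and note that both $\operatorname{Re}(\mathcal{A}_k)$ and $\operatorname{Im}(\mathcal{A}_k)$ are Hermitian tensors. We define the function $F(z) \stackrel{\text{def}}{=} \prod_{k=1}^{n} \exp\left(z \operatorname{Re}(\mathcal{A}_k) + \iota\theta \operatorname{Im}(\mathcal{A}_k)\right)$, which satisfies the conditions of Theorem 3. By selecting $p_0 = \infty$, $p_1 = p$, and $p_\theta = p/\theta$ in Theorem 3, one obtains Eq. (71), where in the inequality step we used $\log \|F(\iota s)\|_\infty = 0$, since $F(\iota s)$ is unitary. Dividing both sides of Eq. (71) by $\theta$ and taking $\theta \to 0$, the theorem follows by applying the Lie product formula given by Lemma 9 again. □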
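Theorem 5 can be spot-checked in the same spirit, assuming its inequality reads $\log \|\exp(\sum_k \mathcal{H}_k)\|_p \le \int \rho_0(s) \log \|\prod_k \exp((1+\iota s)\mathcal{H}_k)\|_p\, ds$. For a single tensor both sides coincide, because $\exp(\iota s \mathcal{H})$ is unitary; the sketch uses this as a consistency check of the quadrature. Names, matrix sizes, and the integration grid are illustrative choices.

```python
from functools import reduce

import numpy as np


def random_hermitian(n, rng, scale=0.5):
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return scale * (g + g.conj().T) / 2


def herm_exp(h, z=1.0):
    """exp(z H) for Hermitian H and a complex scalar z, via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(z * w)) @ v.conj().T


def schatten(x, p):
    return (np.linalg.svd(x, compute_uv=False) ** p).sum() ** (1 / p)


def rho0(s):
    return np.pi / (2 * (np.cosh(np.pi * s) + 1))


def gt_sides(hs, p, s=None):
    """(lhs, rhs) of the multivariate GT bound for Hermitian H_1..H_n (left-to-right product)."""
    s = np.linspace(-30.0, 30.0, 6001) if s is None else s
    lhs = np.log(schatten(herm_exp(sum(hs)), p))
    vals = np.array([np.log(schatten(reduce(np.matmul, [herm_exp(h, 1 + 1j * t) for h in hs]), p))
                     for t in s])
    y = rho0(s) * vals
    rhs = (s[1] - s[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))   # trapezoidal rule
    return lhs, rhs


rng = np.random.default_rng(0)
hs = [random_hermitian(3, rng) for _ in range(3)]
for p in (1.0, 2.0):
    print(p, *gt_sides(hs, p))   # expect lhs <= rhs
```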

Multivariate Logarithmic Trace Inequality for Tensors
In this section, we apply Theorem 5 to prove a multivariate logarithmic trace inequality. We first define the relative entropy between two tensors.
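Definition 10. Given two positive definite tensors $\mathcal{A}, \mathcal{B} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, where $\mathcal{A}$ has unit trace, the relative entropy between $\mathcal{A}$ and $\mathcal{B}$ is defined as
$$D(\mathcal{A} \,\|\, \mathcal{B}) \stackrel{\text{def}}{=} \operatorname{Tr}\left(\mathcal{A} \star_M (\log \mathcal{A} - \log \mathcal{B})\right).$$
Based on this definition, we have the following lemma about variational expressions of the relative entropy.
Lemma 10. For tensors $\mathcal{A}$ and $\mathcal{B}$ as in Definition 10, we have
$$D(\mathcal{A} \,\|\, \mathcal{B}) = \sup_{\mathcal{X} \succ \mathcal{O}} \left\{\operatorname{Tr}(\mathcal{A} \star_M \log \mathcal{X}) - \log \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}}\right\} \tag{73}$$
and
$$D(\mathcal{A} \,\|\, \mathcal{B}) = \sup_{\mathcal{X} \succ \mathcal{O}} \left\{\operatorname{Tr}(\mathcal{A} \star_M \log \mathcal{X}) + 1 - \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}}\right\}. \tag{74}$$
Proof: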
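Taking the derivative of Eq. (76) with respect to $\lambda$ shows that the minimizer of Eq. (75) is a strictly positive tensor $\tilde{\mathcal{A}}$ with $\operatorname{Tr}\tilde{\mathcal{A}} = 1$. Considering perturbations along any Hermitian tensor $\mathcal{Y}$ with $\operatorname{Tr}\mathcal{Y} = 0$ indicates that $\mathcal{H} + \log \mathcal{B} - \log \tilde{\mathcal{A}}$ is proportional to the identity tensor. Then, we have
$$\tilde{\mathcal{A}} = \frac{e^{\mathcal{H} + \log \mathcal{B}}}{\operatorname{Tr} e^{\mathcal{H} + \log \mathcal{B}}} \quad \text{and} \quad g(\tilde{\mathcal{A}}) = \log \operatorname{Tr} e^{\mathcal{H} + \log \mathcal{B}}.$$
For any Hermitian tensor $\mathcal{Y}$, the derivative of $f$ at $\tilde{\mathcal{H}}$ in the direction $\mathcal{Y}$ vanishes, because $\operatorname{Tr}\mathcal{A} = 1$ and $\frac{d}{dt}\operatorname{Tr} e^{\log \mathcal{A} + t\mathcal{Y}}\big|_{t=0} = \operatorname{Tr}(\mathcal{A} \star_M \mathcal{Y})$. Therefore, the tensor $\tilde{\mathcal{H}}$ is the maximizer of the function $f$, and the maximum equals $D(\mathcal{A} \,\|\, \mathcal{B})$. Since any Hermitian tensor $\mathcal{H}$ can be expressed as $\mathcal{H} = \log \mathcal{X}$ for some positive definite tensor $\mathcal{X}$, we have proved Eq. (73).
Now we are ready to prove Eq. (74). From $\log x \le x - 1$ for $x > 0$, we have $\log \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}} \le \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}} - 1$. Hence,
$$\sup_{\mathcal{X}} \left(\operatorname{Tr}\mathcal{A}\log\mathcal{X} - \log \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}}\right) \ge \sup_{\mathcal{X}} \left(\operatorname{Tr}\mathcal{A}\log\mathcal{X} + 1 - \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}}\right). \tag{83}$$
Because $\operatorname{Tr}\mathcal{A}\log\mathcal{X} - \log \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}}$ is invariant under the scaling $\mathcal{X} \to \gamma\mathcal{X}$ for $\gamma \in \mathbb{R}_+$, we can assume that $\operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}} = 1$. Then,
$$\begin{aligned}
\sup_{\mathcal{X}} \left(\operatorname{Tr}\mathcal{A}\log\mathcal{X} - \log \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}}\right) &= \sup_{\mathcal{X}} \left\{\operatorname{Tr}\mathcal{A}\log\mathcal{X} : \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}} = 1\right\} \\
&= \sup_{\mathcal{X}} \left\{\operatorname{Tr}\mathcal{A}\log\mathcal{X} + 1 - \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}} : \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}} = 1\right\} \\
&\le \sup_{\mathcal{X}} \left(\operatorname{Tr}\mathcal{A}\log\mathcal{X} + 1 - \operatorname{Tr} e^{\log \mathcal{B} + \log \mathcal{X}}\right).
\end{aligned} \tag{84}$$
Combining Eqs. (83) and (84) proves Eq. (74).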
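The two variational expressions can be checked numerically. Assuming they read $D(\mathcal{A}\|\mathcal{B}) = \sup_{\mathcal{X}}\{\operatorname{Tr}\mathcal{A}\log\mathcal{X} - \log\operatorname{Tr}e^{\log\mathcal{B}+\log\mathcal{X}}\}$ and $D(\mathcal{A}\|\mathcal{B}) = \sup_{\mathcal{X}}\{\operatorname{Tr}\mathcal{A}\log\mathcal{X} + 1 - \operatorname{Tr}e^{\log\mathcal{B}+\log\mathcal{X}}\}$, both objectives attain $D(\mathcal{A}\|\mathcal{B})$ at $\log\mathcal{X} = \log\mathcal{A} - \log\mathcal{B}$, and random choices of $\mathcal{X}$ should never exceed it. This NumPy sketch works on matrix unfoldings and parameterizes $\mathcal{X}$ by its Hermitian logarithm; the function names are illustrative.

```python
import numpy as np


def herm_fun(a, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh((a + a.conj().T) / 2)
    return (v * f(w)) @ v.conj().T


def random_pd(n, rng):
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return g @ g.conj().T / n + 0.1 * np.eye(n)


def rel_entropy(a, b):
    """D(A || B) = Tr A (log A - log B) for positive definite A (unit trace) and B."""
    return np.trace(a @ (herm_fun(a, np.log) - herm_fun(b, np.log))).real


def objective_log(a, b, log_x):
    """Tr A log X - log Tr exp(log B + log X)."""
    return (np.trace(a @ log_x).real
            - np.log(np.trace(herm_fun(herm_fun(b, np.log) + log_x, np.exp)).real))


def objective_linear(a, b, log_x):
    """Tr A log X + 1 - Tr exp(log B + log X)."""
    return (np.trace(a @ log_x).real + 1
            - np.trace(herm_fun(herm_fun(b, np.log) + log_x, np.exp)).real)


rng = np.random.default_rng(0)
a = random_pd(4, rng)
a /= np.trace(a).real                      # unit trace, as in Definition 10
b = random_pd(4, rng)
best = herm_fun(a, np.log) - herm_fun(b, np.log)   # log X* = log A - log B
print(rel_entropy(a, b), objective_log(a, b, best), objective_linear(a, b, best))
```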
□
We are now ready to present the multivariate logarithmic trace inequality for tensors in the following theorem.
Theorem 7. Let $0 < q \le 1$, $\rho_0$ as defined by (55), $n \in \mathbb{N}$, and consider a finite sequence $(\mathcal{A}_k)_{k=1}^{n}$ of positive semi-definite tensors. Then inequality (85) holds, and it becomes an equality in the limit $q \to 0$.
Proof: Because the inequality in Eq. (85) is invariant under multiplication of the tensors $\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_n$ by positive numbers $a_1, a_2, \ldots, a_n$, we can impose constraints on the norms of the tensors without loss of generality. We assume that $\operatorname{Tr}\mathcal{A}_1 = 1$.
We start from the relative entropy in Definition 10 and apply the variational expression of Lemma 10. We then apply Theorem 5 with $\mathcal{H}_k = \log \mathcal{A}_k^q$, which gives Eq. (88). If the tensor $\mathcal{X}$ is chosen as in Eq. (89), then $\mathcal{X}$ is a positive semi-definite tensor. Substituting Eq. (89) into Eq. (88) proves the theorem for $0 < q \le 1$.
For $q \to 0$, we wish to prove equality in Eq. (85). We use the operator inequality $\log \mathcal{X} \succeq \mathcal{I} - \mathcal{X}^{-1}$, valid for any positive definite tensor $\mathcal{X}$, and we can assume that $h_q(s) \ge 0$, since each tensor $\mathcal{A}_k$ can be scaled by a positive number for $k \in [n]$. We then apply Fatou's lemma and use $h_0(s) = 0$. By Eqs. (90) and (91), equality holds in Eq. (85) as $q \to 0$. □

Applications: Random Tensors
This section applies the multivariate Golden–Thompson inequality from Theorem 5 to derive a tail bound for sums of independent random tensors.
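Consider a random Hermitian tensor $\mathcal{X} \in \mathbb{C}^{I_1\times\cdots\times I_M\times I_1\times\cdots\times I_M}$, i.e., a tensor whose entries are random variables satisfying $x_{i_1,\ldots,i_M,j_1,\ldots,j_M} = \overline{x_{j_1,\ldots,j_M,i_1,\ldots,i_M}}$. We assume that the random tensor $\mathcal{X}$ has finite moments of all orders. We can construct tensor extensions of the moment generating function (MGF) and the cumulant generating function (CGF):
$$\mathcal{M}_{\mathcal{X}}(t) \stackrel{\text{def}}{=} \mathbb{E}\, e^{t\mathcal{X}} \quad \text{and} \quad \Xi_{\mathcal{X}}(t) \stackrel{\text{def}}{=} \log \mathbb{E}\, e^{t\mathcal{X}},$$
where $t \in \mathbb{R}$. The tensor MGF and CGF can be expressed as power series expansions:
$$\mathcal{M}_{\mathcal{X}}(t) = \mathcal{I} + \sum_{n=1}^{\infty} \frac{t^n}{n!} \mathbb{E}(\mathcal{X}^n) \quad \text{and} \quad \Xi_{\mathcal{X}}(t) = \sum_{n=1}^{\infty} \frac{t^n}{n!} \Phi_n,$$
where the coefficients $\mathbb{E}(\mathcal{X}^n)$ are called tensor moments and the $\Phi_n$ are called tensor cumulants. The tensor cumulant $\Phi_n$ has a formal expression as a noncommutative polynomial in the tensor moments up to order $n$. For example, the first cumulant is the mean and the second cumulant is the variance:
$$\Phi_1 = \mathbb{E}(\mathcal{X}) \quad \text{and} \quad \Phi_2 = \mathbb{E}(\mathcal{X}^2) - \left(\mathbb{E}\mathcal{X}\right)^2.$$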
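As a worked example of the cumulant expansion, consider the hypothetical random tensor $\mathcal{X} = \epsilon\mathcal{A}$ with a fixed Hermitian $\mathcal{A}$ and a Rademacher sign $\epsilon$. Then $\mathbb{E}e^{t\mathcal{X}} = \cosh(t\mathcal{A})$, so the CGF is $\log\cosh(t\mathcal{A}) = \frac{t^2}{2}\mathcal{A}^2 + O(t^4)$, which gives $\Phi_1 = \mathcal{O}$ and $\Phi_2 = \mathcal{A}^2$. The NumPy sketch below confirms this with central finite differences on a matrix unfolding; the step size `h` is our choice.

```python
import numpy as np


def herm_fun(a, f):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh((a + a.conj().T) / 2)
    return (v * f(w)) @ v.conj().T


def cgf_rademacher(a, t):
    """Xi(t) = log E exp(t * eps * A) = log cosh(t A) for a Rademacher sign eps."""
    mgf = 0.5 * (herm_fun(t * a, np.exp) + herm_fun(-t * a, np.exp))   # cosh(tA)
    return herm_fun(mgf, np.log)


rng = np.random.default_rng(0)
g = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
a = (g + g.conj().T) / 2
h = 1e-3
phi1 = (cgf_rademacher(a, h) - cgf_rademacher(a, -h)) / (2 * h)   # approximates Phi_1 = E X = 0
phi2 = (cgf_rademacher(a, h) + cgf_rademacher(a, -h)) / h**2      # approximates Phi_2 = A^2, using Xi(0) = 0
print(np.abs(phi1).max(), np.abs(phi2 - a @ a).max())
```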

Laplace Transform Method for Tensors
We apply the Laplace transform method to bound the maximum eigenvalue of a random Hermitian tensor via the following lemma, which helps us control tail probabilities for the maximum eigenvalue of a random tensor by producing a bound on the trace of the tensor MGF defined in Eq. (93).
The first equality uses the homogeneity of the maximum eigenvalue map, the second equality comes from the monotonicity of the scalar exponential function, and the last relation is Markov's inequality. Moreover, we have
$$e^{\lambda_{\max}(t\mathcal{Y})} = \lambda_{\max}\left(e^{t\mathcal{Y}}\right) \le (2M - 1)^N \operatorname{Tr} e^{t\mathcal{Y}},$$
where the first equality uses the spectral mapping theorem, and the inequality holds because the exponential of a Hermitian tensor is positive definite and the maximum eigenvalue of a positive definite tensor is dominated by the trace [21]. From Eqs. (97) and (98), the lemma is established. □
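The Laplace transform argument can be illustrated on a small hypothetical Rademacher series $\mathcal{Y} = \sum_k \epsilon_k \mathcal{A}_k$, where all $2^6$ sign patterns can be enumerated so that both the tail probability and $\mathbb{E}\operatorname{Tr}e^{t\mathcal{Y}}$ are exact. The sketch uses the matrix form $\Pr\{\lambda_{\max}(\mathcal{Y}) \ge \zeta\} \le \inf_{t>0} e^{-t\zeta}\, \mathbb{E}\operatorname{Tr}e^{t\mathcal{Y}}$ on a matrix unfolding and minimizes over a finite grid of $t$, which still yields a valid upper bound; the dimension-dependent constant in the tensor version above would only loosen it.

```python
import itertools

import numpy as np


def expm_hermitian(h):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(w)) @ v.conj().T


rng = np.random.default_rng(0)
n_terms, dim = 6, 3
coeffs = []
for _ in range(n_terms):
    g = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    coeffs.append((g + g.conj().T) / 4)    # fixed Hermitian A_k

# Y = sum_k eps_k A_k with independent Rademacher signs; enumerate all 2^n outcomes.
outcomes = [sum(e * a for e, a in zip(signs, coeffs))
            for signs in itertools.product((-1.0, 1.0), repeat=n_terms)]
lam_max = np.array([np.linalg.eigvalsh(y)[-1] for y in outcomes])

# Exact E Tr exp(tY) on a grid of t > 0 (each outcome has probability 2^-n).
ts = np.linspace(0.05, 5.0, 100)
mgf_traces = np.array([np.mean([np.trace(expm_hermitian(t * y)).real for y in outcomes])
                       for t in ts])


def laplace_bound(zeta):
    """min over the t-grid of exp(-t*zeta) * E Tr exp(tY)."""
    return float(np.min(np.exp(-ts * zeta) * mgf_traces))


for zeta in (1.0, 2.0, 3.0):
    exact = float(np.mean(lam_max >= zeta))   # exact tail probability
    print(zeta, exact, laplace_bound(zeta))   # exact <= bound
```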

Tail Bounds for Independent Sum
This section presents abstract tail bounds for sums of independent random tensors. This general inequality can serve as the progenitor of other random tensor majorization inequalities.
Theorem 8. Consider n independent random Hermitian tensors X k ∈ C I1×···×I M ×I1×···×I M for k ∈ [n], for all ζ ∈ R, we have By taking the expectation of both sides and applying the independence of all random tensors X k , we obtain ) ρ 0 (s)ds.

Conclusions
In this work, we extend the Araki–Lieb–Thirring (ALT) inequality, the Golden–Thompson (GT) inequality, and the logarithmic trace inequality to arbitrarily many tensors. Our proofs use complex interpolation theory and asymptotic spectral pinching, providing a powerful mechanism to deal with multivariate trace inequalities for tensors. We then apply the tensor Golden–Thompson inequality to derive a tail bound for sums of independent random tensors, and we expect this bound to play a crucial role in high-dimensional probability and statistical data analysis. We believe our work can be applied to various science and engineering problems involving multivariate variables [23,6].