TENSOR MULTIVARIATE TRACE INEQUALITIES AND THEIR APPLICATIONS

Abstract. We prove several trace inequalities that extend the Araki–Lieb–Thirring (ALT) inequality, the Golden–Thompson (GT) inequality, and the logarithmic trace inequality to arbitrarily many tensors. Our approaches rely on complex interpolation theory as well as asymptotic spectral pinching, providing a transparent mechanism to treat generic tensor multivariate trace inequalities. As an example application of our tensor extension of the Golden–Thompson inequality, we give a tail bound for the independent sum of tensors. Such bounds play a fundamental role in high-dimensional probability and statistical data analysis.

1. Introduction. Trace inequalities are mathematical relations between different multivariate trace functionals involving linear operators. These relations are straightforward equalities if the involved linear operators commute; however, they can be difficult to prove when non-commuting linear operators are involved [4].
One of the most important trace inequalities is the famous Golden–Thompson inequality [7]. For any two Hermitian matrices H_1 and H_2, we have

Tr exp(H_1 + H_2) ≤ Tr [exp(H_1) exp(H_2)]. (1.1)

It is easy to see that Eq. (1.1) becomes an identity if the two Hermitian matrices H_1 and H_2 commute. The inequality in Eq. (1.1) has been generalized in several directions. For example, it has been demonstrated that it remains valid when the trace is replaced by any unitarily invariant norm [13, 22]. The Golden–Thompson inequality has been applied to various fields ranging from quantum information processing [15, 16] and statistical physics [24, 27] to random matrix theory [1, 25].
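As a numerical sanity check of Eq. (1.1) in the matrix special case, the following sketch draws two random Hermitian matrices and verifies the inequality; the dimension and random seed are illustrative choices, not part of the original text.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def random_hermitian(d):
    # Symmetrize a random complex Gaussian matrix to make it Hermitian.
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (A + A.conj().T) / 2

H1, H2 = random_hermitian(4), random_hermitian(4)
lhs = np.trace(expm(H1 + H2)).real
rhs = np.trace(expm(H1) @ expm(H2)).real

# Golden-Thompson: Tr exp(H1 + H2) <= Tr[exp(H1) exp(H2)].
assert lhs <= rhs + 1e-9
```

The traces are real because both trace arguments are similar to positive definite matrices; the small tolerance absorbs floating-point error.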
The Golden–Thompson inequality can be seen as a limiting case of the more general Araki–Lieb–Thirring (ALT) inequality [3, 17]. For any two positive semi-definite matrices A_1 and A_2 with r ∈ (0, 1] and q > 0, the ALT inequality states that

Tr (A_1^{r/2} A_2^r A_1^{r/2})^{q/r} ≤ Tr (A_1^{1/2} A_2 A_1^{1/2})^{q}. (1.2)

The Golden–Thompson inequality for the Schatten p-norm is obtained from the ALT inequality via the Lie–Trotter product formula by taking the limit r → 0. The ALT inequality has also been extended in various directions [2, 12, 26].
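The ALT inequality can likewise be checked numerically for matrices. The sketch below implements fractional powers of positive semi-definite matrices through their eigendecompositions; the dimension, seed, and the choice r = 0.5, q = 2 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_psd(d):
    # A A^H is positive semi-definite by construction.
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return A @ A.conj().T

def mpow(A, p):
    # Fractional power of a PSD matrix via its eigendecomposition.
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 0, None)
    return (V * w**p) @ V.conj().T

def tr_pow(A, p):
    # Tr A^p for a Hermitian PSD matrix A.
    w = np.clip(np.linalg.eigvalsh(A), 0, None)
    return np.sum(w**p)

A1, A2 = random_psd(4), random_psd(4)
r, q = 0.5, 2.0

# ALT: Tr (A1^{r/2} A2^r A1^{r/2})^{q/r} <= Tr (A1^{1/2} A2 A1^{1/2})^q.
lhs = tr_pow(mpow(A1, r / 2) @ mpow(A2, r) @ mpow(A1, r / 2), q / r)
rhs = tr_pow(mpow(A1, 0.5) @ A2 @ mpow(A1, 0.5), q)
assert lhs <= rhs + 1e-8
```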
The logarithmic trace inequality can be used to bound the quantum information divergence [2, 8]. For any two positive semi-definite matrices A_1 and A_2 and p > 0, the logarithmic trace inequality for matrices reads

(1/p) Tr A_1 log (A_2^{p/2} A_1^p A_2^{p/2}) ≤ Tr A_1 (log A_1 + log A_2) ≤ (1/p) Tr A_1 log (A_1^{p/2} A_2^p A_1^{p/2}). (1.3)

The paper is organized as follows. Preliminaries of tensors are given in Section 2. In Section 3, the method of pinching and complex interpolation theory are introduced. Three useful matrix-based trace inequalities are extended to multivariate tensors in Section 4. The new Golden–Thompson inequality is applied to provide tail bounds for sums of random tensors in Section 5. Finally, conclusions are given in Section 6.

2. Tensor Preliminaries. Essential terminology regarding tensors is introduced in this section. Throughout this paper, we denote scalars by lower-case letters (e.g., d, e, f, ...), vectors by boldfaced lower-case letters (e.g., d, e, f, ...), matrices by boldfaced capitalized letters (e.g., D, E, F, ...), and tensors by calligraphic letters (e.g., D, E, F, ...), respectively. Tensors are multiarrays of values which can be deemed high-dimensional generalizations of vectors and matrices. Given a positive integer N, let [N] def= {1, 2, ..., N}. An order-N tensor (or N-th order tensor) A def= (a_{i_1, i_2, ..., i_N}), where 1 ≤ i_j ≤ I_j for j ∈ [N], is a multidimensional array containing I_1 × I_2 × ... × I_N entries. Let C^{I_1×...×I_N} and R^{I_1×...×I_N} be the sets of order-N I_1 × ... × I_N tensors over the complex field C and the real field R, respectively. For example, A ∈ C^{I_1×...×I_N} is an order-N multiarray, where the first, second, ..., and N-th orders have I_1, I_2, ..., and I_N entries, respectively. Thus, each entry of A can be represented by a_{i_1,...,i_N}.
For example, when N = 4, A ∈ C^{I_1×I_2×I_3×I_4} is a fourth-order tensor containing entries a_{i_1,i_2,i_3,i_4}. Without loss of generality, one can partition the dimensions of a tensor into two groups, say M and N dimensions, separately. Therefore, for two order-(M+N) tensors A = (a_{i_1,...,i_M,j_1,...,j_N}) and B = (b_{i_1,...,i_M,j_1,...,j_N}) in C^{I_1×...×I_M×J_1×...×J_N} [14], the tensor addition A + B ∈ C^{I_1×...×I_M×J_1×...×J_N} is given entrywise by

(A + B)_{i_1,...,i_M,j_1,...,j_N} = a_{i_1,...,i_M,j_1,...,j_N} + b_{i_1,...,i_M,j_1,...,j_N}.

On the other hand, for tensors A = (a_{i_1,...,i_M,j_1,...,j_N}) ∈ C^{I_1×...×I_M×J_1×...×J_N} and B = (b_{j_1,...,j_N,k_1,...,k_L}) ∈ C^{J_1×...×J_N×K_1×...×K_L}, according to [14], the Einstein product (or simply referred to as tensor product in this work) A ⋆_N B ∈ C^{I_1×...×I_M×K_1×...×K_L} is given by

(A ⋆_N B)_{i_1,...,i_M,k_1,...,k_L} = Σ_{j_1,...,j_N} a_{i_1,...,i_M,j_1,...,j_N} b_{j_1,...,j_N,k_1,...,k_L}.

This tensor product reduces to standard matrix multiplication when L = M = N = 1. Other special cases include the tensor–vector product (M > 1, N = 1, and L = 0) and the tensor–matrix product (M > 1 and N = L = 1). In analogy to matrix analysis, we define some typical tensors and elementary tensor operations as follows.
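A minimal sketch of the Einstein product using NumPy's tensordot, which contracts the last N modes of A with the first N modes of B exactly as in the defining sum. The mode sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Order-(M+N) tensor A with M = 2 free modes and N = 2 inner modes,
# and an order-(N+L) tensor B sharing the two inner modes (illustrative sizes).
I1, I2, J1, J2, K1 = 2, 3, 4, 5, 6
A = rng.normal(size=(I1, I2, J1, J2))
B = rng.normal(size=(J1, J2, K1))

# Einstein product A *_2 B: contract the last N = 2 modes of A
# with the first N = 2 modes of B.
C = np.tensordot(A, B, axes=2)
assert C.shape == (I1, I2, K1)

# Entrywise check against the defining sum for one output entry.
manual = sum(A[0, 0, j1, j2] * B[j1, j2, 0]
             for j1 in range(J1) for j2 in range(J2))
assert np.isclose(C[0, 0, 0], manual)
```

When M = N = L = 1 the same call reduces to an ordinary matrix product.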
In order to define the Hermitian tensor, the conjugate transpose operation (or Hermitian adjoint ) of a tensor is specified as follows.
Specifically, for A ∈ C^{I_1×...×I_M×J_1×...×J_N}, the conjugate transpose A^H ∈ C^{J_1×...×J_N×I_1×...×I_M} is defined entrywise by (A^H)_{j_1,...,j_N,i_1,...,i_M} = conj(a_{i_1,...,i_M,j_1,...,j_N}), where the overline (conjugation) indicates the complex conjugate of the complex number a_{i_1,...,i_M,j_1,...,j_N}. If a tensor A satisfies A^H = A, then A is a Hermitian tensor.
We also list other necessary tensor operations here. The trace of a square tensor A ∈ C^{I_1×...×I_M×I_1×...×I_M} is the summation of all diagonal entries:

Tr(A) = Σ_{i_1,...,i_M} a_{i_1,...,i_M,i_1,...,i_M}.

The inner product of two tensors A, B ∈ C^{I_1×...×I_M×J_1×...×J_N} is given by ⟨A, B⟩ def= Tr(A^H ⋆_M B). According to Eq. (2.8), the Frobenius norm of a tensor A is defined by ‖A‖ def= √⟨A, A⟩.

Definition 2.6. Given a square tensor A ∈ C^{I_1×...×I_M×I_1×...×I_M}, the tensor exponential of the tensor A is defined as

e^A def= Σ_{n=0}^∞ A^n / n!,

where A^0 is defined as the identity tensor I ∈ C^{I_1×...×I_M×I_1×...×I_M} and A^n = A ⋆_M A ⋆_M ... ⋆_M A (n factors). Given a tensor B, the tensor A is said to be a tensor logarithm of B if e^A = B.

The following definitions concern the Kronecker product and the Kronecker sum of two tensors. For square tensors A ∈ C^{I_1×...×I_M×I_1×...×I_M} and B ∈ C^{J_1×...×J_P×J_1×...×J_P}, the Kronecker sum is

A ⊕ B def= A ⊗ I_{J_1×...×J_P} + I_{I_1×...×I_M} ⊗ B, (2.12)

where I_{I_1×...×I_M} ∈ C^{I_1×...×I_M×I_1×...×I_M} and I_{J_1×...×J_P} ∈ C^{J_1×...×J_P×J_1×...×J_P} are identity tensors.
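To illustrate the tensor trace and Frobenius norm for a square order-4 tensor, the following NumPy sketch (with illustrative sizes) checks that flattening the tensor to an (I1·I2) × (I1·I2) matrix reproduces the matrix trace and matrix Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(8)

# Square order-4 tensor A in R^{I1 x I2 x I1 x I2} (illustrative sizes).
I1, I2 = 2, 3
A = rng.normal(size=(I1, I2, I1, I2))

# Trace: sum of the "diagonal" entries a_{i1,i2,i1,i2}.
tr = np.einsum('ijij->', A)

# The same tensor flattened to an (I1*I2) x (I1*I2) matrix has equal trace.
mat = A.reshape(I1 * I2, I1 * I2)
assert np.isclose(tr, np.trace(mat))

# Frobenius norm from the inner product <A, A>.
fro = np.sqrt(np.sum(A * A))
assert np.isclose(fro, np.linalg.norm(mat))
```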
We require the following two lemmas about the Kronecker product, which will be used later in the proof of Theorem 3.8.
Lemma 2.9. Given tensors A_1 and A_2 which act on spaces S_1 and S_2, respectively, we have the following identity:

exp(A_1 ⊗ I_{S_2} + I_{S_1} ⊗ A_2) = exp(A_1) ⊗ exp(A_2), (2.14)

where I_{S_1} and I_{S_2} are identity tensors acting on the spaces S_1 and S_2, respectively.
Lemma 2.10. Given positive tensors A_1 and A_2 which act on spaces S_1 and S_2, respectively, we have the following identity:

log(A_1 ⊗ A_2) = (log A_1) ⊗ I_{S_2} + I_{S_1} ⊗ (log A_2), (2.17)

where I_{S_1} and I_{S_2} are identity tensors acting on the spaces S_1 and S_2, respectively.

Proof:
From the relation (2.14), setting B_1 = log(A_1) and B_2 = log(A_2), we have

exp((log A_1) ⊗ I_{S_2} + I_{S_1} ⊗ (log A_2)) = A_1 ⊗ A_2.

By taking the logarithm of both sides, we obtain the desired result

log(A_1 ⊗ A_2) = (log A_1) ⊗ I_{S_2} + I_{S_1} ⊗ (log A_2). (2.19)

3. Tools for Hermitian Tensors. In this section, we introduce the two main techniques used to prove multivariate trace inequalities for tensors. The spectral pinching method is discussed in Section 3.1, and complex interpolation theory is presented in Section 3.2.
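Eq. (2.19) can be checked numerically in the matrix special case, where the Kronecker product of positive definite matrices is again positive definite. This sketch uses SciPy's logm; dimensions and seed are illustrative.

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(3)

def random_pd(d):
    # A A^T plus a diagonal shift is strictly positive definite.
    A = rng.normal(size=(d, d))
    return A @ A.T + d * np.eye(d)

A1, A2 = random_pd(2), random_pd(3)
I1, I2 = np.eye(2), np.eye(3)

# log(A1 (x) A2) = (log A1) (x) I + I (x) (log A2).
lhs = logm(np.kron(A1, A2))
rhs = np.kron(logm(A1), I2) + np.kron(I1, logm(A2))
assert np.allclose(lhs, rhs, atol=1e-6)
```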
3.1. Pinching Map. The motivation for studying the pinching method arises from the following problem: given two Hermitian tensors H_1 and H_2 that do not commute, does there exist a method to transform one of the two tensors such that they commute, without completely destroying the structure of the original tensor? The spectral pinching method is a tool to resolve this problem. Before discussing this method in detail, we introduce the pinching map.
Given a Hermitian tensor H ∈ C^{I_1×...×I_M×I_1×...×I_M}, we have the spectral decomposition [19]

H = Σ_{λ ∈ sp(H)} λ U_λ, (3.1)

where sp(H) ⊂ R is the spectrum of H and the U_λ ∈ C^{I_1×...×I_M×I_1×...×I_M} are mutually orthogonal projection tensors. The pinching map with respect to H is defined as

P_H(X) def= Σ_{λ ∈ sp(H)} U_λ ⋆_M X ⋆_M U_λ,

where X ∈ C^{I_1×...×I_M×I_1×...×I_M} is a Hermitian tensor. The pinching map possesses various nice properties that will be discussed in this section. For example, P_H(X) always commutes with H for any nonnegative tensor X. Two lemmas are introduced first, which will be used to prove several useful properties of pinching maps.

Lemma 3.1. For a Hermitian tensor A with e_A def= |sp(A)| distinct eigenvalues, the eigenvalues of A^{⊗m} are products of m eigenvalues of A; counting the number of types of sequences of length m over an alphabet of e_A symbols (method of types [5]), we have

|sp(A^{⊗m})| ≤ (m + 1)^{e_A} = O(poly(m)),

where O(poly(m)) represents a function that grows with m polynomially. When m = 1, |sp(A^{⊗1})| is upper bounded by e_A, which is the number of eigenvalues of A; see Theorem 1.1 in [21].
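In the matrix special case the pinching map is the sum of compressions by the spectral projectors of H. The sketch below uses a hand-picked matrix with a degenerate spectrum (all choices illustrative) and verifies two properties stated above: P_H(X) commutes with H, and the trace is preserved.

```python
import numpy as np

rng = np.random.default_rng(4)

d = 5
X = rng.normal(size=(d, d)); X = (X + X.T) / 2   # Hermitian input
H = np.diag([1.0, 1.0, 2.0, 2.0, 3.0])           # degenerate spectrum

def pinch(H, X):
    # Sum U_lam X U_lam over the spectral projectors U_lam of H.
    w, V = np.linalg.eigh(H)
    out = np.zeros_like(X, dtype=complex)
    for lam in np.unique(np.round(w, 10)):
        cols = V[:, np.isclose(w, lam)]
        P = cols @ cols.conj().T   # projector onto the lam-eigenspace
        out += P @ X @ P
    return out

PX = pinch(H, X)
# P_H(X) commutes with H ...
assert np.allclose(PX @ H, H @ PX)
# ... and preserves the trace.
assert np.isclose(np.trace(PX).real, np.trace(X))
```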
Let µ be a probability measure on a measurable space (X, Σ) and consider a sequence of nonnegative tensors {A_x}_{x∈X}; we have the following triangle inequality:

‖ ∫ µ(dx) A_x ‖_p ≤ ∫ µ(dx) ‖A_x‖_p,

due to the convexity of the p-norm for p ≥ 1. Quasi-norms with p ∈ (0, 1) are no longer convex. However, we demonstrate in Lemma 3.2 that these quasi-norms still satisfy an asymptotic convexity property for Kronecker products of tensors, in the sense of allowing an extra term associated with the number of tensors involved in the Kronecker product.
Lemma 3.2. Let p ∈ (0, 1), let µ be a probability measure on a measurable space (X, Σ), and consider a sequence of nonnegative tensors {A_x}_{x∈X} with A_x ∈ C^{I_1×...×I_M} having a Canonical Polyadic (CP) decomposition, i.e., each A_x can be expressed as A_x = Σ_{k_x} λ_{k_x} a_{1,k_x} ⊗ a_{2,k_x} ⊗ ... ⊗ a_{M,k_x}. Then we have

Proof: Let H be the Hilbert space on which the tensor A_x acts. For any x ∈ X, consider the CP decomposition A_x = Σ_{k_x} λ_{k_x} a_{1,k_x} ⊗ a_{2,k_x} ⊗ ... ⊗ a_{M,k_x}. By introducing [11]. Note that the projectors lie in the symmetric subspace of (H ⊗ H′)^{⊗m}, whose dimension grows as poly(m) by Lemma 3.1. Then, we have

By Carathéodory's theorem (see Theorem 18 in [6]), there exists a discrete probability measure Pr(x), where x ∈ X_d and X_d ⊂ X is a discrete set of cardinality poly(m), such that

Therefore, we obtain

When p ∈ (0, 1), the Schatten p-norm satisfies the following triangle inequality for tensors (see [10]),

and from Eq. (3.7), we obtain

Since the map s → s^{1/p} is convex for p ∈ (0, 1), we have

where |X_d| = poly(m) is applied in the last step. From Eq. (3.4) and Lemma 3.2, we also have

We need the following definition of a family of probability distributions in order to represent a pinching map by an integral.
Definition 3.3. We define a family of probability distributions µ_∆ on R, indexed by ∆ > 0, which satisfies the following properties:

The following lemma provides an integral representation of the pinching map.

Lemma 3.4. [Integral Representation of Pinching Map]
Let H, X ∈ C^{I_1×...×I_M×I_1×...×I_M} be Hermitian tensors with the same dimensions, and let µ_{∆_H} be a probability measure with the properties given in Definition 3.3, where ∆_H def= min{|λ_j − λ_k| : λ_j ≠ λ_k} and λ_j, λ_k are two distinct eigenvalues in the spectral decomposition of the tensor H given by Eq. (3.1). Then we have the following integral representation of the pinching map:

P_H(X) = ∫_R µ_{∆_H}(dt) e^{ιHt} ⋆_M X ⋆_M e^{−ιHt}.

Proof:
The spectral decomposition of the tensor H is given by Eq. (3.1), where λ ∈ sp(H) ⊂ R and the U_λ are mutually orthogonal tensors. For any s ∈ R, we then have

e^{ιHs} = Σ_{λ ∈ sp(H)} e^{ιλs} U_λ,

and

e^{ιHs} ⋆_M X ⋆_M e^{−ιHs} = Σ_{λ_1, λ_2 ∈ sp(H)} e^{ι(λ_1−λ_2)s} U_{λ_1} ⋆_M X ⋆_M U_{λ_2}. (3.13)

If we integrate both sides of Eq. (3.13) with respect to the measure µ_{∆_H}, only the terms with λ_1 = λ_2 survive by the properties of µ_{∆_H} in Definition 3.3, and we obtain

∫_R µ_{∆_H}(ds) e^{ιHs} ⋆_M X ⋆_M e^{−ιHs} = Σ_{λ ∈ sp(H)} U_λ ⋆_M X ⋆_M U_λ = P_H(X), (3.17)

which asserts this lemma.
The following lemmas establish useful properties of pinching maps.

Proof:
Because we have

where |sp(H)| is the cardinality of the spectrum sp(H).

Proof:
We first define the tensor V_k as follows:

Then, the pinching map P_H(X) can be expressed as

where we use the following fact in equality (1):

For the inequality (2), we use the following relations:

The equality (1) comes from Lemmas 2.9 and 2.10. The inequality (2) follows from the pinching inequality (Lemma 3.7), the monotonicity of the log and Tr exp(·) functions, and the fact that the number of eigenvalues of A_2^{⊗m} grows polynomially with m (Lemma 3.1). The equality (3) utilizes the commutativity of the tensors P_{A_2^{⊗m}}(A_1^{⊗m}) and A_2^{⊗m} based on Lemma 3.5. Finally, the equality (4) applies trace properties from Lemmas 2.9, 2.10, and 3.6. Letting m → ∞ establishes the theorem.

Proof:
Since we have the following chain of relations:

The equality (1) comes from Lemmas 2.9 and 2.10. The inequality (2) follows from the pinching inequality (Lemma 3.7), the monotonicity of the map X → Tr(X^α) for α ≥ 0, and the fact that the number of eigenvalues of A_1^{⊗m} grows polynomially with m (Lemma 3.1). The equality (3) utilizes the commutativity of the tensors P_{A_1^{⊗m}}(A_2^{⊗m}) and A_1^{⊗m}. For the case r ≥ 1, if we perform the replacements A_1^r ← A_1, A_2^r ← A_2, q/r ← q, and 1/r ← r, the inequality in this theorem is reversed.

3.2. Complex Interpolation Theory.
In this section, we present the definitions and theorems of complex interpolation theory that will be used to prove the multivariate tensor trace inequalities in Section 4. Complex interpolation theory enables us to control the behavior of a complex function defined on the strip

S def= {z ∈ C : 0 ≤ ℜ(z) ≤ 1},

where δ is the Dirac δ-distribution.

Let F be a map from S to the bounded linear operators on a separable Hilbert space that is holomorphic in the interior of S and continuous on the boundary. If
4. Multivariate Tensor Trace Inequalities. In order to extend Theorems 3.8 and 3.9 from two tensors to multiple tensors, we require the following lemma, a Lie product formula for tensors.

Proof:
We prove the case m = 2; the general value of m follows by mathematical induction. Let L_1, L_2 be bounded tensors acting on some Hilbert space. Define C def= exp((L_1 + L_2)/n) and D def= exp(L_1/n) ⋆_M exp(L_2/n). Note that we have the following estimates for the norms of the tensors C and D:

From the Cauchy product formula, the tensor D can be expressed as:

Then we can bound the norm of C − D as

For the difference between higher powers of C and D, we can bound it as

where the inequality (1) uses the following fact:

Then this lemma is proved by letting n go to infinity.
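The limit in the Lie product formula can be observed numerically in the matrix special case: the error of the first-order splitting (e^{L1/n} e^{L2/n})^n against e^{L1+L2} shrinks as n grows. The dimension, seed, and n values below are illustrative.

```python
import numpy as np
from numpy.linalg import matrix_power
from scipy.linalg import expm

rng = np.random.default_rng(5)

d = 4
L1 = rng.normal(size=(d, d)); L1 = (L1 + L1.T) / 2
L2 = rng.normal(size=(d, d)); L2 = (L2 + L2.T) / 2

target = expm(L1 + L2)
errs = []
for n in (1, 10, 100, 1000):
    # First-order Lie-Trotter splitting with n steps.
    trotter = matrix_power(expm(L1 / n) @ expm(L2 / n), n)
    errs.append(np.linalg.norm(trotter - target))

# The splitting error decreases as n grows (O(1/n) for this splitting).
assert errs[-1] < errs[0]
```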

4.1. Multivariate Araki–Lieb–Thirring Inequality.
In this section, we provide a multivariate Araki–Lieb–Thirring (ALT) inequality for tensors.

Proof:
For θ = 1, both sides of Eq. (4.8) are equal to log ‖ ⋆_{k=1}^n A_k ‖_p^p. We now prove the case θ ∈ (0, 1). We prove the result for strictly positive definite tensors; the generalization to positive semi-definite tensors follows by continuity. We define the function F(z) def= ⋆_{k=1}^n A_k^z.

4.2. Multivariate Golden–Thompson Inequality.
In this section, we provide a multivariate Golden–Thompson (GT) inequality for tensors, where ℜ(A_k) is the real part of the tensor A_k, defined as ℜ(A_k) def= (A_k + A_k^H)/2.

Proof: We also define the imaginary part of the tensor A_k as ℑ(A_k) def= (A_k − A_k^H)/(2ι) and note that both ℜ(A_k) and ℑ(A_k) are Hermitian tensors. We define the function F(z) def= ⋆_{k=1}^n exp(zℜ(A_k) + ιθℑ(A_k)), which satisfies the conditions of Theorem 3.10. By selecting p_0 = ∞, p_1 = p, and p_θ = p/θ in Theorem 3.10, one obtains

where we used log‖F(ιs)‖_∞ = 0, since F(ιs) is unitary, in the inequality step. Dividing both sides of Eq. (4.14) by θ and taking θ → 0, the theorem is proved by applying the Lie product formula given in Lemma 4.1 again.

4.3. Multivariate Logarithmic Trace Inequality for Tensors.
In this section, we apply Theorem 4.3 to prove a multivariate logarithmic trace inequality. We first define the relative entropy between two tensors.

Proof:
For any Hermitian tensor H ∈ C^{I_1×...×I_M×I_1×...×I_M}, we first show that

We are now ready to present the multivariate logarithmic trace inequality for tensors in the following theorem.
where the equality holds as q → 0.
Proof: Because the inequality given by Eq. (4.28) is invariant under multiplication of the tensors A_1, A_2, ..., A_n by positive numbers a_1, a_2, ..., a_n, we can add constraints on the norms of the tensors without loss of generality. We assume that Tr A_1 = 1.

where we apply Lemma 4.6. From Theorem 4.4, setting H_k = log A_k^q, we have

using the concavity of the logarithm and Jensen's inequality. Applying Eq. (4.30) to Eq. (4.29), we get

If we set the tensor X as in Eq. (4.32), the tensor X becomes positive semi-definite. Substituting Eq. (4.32) into Eq. (4.31), the theorem is proved for 0 < q ≤ 1.
For q → 0, we wish to prove equality in Eq. (4.28). Because log X ≥ I − X^{−1} for any positive tensor X, we have

and we can assume that h_q(s) ≥ 0, since we can scale each tensor A_k by a positive number for k ∈ [n]. By Fatou's lemma, we have

By Eqs. (4.33) and (4.34), we obtain equality in Eq. (4.28) as q → 0.
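The operator inequality log X ≥ I − X^{−1} invoked above follows from the scalar bound log x ≥ 1 − 1/x applied to the spectrum of X. A quick matrix-case check (dimension and seed illustrative):

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(9)

d = 5
A = rng.normal(size=(d, d))
X = A @ A.T + 0.1 * np.eye(d)   # strictly positive definite

# f(x) = log x - (1 - 1/x) >= 0 for x > 0, so f(X) is positive semi-definite.
diff = logm(X) - (np.eye(d) - np.linalg.inv(X))
diff = (diff + diff.T) / 2      # symmetrize away floating-point asymmetry
assert np.linalg.eigvalsh(diff).min() >= -1e-9
```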

5. Applications: Random Tensors. This section applies the multivariate Golden–Thompson inequality from Theorem 4.3 to derive a tail bound for the independent sum of random tensors.
Consider a random Hermitian tensor X ∈ C^{I_1×...×I_M×I_1×...×I_M}, i.e., each entry of this tensor is a random variable and x_{i_1,...,i_M,j_1,...,j_M} = conj(x_{j_1,...,j_M,i_1,...,i_M}). We assume that the random tensor X has moments of all orders n. We can construct tensor extensions of the moment-generating function (MGF) and the cumulant-generating function (CGF):

M(t) def= E e^{tX}, (5.1)   and   Φ(t) def= log E e^{tX},

where t ∈ R. The tensor MGF and CGF can be expressed as power series expansions:

M(t) = I + Σ_{n=1}^∞ (t^n / n!) E(X^n),   and   Φ(t) = Σ_{n=1}^∞ (t^n / n!) Φ_n,

where the coefficients E(X^n) are called tensor moments, and the Φ_n are named tensor cumulants. The tensor cumulant Φ_n has a formal expression as a noncommutative polynomial in the tensor moments up to order n. For example, the first cumulant is the mean and the second cumulant is the variance:

Φ_1 = E(X),   and   Φ_2 = E(X^2) − (E(X))^2.

5.1. Laplace Transform Method for Tensors. We apply a Laplace transform bound to control the maximum eigenvalue of a random Hermitian tensor in the following lemma. This lemma helps us control tail probabilities for the maximum eigenvalue of a random tensor by producing a bound for the trace of the tensor MGF defined in Eq. (5.1).

Proof:
Given a fixed value t, we have

P(λ_max(Y) ≥ ζ) = P(λ_max(tY) ≥ tζ) = P(e^{λ_max(tY)} ≥ e^{tζ}) ≤ e^{−tζ} E e^{λ_max(tY)}. (5.5)

The first equality uses the homogeneity of the maximum eigenvalue map, the second equality comes from the monotonicity of the scalar exponential function, and the last relation is Markov's inequality. Moreover, we have

e^{λ_max(tY)} = λ_max(e^{tY}) ≤ (2M − 1)^N Tr e^{tY}, (5.6)

where the first equality uses the spectral mapping theorem, and the inequality holds because the exponential of a Hermitian tensor is positive definite and the maximum eigenvalue of a positive definite tensor is dominated by the trace [20]. From Eqs. (5.5) and (5.6), this lemma is established.
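The two ingredients of Eq. (5.6) can be checked numerically in the matrix special case, where no extra dimensional factor is needed: spectral mapping gives λ_max(e^{tY}) = e^{t λ_max(Y)}, which is dominated by the trace of the positive definite matrix e^{tY}. Dimension, seed, and t below are illustrative.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(7)

d = 4
Y = rng.normal(size=(d, d)); Y = (Y + Y.T) / 2   # Hermitian
t = 0.7

lam_max = np.linalg.eigvalsh(Y).max()

# Spectral mapping: lambda_max(e^{tY}) = e^{t lambda_max(Y)} ...
assert np.isclose(np.exp(t * lam_max),
                  np.linalg.eigvalsh(expm(t * Y)).max())

# ... which is dominated by the trace of the positive definite e^{tY}.
assert np.exp(t * lam_max) <= np.trace(expm(t * Y)) + 1e-9
```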

5.2. Tail Bounds for Independent Sums. This section contains abstract tail bounds for sums of independent random tensors. This general inequality can serve as the progenitor of other random tensor majorization inequalities.

Proof:
By setting H_k = tX_k and p = 2 in Theorem 4.3, we have

6. Conclusions. In this work, we extended the Araki–Lieb–Thirring (ALT) inequality, the Golden–Thompson (GT) inequality, and the logarithmic trace inequality to arbitrarily many tensors. Our proofs utilize complex interpolation theory and asymptotic spectral pinching, providing a powerful mechanism to deal with multivariate trace inequalities for tensors. We then applied the tensor Golden–Thompson inequality to provide a tail bound for the independent sum of tensors; this bound will play a crucial role in high-dimensional probability and statistical data analysis.