Optimized Regularity Estimates of the Conditional Distribution of the Sample Mean

We prove an optimized estimate for the regularity of the conditional distribution of the empirical mean of a finite sample of IID random variables, conditional on the sample "fluctuations". Prior results, based on bounds in probability, provided a Hölder-type regularity of the conditional distribution. We establish Lipschitz regularity, using bounds in expectation. The new estimate, extending a well-known property of Gaussian IID samples, is a crucial ingredient of the Multi-Scale Analysis of multi-particle Anderson-type random Hamiltonians in a Euclidean space. In particular, the Hölder regularity of the multi-particle eigenvalue distribution, sufficient for the localization analysis of N-particle lattice Hamiltonians with N ≥ 3, needs to be replaced by Lipschitz regularity for similar Hamiltonians in the Euclidean space.


The general probabilistic framework
Consider a sample of n > 1 IID (independent and identically distributed) random variables X_1, ..., X_n with Gaussian distribution N(0, 1), and introduce the sample mean ξ = ξ_n and the "fluctuations" η_i relative to the mean:

ξ_n = n^{-1}(X_1 + ... + X_n),   η_i = X_i − ξ_n,   i = 1, ..., n.

It is well known from standard courses of probability theory (cf., e.g., [11]) that ξ_n is independent of the sigma-algebra F_η generated by {η_1, ..., η_n} (the latter are linearly dependent, the family {η_i} having rank n − 1). To see this, it suffices to note that all η_i are orthogonal to ξ_n with respect to the standard scalar product in the linear space formed by X_1, ..., X_n, given by

⟨Y, Z⟩ := E[Y Z],

where Y and Z are real linear combinations of X_1, ..., X_n (recall: E[X_i] = 0).
Therefore, the conditional probability distribution of ξ_n given F_η coincides with the unconditional one, so ξ_n ∼ N(0, n^{-1}); thus ξ_n has bounded density

p_{ξ_n}(x) = (n/(2π))^{1/2} e^{−n x²/2} ≤ (n/(2π))^{1/2}.

Moreover, for any interval I ⊂ R of length |I|, we have

P{ξ_n ∈ I} ≤ (n/(2π))^{1/2} |I|.

In some applications to the eigenvalue analysis for random Hamiltonians, discussed below in Section 2 and in subsection 1.2, one has to estimate probabilities of the form

P{ξ_n ∈ I | F_η},   (1.1)

where the interval I may depend measurably on the fluctuations. For instance, with n = 2 one may consider the probability P{(X_1, X_2) ∈ A}, where the set A := {(x_1, x_2) ∈ R²} is defined by inequalities on the sample mean and the fluctuation. In this particular case (for Gaussian samples) the conditional regularity of the sample mean ξ_n, given the fluctuations, is granted, but this is not always so, as the following elementary example shows, where the common probability distribution of the sample X_1, X_2 is very regular: X_1, X_2 ∼ Unif([0, 1]), so the X_i admit a compactly supported probability density bounded by 1. In this simple example the random vector (X_1, X_2) is uniformly distributed in the unit square [0, 1]², and the condition η = c, with η := η_1 = (X_1 − X_2)/2, selects in the two-dimensional plane with coordinates (X_1, X_2) a straight line parallel to the main diagonal {X_1 = X_2}. The conditional distribution of ξ given {η = c} is the uniform distribution on a segment J_c of length 1 − |2c|, vanishing at 2c = ±1. For |2c| = 1, the conditional distribution of ξ on J_c is concentrated on a single point, which is the ultimate form of singularity.
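Both situations above are easy to probe numerically. The following Monte Carlo sketch (purely illustrative; the sample sizes, slab width and tolerances are arbitrary choices, not part of the argument) checks that for a Gaussian sample the mean ξ_n has variance 1/n and is uncorrelated with the fluctuations, and that in the uniform-on-the-square example the conditional support of ξ given {η ≈ c} is a segment of length close to 1 − |2c|.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 200_000

# Gaussian sample: xi_n ~ N(0, 1/n) and is orthogonal to each fluctuation.
X = rng.standard_normal((m, n))
xi = X.mean(axis=1)                  # sample mean xi_n
eta = X - xi[:, None]                # fluctuations eta_i = X_i - xi_n

var_xi = np.var(xi)                  # should be close to 1/n = 0.25
corr = np.mean(xi * eta[:, 0])       # should be close to 0

# Uniform-on-the-square example: eta = (X_1 - X_2)/2, xi = (X_1 + X_2)/2.
# Given {eta ≈ c}, xi is (nearly) uniform on a segment of length 1 - |2c|.
U = rng.uniform(0.0, 1.0, size=(m, 2))
xi2 = U.mean(axis=1)
eta2 = (U[:, 0] - U[:, 1]) / 2.0
seg_len = {}
for c in (0.0, 0.4):
    sel = np.abs(eta2 - c) < 0.01    # thin slab around the line {eta = c}
    seg_len[c] = xi2[sel].max() - xi2[sel].min()

print(var_xi, corr, seg_len)
```

For c = 0.4 the empirical segment length comes out close to 1 − 0.8 = 0.2, visibly shorter than for c = 0, in line with the degeneration at 2c = ±1.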
The above mentioned remarkable particularity of the Gaussian distributions enables one, among other things, to single out one component of a given sample and analyze its impact on one or another functional of the sample of size n > 1, while keeping fixed the remaining n − 1 degrees of freedom. More to the point, in the spectral analysis of random operators (cf. Section 2), the sample mean modulates a scalar operator (a multiple of the identity operator), which shifts all eigenvalues in an explicit way, while the effect of the fluctuations is much harder to control quantitatively. The question of regularity of the probability distribution of the sample mean ξ n with more general (in particular, singular continuous) marginal probability measures, conditional on all the "fluctuations", seems to be interesting in itself; for the author of these lines, however, the motivation came from the spectral theory of random operators.
For the moment, we can address only sufficiently regular marginal probability measures, which leaves room for further research in this direction.
The main result of the present paper is given by Theorem 8.1, generalizing the estimates from Theorem 6.1 and Theorem 7.1.

Motivation: random N -particle Hamiltonians
The celebrated Anderson localization theory, originated in the seminal 1958 paper by P. W. Anderson [1] (Nobel Prize in 1977), studies the evolution of quantum particles in a disordered medium. Mathematically speaking, one has to study the decay properties of the Green functions, eigenfunctions and eigenfunction correlators of random quantum Hamiltonians (see, e.g., the monographs [8,9,15] for a general introduction to the Anderson localization theory). It turns out that the free propagation of particles or waves can be strongly inhibited by disorder. The crucial role, both from the mathematical and the physical point of view, is played by the so-called "resonances", or "small denominators"; the latter appear in a number of problems in mathematical physics and beyond. In simple terms, one has to assess the regularity of the eigenvalue distribution of random self-adjoint operators (or finite-dimensional symmetric matrices). The first fairly general estimate of this kind was proved in the celebrated paper by Wegner [16]. Today there are various approaches leading to Wegner-type EVC bounds (cf. [8,15]), more general and sharper than those relying on the results of the present paper. However, the situation is quite different in the area of multi-particle disordered quantum systems with a nontrivial inter-particle interaction. A considerable number of challenging problems here remain open due to the lack of adequate methods for assessing the eigenvalue concentration and eigenvalue comparison in interactive systems. The theory of multi-particle Anderson Hamiltonians is a relatively new direction in this field (cf., e.g., [2,4,9]), and it presents greater challenges in the EVC estimation.
The main reason is that already for N = 2 particles, the eigenvalues are "composite" quantities, and the resonances between them, in the form of "small denominators", are structural in nature and cannot be attributed to one or another specific locus in the physical, single-particle configuration space. As a result, correlations between composite EVs do not decay with distance; these strong correlations become particularly difficult to deal with starting from N = 3. Suffice it to say that there still are open problems in the area of N-particle Anderson Hamiltonians in a Euclidean space, while the situation is somewhat simpler for lattice Hamiltonians (cf. [10]).
A partial solution to this general problem, initially proposed in [6], used the specific property of Gaussian distributions which we discussed in subsection 1.1. In a modified form, it was extended to the IID random potentials with uniform marginal distributions and "smooth" perturbations thereof; see [5,10]. However, the Hölder regularity (of any order θ < 1) of the N -particle EV distribution used in the context of lattice Hamiltonians turns out to be insufficient for the applications to the N -particle Anderson models in R d , where one needs the Lipschitz continuity of the EV distribution, and this is the main subject of the present paper.
We hope that the approach used in [6] and in this paper may prove useful in other areas of mathematical physics and applications of the probability theory.
A detailed discussion of the interactive Anderson models, especially in the Euclidean space, is certainly well beyond the scope of this short note; an introduction to this relatively recent direction of research in the mathematical physics of random operators can be found in the monograph [9]. However, it is to be emphasized that the results of this paper are crucial for the rigorous proof of uniformly strong decay bounds on eigenfunction correlators with respect to the natural distance in the multi-particle configuration space, and not the so-called Hausdorff distance used in the works [7,12,13] and some other papers. We also stress that the new EVC bounds are required for the proof of localization in a physically realistic geometrical setting (in finite-size disordered samples); see the discussion in [10].

An application to the Wegner-type bounds
Let Λ be a finite graph, with |Λ| = n ≥ 1, and let H(ω) = H_Λ(ω) be a discrete Schrödinger operator (cf., e.g., [8,9]) acting in the finite-dimensional Hilbert space H = H_Λ = ℓ²(Λ), with an IID random potential V : Λ × Ω → R relative to some probability space (Ω, F, P). For example, one can take Λ = [−L, L] ∩ Z¹ and define the random, second-order discrete Schrödinger operator

(H(ω)ψ)(x) = Σ_{y∈Λ: |y−x|=1} (ψ(x) − ψ(y)) + V(x, ω)ψ(x),   x ∈ Λ,

with Dirichlet boundary conditions (cf. [8,9]) outside [−L, L]; here V : Λ × Ω → R is a random field on Λ relative to some probability space (Ω, B, P). On an arbitrary connected finite graph Λ, one usually takes the kinetic energy operator H_0 given by the negative graph Laplacian (cf., e.g., [3]),

(H_0 ψ)(x) = Σ_{y: d(x,y)=1} (ψ(x) − ψ(y)),

where d(x, y) is the canonical graph distance on Λ, given by the length of the shortest path from x to y over the edges of the graph in question.
Decomposing the random field V on Λ into the sample mean and the fluctuations, V(x, ω) = ξ_n(ω) + η_x(ω), we can represent H(ω) as follows:

H(ω) = A(ω) + ξ_n(ω) 1,   (2.1)

where the self-adjoint operator A(ω) (here η_x is identified with the operator of multiplication by the "residual", "fluctuation" random potential x → η_x(ω)) is F_η-measurable, and so are its eigenvalues μ_j(ω), j = 1, ..., n. Since A(ω) commutes with the scalar operator ξ_n(ω) 1, the eigenvalues λ_j(ω) of H(ω) have the form

λ_j(ω) = μ_j(ω) + ξ_n(ω),   j = 1, ..., n.

The numeration of the eigenvalues λ_j(ω), μ_j(ω) is of course not canonical, but they can be consistently defined as random variables on Ω.
The representation (2.1) immediately implies the following EVC bound. Fix an interval I_s = [t, t + s] and let P_{I_s}(H(ω)) be the spectral projection of the operator H(ω) onto the interval I_s (cf., e.g., [14]); then

E[ tr P_{I_s}(H(ω)) ] ≤ Σ_{j=1}^{n} P{ λ_j(ω) ∈ I_s }.

Further, omitting the argument ω for brevity and setting μ̃_j := −μ_j + t, we have

P{ λ_j ∈ I_s } = E[ P{ ξ_n ∈ [μ̃_j, μ̃_j + s] | F_η } ],   (2.3)

with μ̃_j fixed under the conditioning. Therefore, the unconditional probability P{ λ_j ∈ I_s } can be assessed by analyzing the regularity of the conditional probability distribution of ξ_n figuring in the RHS of (2.3).
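The rigid spectral shift behind this decomposition is easy to verify numerically. The sketch below (illustrative only; the lattice size and the uniform potential are arbitrary choices) builds a discrete Schrödinger operator on Λ = [−L, L] ∩ Z with an IID potential, splits the potential into its sample mean and fluctuations, and checks that the eigenvalues of H(ω) are those of the F_η-measurable part A(ω) shifted by ξ_n(ω).

```python
import numpy as np

rng = np.random.default_rng(1)
L = 10
n = 2 * L + 1                          # number of sites of [-L, L] ∩ Z

# Graph Laplacian of the path graph, with Dirichlet conditions outside [-L, L].
H0 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

V = rng.uniform(0.0, 1.0, size=n)      # IID random potential
xi = V.mean()                          # sample mean xi_n
eta = V - xi                           # fluctuations eta_x

H = H0 + np.diag(V)                    # H(omega) = H0 + V
A = H0 + np.diag(eta)                  # A(omega): the F_eta-measurable part

lam = np.linalg.eigvalsh(H)
mu = np.linalg.eigvalsh(A)
# Since H = A + xi_n * Id, the spectra are rigidly shifted: lam_j = mu_j + xi_n.
shift_error = float(np.max(np.abs(lam - (mu + xi))))
print(shift_error)
```

The shift is exact (up to floating-point error), since adding a multiple of the identity translates the whole spectrum without reordering it.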

Reduction to the local analysis in the sample space
Assume that the support S ⊂ R of the common continuous marginal probability measure of the IID random variables X_j, 1 ≤ j ≤ n, is covered by a finite or countable union of intervals:

S ⊆ ∪_{k∈K} J_k.

Let K = K^n, and for each k = (k_1, ..., k_n) ∈ K, denote

J_k := J_{k_1} × ... × J_{k_n} ⊂ R^n.

Owing to the continuity of the marginal measure, the J_k are essentially disjoint: two distinct intervals J_k, J_l can only have common endpoints. Respectively, the family of parallelepipeds {J_k, k ∈ K} forms a partition K of the sample space, which we will often identify with the probability space Ω. Further, let F_K be the sub-sigma-algebra of F generated by the partition K. The probabilities of the general form (1.1) can be assessed as follows. Let P_k{·} be the conditional probability measure given the event {X ∈ J_k}, E_k[·] the respective expectation, and p_k = P{X ∈ J_k}. Then we have

P{·} = Σ_{k∈K} p_k P_k{·}.   (3.1)

Therefore, one may seek a satisfactory bound on the LHS of (3.1) by assessing the "local" conditional probabilities P_k{·}, where each random variable X_j is restricted to a subinterval J_{k_j} of its global support, so the entire sample X = (X_1, ..., X_n) is restricted to the parallelepiped J_k ⊂ R^n.
In the next section, we perform such analysis in the case of a uniform distribution of the IID variables X i .
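The decomposition over the partition is simply the law of total probability applied to the boxes J_k. The following sketch (with an arbitrary test event A and an arbitrary partition of [0, 1]² into equal boxes; both are illustrative choices) checks the identity empirically.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 2, 100_000
X = rng.uniform(0.0, 1.0, size=(m, n))

# Partition [0, 1] into K subintervals; boxes J_k = J_{k_1} x ... x J_{k_n}.
K = 4
idx = np.minimum((X * K).astype(int), K - 1)   # box index of each coordinate

# Test event A: the sample mean falls into a short interval.
A = np.abs(X.mean(axis=1) - 0.5) < 0.05

total = float(A.mean())                # empirical P{A}
acc = 0.0
for k in np.ndindex(*(K,) * n):        # sum over boxes: sum_k p_k * P_k{A}
    sel = np.all(idx == np.array(k), axis=1)
    p_k = float(sel.mean())
    if p_k > 0:
        acc += p_k * float(A[sel].mean())
print(total, acc)
```

The two numbers agree exactly (up to floating-point rounding), since the boxes partition the sample space.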

Uniform marginal distributions. General setup
Let ℓ > 0 be a real number and n ≥ 2 an integer. Consider a sample of n IID random variables with uniform distribution Unif([0, ℓ]), and introduce again the sample mean ξ = ξ_n and the "fluctuations" η_i relative to the mean:

ξ_n = n^{-1}(X_1 + ... + X_n),   η_i = X_i − ξ_n,   i = 1, ..., n.

Further, consider the n-dimensional Euclidean space of real linear combinations of the random variables X_i. Clearly, the variables η_i : R^n → R are invariant under the group of translations

X ↦ X + t(1, ..., 1),   t ∈ R,

and so are their differences η_i − η_j = X_i − X_j. The space R^n is thus stratified into a union of affine lines of the form

L(Y) := {X ∈ R^n : η(X) = Y},   η := (η_1, ..., η_n),

parallel to the main diagonal, and the cube [0, ℓ]^n is accordingly stratified into the intervals X(Y) := L(Y) ∩ [0, ℓ]^n. We endow each nonempty interval X(Y) ⊂ R^n with the natural structure of a probability space inherited from R^n:
• if |X(Y)| = 0 (an interval reduced to a single point), then we introduce the trivial sigma-algebra and the counting measure;
• if |X(Y)| = r > 0, then we use the inherited structure of an interval of a one-dimensional affine line and the normalized measure with constant density r^{-1} with respect to the inherited Lebesgue measure on X(Y).
The transformation X ↦ (ξ_n, η_1, ..., η_{n−1}) is nondegenerate, but not orthogonal. We will have to work with the metric on X(Y) induced by the standard Riemannian metric in the ambient space R^n; to this end, introduce an orthogonal coordinate transformation in R^n, X ↦ (ξ̃_n, η̃_1, ..., η̃_{n−1}), such that ξ̃_n = n^{1/2} ξ_n; the exact form of η̃_j, j = 1, ..., n − 1, is of no importance for our analysis, provided that the transformation is orthogonal.
Remark 4.1. For later use, note that, due to (4.3), each of the re-scaled variables n^{1/2} X_i can serve as the normalized length parameter on the elements X(Y). Along an element X(Y), one can simultaneously parameterize ξ̃ and the variables X_i by setting ξ̃(t) = c_0 + t, X_j(t) = c_j + n^{-1/2} t, with arbitrarily chosen constants c_j. Here, ξ̃_n is a natural length parameter on X(Y), since the transformation X ↦ (ξ̃_n, η̃_1, ..., η̃_{n−1}) is orthogonal.

Short intervals are unlikely
While the sample maximum X_max(X) := max_i X_i and the sample minimum X_min(X) := min_i X_i vary along the elements X(Y), their difference X_max(X) − X_min(X) does not; it is uniquely determined by X(Y). According to Remark 4.1, each n^{1/2} X_i, i = 1, ..., n, restricted to X(Y), provides a normalized length parameter on X(Y); thus the range of each n^{1/2} X_i |_{X(Y)} is an interval of length |X(Y)|. One can increase (resp., decrease), e.g., the value of X_1, as long as all {X_i, 1 ≤ i ≤ n} are strictly smaller than ℓ (resp., strictly positive). Therefore, the maximum increment of X_1 (indeed, of any X_i) along X(Y) is given by ℓ − X_max(X), and its maximum decrement equals X_min(X), so the range of the normalized length parameter n^{1/2} X_1 along X(Y(X)) is an interval of length

|X(Y(X))| = n^{1/2} ( ℓ − X_max(X) + X_min(X) ).

Since both X_min(X) and ℓ − X_max(X) are non-negative, one has the implication

|X(Y(X))| < t  ⟹  X_min(X) < t n^{-1/2}  and  ℓ − X_max(X) < t n^{-1/2}.

With

A_ij(t) := { X : X_i < t n^{-1/2}, X_j > ℓ − t n^{-1/2} },   i ≠ j,

we have, for any sample X with |X(Y(X))| < t, that X_i < t n^{-1/2} for the index i of the sample minimum and X_j > ℓ − t n^{-1/2} for the index j of the sample maximum. Thus the union ∪_{i≠j} A_ij(t) contains all the samples X with |X(Y)| < t.
The sample {X_k} is IID, with common probability density bounded by ρ = ℓ^{-1}, so for any i ≠ j

P{ A_ij(t) } ≤ (ρ t n^{-1/2})² = t² / (n ℓ²).

Therefore,

P{ |X(Y(X))| < t } ≤ Σ_{i≠j} P{ A_ij(t) } ≤ n(n−1) · t²/(n ℓ²) = (n − 1) t²/ℓ².   (5.6)

This completes the proof.
By a change of variable, one can extend the above result to n independent random variables uniformly distributed in their individual intervals J i = [a i , a i + ℓ], for arbitrary a 1 , . . . , a n ∈ R.
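The quadratic smallness in t of the probability of short elements can be illustrated by simulation. The sketch below (an illustration, not part of the proof; n, ℓ, the sample size and the values of t are arbitrary) writes the normalized length of the element through a sample X as √n (ℓ − max_i X_i + min_i X_i), following the description above, and checks that halving t cuts the empirical probability roughly by a factor of 4.

```python
import numpy as np

rng = np.random.default_rng(3)
n, ell, m = 3, 1.0, 400_000
X = rng.uniform(0.0, ell, size=(m, n))

# Normalized length of the element X(Y(X)): the room left for sliding the
# sample along the main diagonal, rescaled by sqrt(n).
seg = np.sqrt(n) * (ell - X.max(axis=1) + X.min(axis=1))

def p_short(t):
    """Empirical probability that the element through X is shorter than t."""
    return float(np.mean(seg < t))

p1, p2 = p_short(0.2), p_short(0.1)
print(p1, p2)   # halving t should cut the probability roughly by 4
```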
Regularity bound for the uniform distribution

Theorem 6.1. Let X_1, ..., X_n be IID random variables with uniform distribution Unif([0, ℓ]), and let λ : Ω → R be an F_η-measurable function; set I_s = [λ, λ + s]. Then for any s ∈ (0, 1], one has the bound (6.1).

Proof. Recall that the natural length parameter on the lines L(Y) is given by ξ̃ = √n ξ and not by ξ itself, since the gradient of ξ is given by the vector (n^{-1}, ..., n^{-1}), of norm 1/√n. Therefore, when ξ runs through I_s(Y), ξ̃ runs through an interval of length s√n. By virtue of (5.6), the second summand in the last RHS in (6.2) can be assessed as in (6.5). Now we apply the well-known integration-by-parts formula for the Stieltjes integral (cf., e.g., [11, Sect. V.6]), giving the moment of order α of a probability distribution with probability distribution function (PDF) F(·):

∫_0^∞ r^α dF(r) = α ∫_0^∞ r^{α−1} (1 − F(r)) dr,

valid under the assumption of convergence of the integral in the RHS. In our case, we only have to integrate over a finite interval [s√n, ℓ√n], thus avoiding the singularity at r = 0. For the PDF F_ℓ(·), using the upper bound (6.4), we obtain (6.6). Collecting (6.3), (6.5), (6.6), and using s/ℓ ≤ 1, the assertion (6.1) follows.
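The integration-by-parts identity for moments invoked in the proof can be checked numerically. The sketch below (an illustrative choice of distribution, not tied to the proof) verifies α ∫_0^∞ r^{α−1} (1 − F(r)) dr = E[ζ^α] for ζ ∼ Unif([0, 1]) and α = 2, where the exact value is E[ζ²] = 1/3.

```python
import numpy as np

# Check: alpha * ∫_0^∞ r^(alpha-1) (1 - F(r)) dr equals the alpha-th moment,
# for the Unif([0, 1]) distribution: F(r) = r on [0, 1] and 1 - F(r) = 0 beyond.
alpha = 2.0
r = np.linspace(0.0, 1.0, 100_001)
y = r ** (alpha - 1.0) * (1.0 - r)     # integrand r^(alpha-1) * (1 - F(r))
dr = r[1] - r[0]
rhs = alpha * float((y[:-1] + y[1:]).sum()) * dr / 2.0   # trapezoid rule
print(rhs)                             # exact second moment of Unif([0,1]): 1/3
```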

Smooth positive densities
Now we consider a richer class of probability distributions. While the conditions which we assume are certainly very restrictive, they are sufficient for the applications to some physically relevant Anderson models.
Theorem 7.1. Assume that the common probability distribution of the IID random variables V_j, j = 1, ..., n, with cumulative probability distribution function F_V, satisfies the following conditions:
(i) the probability distribution is absolutely continuous, with density ρ supported by an interval [a, a + ℓ]:

dF_V(x) = ρ(x) dx;   (7.1)

(ii) the probability density ρ(·) has bounded logarithmic derivative on (a, a + ℓ):

sup_{x∈(a,a+ℓ)} | ρ′(x)/ρ(x) | ≤ C < ∞.   (7.2)

Then there exists a constant C = C(F_V, ℓ) < ∞ such that for any s ∈ (0, ℓ n^{-1}) and any F_η-measurable random variable λ, setting I_s(ω) := [λ(ω), λ(ω) + s], one has the bound (7.3).

Proof. Without loss of generality, it suffices to prove the claim for supp ρ = [0, ℓ], which we assume below. As in Section 3, introduce a partition of the sample space into the cubes J_k, induced by the decomposition [0, ℓ] = ⊔_k J_k. The hypothesis (7.2) implies that for any x ∈ J_k the logarithm of p(x) is well-defined and satisfies

ln p(x) = ln p(x_k) + α_n(x),

for any fixed point x_k ∈ J_k, where |α_n(x)| ≤ Cℓ; with ℓ fixed in the condition (7.1), p(x) is therefore uniformly bounded. Now introduce on J_k:
• the uniform probability distribution P_k, i.e., the normalized measure with constant density p_k w.r.t. the Lebesgue measure;
• the probability distribution induced by P, conditional on {X ∈ J_k}, i.e., the normalized measure with density

p(y) / ∫_{J_k} p(z) dz.

Due to the continuity of the density p, we have ∫_{J_k} p(y) dy = c |J_k|, for some c ∈ [e^{−α_n}, e^{+α_n}], so for any event A we have

P{ A | X ∈ J_k } ≤ e^{2α_n} P_k{A}.   (7.4)

Finally, it follows from (7.4) and (3.1) that the required bound (7.3) holds.

We will call the property of the common probability distribution of the IID random variables X_1, ..., X_n, expressed by an inequality of the form (7.3) with the RHS polynomially bounded in n, the strong regularity of the conditional mean (SRCM).
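The role of hypothesis (7.2) is to make the density nearly constant on each small cell: if |ρ′/ρ| ≤ C, then on a cell of length h the density can vary by at most the factor e^{Ch}. The sketch below (with a hypothetical density ρ(x) ∝ e^{−x} on [0, 1], for which C = 1; both the density and the partition are illustrative choices) checks this pinching cell by cell.

```python
import numpy as np

# Hypothetical density with bounded logarithmic derivative on [0, ell]:
# rho(x) = e^{-x} / (1 - e^{-ell}), so |rho'(x) / rho(x)| = 1 =: C.
ell, C = 1.0, 1.0
Z = 1.0 - np.exp(-ell)

def rho(x):
    return np.exp(-x) / Z

# Partition [0, ell] into K cells of length h; on each cell the ratio
# max(rho) / min(rho) must not exceed e^{C h}.
K = 20
h = ell / K
max_ratio = 0.0
for k in range(K):
    x = np.linspace(k * h, (k + 1) * h, 101)
    v = rho(x)
    max_ratio = max(max_ratio, float(v.max() / v.min()))
print(max_ratio, np.exp(C * h))
```

For this density the per-cell ratio is exactly e^{h}, so the bound e^{Ch} is attained, showing it cannot be improved in general.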

Extension to convolution measures
The results of Section 7 can be easily adapted to a class of convolution probability measures P = P 1 * P 2 where at least one of the measures P 1 , P 2 satisfies the hypotheses of Theorem 7.1.
Theorem 8.1. Assume that the common probability distribution ν of the IID random variables V_j, j = 1, ..., n, admits the representation

ν = ν_1 * ν_2,   (8.1)

where ν_1 satisfies the hypotheses of Theorem 7.1. Then there exists a constant C = C(F_V, ℓ) < ∞ such that for any s ∈ (0, ℓ n^{-2}) and any F_η-measurable random variable λ, setting I_s(ω) := [λ(ω), λ(ω) + s], one has a bound of the form (7.3).

Proof. It follows immediately from the representation (8.1) that the random variables V_j can be represented as sums V_j(ω) = A_j(ω) + B_j(ω), where
• the family of random variables {A_j, 1 ≤ j ≤ n} is IID, with common probability distribution ν_1;
• the family of random variables {B_j, 1 ≤ j ≤ n} is also IID, with common probability distribution ν_2;
• the families {A_j} and {B_j} are mutually independent.
Let P, P_1 and P_2 be the product probability measures on the sample space ≅ R^n, generated respectively by the marginal measures ν, ν_1 and ν_2. Further, let F_B be the sigma-algebra generated by the family {B_j, j = 1, ..., n}. Then the conditional measure P{· | F_B} is equivalent to P_1, and the claim follows by applying Theorem 7.1 conditionally on F_B.

Example 1. The so-called triangular distribution ν * ν, where ν is the uniform distribution on [0, 1]. More generally, one can take a convolution power ν^{*n}, n ≥ 2. This example shows that the SRCM property can hold in a class of probability measures with density vanishing at the edges of its support.
Example 2. The convolution ν = ν_1 * ν_2 of the uniform distribution ν_1 = Unif([0, 1]) with the exponential distribution ν_2 (with density 1_{[0,+∞)}(t) e^{−t}). Here the SRCM property holds for upper-unbounded random variables. As pointed out in the Introduction (cf. also [6]), the Gaussian distribution features a particularly strong form of the SRCM property. Theorem 8.1 extends SRCM to a much larger class of unbounded random variables. In particular, note that the decay of the tail probabilities P{ |V(ω)| > t }, as t → +∞, can be quite slow. For example, one can take as ν_2 any stable law (cf., e.g., [11]) of index α ∈ [1, 2]; here α = 2 corresponds to the Gaussian distribution and α = 1 to the Cauchy distribution.
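Both examples are easy to visualize by simulation. The sketch below (sample size and bin count are arbitrary choices) draws from the triangular law ν * ν of Example 1, whose density vanishes linearly at the edges 0 and 2 and peaks at 1, and from the uniform-plus-exponential law of Example 2, which is upper-unbounded.

```python
import numpy as np

rng = np.random.default_rng(7)
m = 1_000_000

# Example 1: triangular law = Unif([0,1]) * Unif([0,1]) (convolution of laws
# realized as the sum of two independent uniform draws).
T = rng.uniform(0.0, 1.0, m) + rng.uniform(0.0, 1.0, m)
dens, edges = np.histogram(T, bins=50, range=(0.0, 2.0), density=True)
edge_dens, peak_dens = dens[0], dens[25]   # near the edge t = 0 and the peak t = 1

# Example 2: Unif([0,1]) convolved with Exp(1): an upper-unbounded sum.
S = rng.uniform(0.0, 1.0, m) + rng.exponential(1.0, m)
print(edge_dens, peak_dens, S.max())
```

The histogram confirms the linear vanishing at the edges (density near 0 at t = 0, near 1 at the peak), while the second sample takes values well above 1, as expected for an unbounded law.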

Conclusion
We have shown that the well-known and widely used property of Gaussian IID samples, viz. the independence of the sample mean from the "sample fluctuations", has a direct analog for a much larger class of marginal probability distributions. For the moment, our approach requires the marginal distribution to have a smooth probability density with respect to the Lebesgue measure. Compared to prior results (cf. [5,10]), where the conditional distribution of the sample mean (given the sigma-algebra generated by all sample fluctuations) was proved to be Hölder continuous of order θ = 2/3, the present paper gives the sharp exponent θ = 1, thus proving Lipschitz continuity of the conditional measure at hand.
The new result is an important component in the spectral analysis of random N-particle quantum Hamiltonians, describing the quantum transport of interacting particles (e.g., electrons) in a disordered environment. Quantum transport and/or localization in interacting disordered systems is a relatively new direction both in theoretical and mathematical physics, where many challenging problems remain open, and our results shed new light on this direction. They will constitute a major component in the localization analysis of N-particle Anderson Hamiltonians with nontrivial interaction in a Euclidean space, in a forthcoming paper extending the results of our recent work [10] to continuous systems.
We hope that the probabilistic problem investigated in this paper, as well as the method used here, can prove useful in a more general mathematical framework where analytic aspects intertwine with the probabilistic ones.