On Conditional Distribution of the Sample Mean for Densities with Singular Logarithmic Derivative

We study the regularity of the conditional distribution of the empirical mean of a finite sample of IID random variables with a bounded common probability density, conditional on the sample "fluctuations", and extend a prior result, proved for strictly positive smooth densities, to a larger class of smooth densities vanishing at one or more points of their support.


Formulation of the problem
Consider a sample of n > 1 IID (independent and identically distributed) random variables (r.v., in short) X_1, ..., X_n with Gaussian distribution N(0, 1), and introduce the sample mean ξ = ξ_n and the "fluctuations" η_i relative to the mean:

ξ_n = n^{-1}(X_1 + · · · + X_n),   η_i = X_i − ξ_n,  i = 1, ..., n.

It is well-known from standard courses of probability theory and statistics (cf., e.g., [6]) that ξ_n is independent of the sigma-algebra F_η generated by {η_1, ..., η_n}. Therefore, the conditional probability distribution of ξ_n given F_η coincides with the unconditional one, N(0, n^{-1}); thus ξ_n has a bounded conditional density, and for any interval I ⊂ R of length |I| we have

P{ ξ_n ∈ I | F_η } ≤ (n/2π)^{1/2} |I|.   (1.4)

In other words, the conditional probability distribution function (PDF) of the sample mean, given the fluctuations {η_i}, is Lipschitz continuous. (1.4) also implies the following bound: for any F_η-measurable r.v. λ and any s ≥ 0,

P{ ξ_n(ω) ∈ [λ(ω), λ(ω) + s] } ≤ (n/2π)^{1/2} s.   (1.5)

For the proof, it suffices to condition on F_η: setting I_s(ω) = [λ(ω), λ(ω) + s], we have P{ ξ_n ∈ I_s } = E[ P{ ξ_n ∈ I_s(ω) | F_η } ] ≤ (n/2π)^{1/2} s.
It is to be emphasized that the estimate on the probability P{ ξ_n(ω) ∈ I_s(ω) } with a non-constant, random interval I_s(ω) = [λ(ω), λ(ω) + s] is more informative, and more difficult to obtain, than its counterpart for a fixed interval I_s = [a, a+s], a ∈ R, s > 0. With a fixed interval I_s, a number of classical results of probability theory are available. The regularity of the probability distribution of the sample mean ξ_n(ω), or simply of the sum X_1 + · · · + X_n of IID r.v., is at least as good as that of each term in the sum, and the smoothing effect of convolution makes it even better as n → +∞. Standard textbook examples show that already the sum X_1 + X_2 of IID r.v. with purely singular continuous distribution can have an absolutely continuous probability distribution; see, e.g., [6, v. II, Section V.4], where it is shown that the uniform distribution on [0, 1] is a convolution of two singular probability measures of Cantor type.
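The independence claim for Gaussian samples can be verified by pure linear algebra: ξ_n and each η_i are linear functionals of the standard Gaussian vector X, so it suffices to check that the corresponding covariances vanish. A minimal numerical sketch (the sample size n = 5 is an arbitrary illustrative choice):

```python
import numpy as np

n = 5
a = np.full(n, 1.0 / n)      # xi_n = <a, X>, the sample-mean functional
Sigma = np.eye(n)            # covariance of X = (X_1, ..., X_n) ~ N(0, I)

# eta_i = X_i - xi_n corresponds to the linear functional e_i - a
for i in range(n):
    b = Sigma[i] - a         # e_i - a (rows of the identity are the e_i)
    cov = a @ Sigma @ b      # Cov(xi_n, eta_i) = a^T Sigma (e_i - a)
    assert abs(cov) < 1e-12  # zero covariance; for jointly Gaussian r.v. => independence

# Var(xi_n) = a^T Sigma a = 1/n, consistent with xi_n ~ N(0, n^{-1})
assert abs(a @ Sigma @ a - 1.0 / n) < 1e-12
print("xi_n is uncorrelated with every eta_i")
```

Since (ξ_n, η_1, ..., η_n) is a jointly Gaussian family, zero covariance is equivalent to independence here, which is the fact used above.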
When I s is random, the situation becomes more complicated.
In the present paper, as in our prior works [3,4], we handle the problem by using the conditional probability distribution of the sample mean, given the sigma-algebra generated by the fluctuations η_1, ..., η_{n−1}. The main technical difficulty arising here is that, of the n degrees of freedom initially present in the sample mean ξ_n(ω) = (X_1(ω) + · · · + X_n(ω))/n, only one remains after fixing the n − 1 parameters η_1, ..., η_{n−1}, which can be considered as coordinates in the n-dimensional sample space. When the r.v. X_i have a density ρ(·) (which we always assume in the present paper), their joint probability distribution also has a density, viz. p(x_1, ..., x_n) = ρ(x_1) · · · ρ(x_n), and the aforementioned conditional distribution of ξ_n is proportional (up to a normalization factor) to the restriction of p(·) to the straight line in the n-dimensional sample space selected by the n − 1 conditions η_1 = c_1, ..., η_{n−1} = c_{n−1}. The normalization factor brings up a technical problem, for it can be very large (we address this problem in detail in Sections 4 and 5). Also, if ρ is non-constant, i.e., the marginal distribution of X_i is not the uniform one on some bounded interval, then ρ takes at least two different values, say a < b, so the product density p takes values a^n and b^n, with the ratio (b/a)^n ≫ 1 for n ≫ 1. In the intended applications, this might result in inadequate regularity bounds, so we have to address this issue, too.
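In coordinates, the restriction just described takes the following form (a sketch of the standard computation; here c_n := −c_1 − · · · − c_{n−1}, since the fluctuations sum to zero):

```latex
% On the fiber \{\eta_1 = c_1, \dots, \eta_{n-1} = c_{n-1}\} one has X_i = c_i + \xi,
% so the conditional density of \xi_n given \mathcal{F}_\eta is
p_{\xi_n \mid \mathcal{F}_\eta}(\xi) \;=\; \frac{1}{Z}\, \prod_{i=1}^{n} \rho(c_i + \xi),
\qquad
Z \;=\; \int_{\mathbb{R}} \prod_{i=1}^{n} \rho(c_i + t)\, \mathrm{d}t .
% Z is the normalization factor discussed above; the bounds deteriorate when Z is small.
```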
A natural question is to what extent the above-mentioned remarkable property of Gaussian IID samples can be generalized to other types of marginal probability measures. Surprisingly, this problem appears to be unexplored in a reasonably general setup (cf. Section 6). The author of these lines would greatly appreciate any constructive feedback from experts in the field that would shed more light on the regularity problem at hand. A particularly challenging case is where the IID r.v. X_i have a purely singular probability distribution, but the (cumulative) probability distribution function (PDF), t ↦ F(t) := P{ X_1(ω) ≤ t }, t ∈ R, is, for example, Hölder continuous, or, more generally, has an explicitly known continuity modulus. In a prior work [4], we studied this problem under the following condition: (V1): The probability measure μ has bounded support, supp μ = [a, b], and admits on (a, b) a smooth, strictly positive probability density ρ with bounded logarithmic derivative. The prototypical example is the uniform distribution on an interval [a, b]. Informally, one can say that the probability distributions satisfying (V1) are "comparable" to the uniform distributions. Under this assumption, we proved Hölder continuity of the conditional distribution of the sample mean for typical conditions.
More precisely, we introduced in [4] a property of probability measures on R resembling (1.4), called Regularity of the Conditional Mean: (RCM) For all n ≥ 2 and some C, a ∈ (0, +∞), b ∈ (0, 1], for any F_η-measurable random variable λ, one has (1.8). The above form of conditional regularity of the sample mean for typical conditions is well adapted to the applications (briefly discussed in Section 1.2 below) which served as the principal motivation for our project.
In the present paper, we further develop our approach from [3,4] and extend (RCM) to a much larger class of smooth marginal densities. Specifically, we allow the logarithmic derivative of the common density ρ to have power-law singularities due to vanishing of ρ at a finite number of points of its support.
The principal technical result of this paper, Theorem 1, is proved under the following hypothesis: (V2): The probability measure μ has bounded support, supp μ = [a, b], and admits on the interior (a, b) a strictly positive smooth probability density ρ satisfying the condition (1.9) on its logarithmic derivative. Moreover, the method of proof of Theorem 1 provides some explicit estimates for the exponents a and b figuring in (1.8).
Note that the condition (1.10) on the logarithmic derivative covers all the cases where the density ρ vanishes at one of the edges of its support at a power-law rate; the upper bound on (ln ρ)′ actually provides some regularity of the decay at the respective edge, but the latter is a substantially milder condition than, for example, an exact decay asymptotics. We shall see that Theorem 1 naturally extends from the measures satisfying (V2) to a much larger class of measures, obtained from those satisfying (V2) by the operations of (i) shifts, (ii) convolution, and (iii) randomization.
The role of shifts is clarified in Corollary 1. Corollary 2 shows that, starting from measures satisfying (V2), one can construct a rich class of measures featuring (RCM), under the condition (V3): The probability measure μ on R is a convolution of the form μ = μ_1 ∗ μ_2, where μ_1 fulfills the condition (V2), and μ_2 is an arbitrary probability measure on R. Corollary 3 derives from Theorem 1 the property (RCM) for measures which fulfill the following condition: (V4): The probability measure μ on R is obtained by randomization of measures μ_1, ..., μ_K satisfying (V2), i.e., μ is a mixture μ = Σ_{k=1}^{K} p_k μ_k, with p_k ≥ 0, Σ_k p_k = 1, so that a random variable X with probability distribution μ can be sampled by first drawing a random index k with probabilities p_k and then drawing X from μ_k.

Motivation: random N -particle Hamiltonians
The regularity properties of the conditional distribution of the sample mean, given the fluctuations (η_i), are interesting in their own right, and may prove useful in a fairly broad context of mathematical statistics, but the main motivation for this work actually comes from the spectral theory of multi-particle random Hamiltonians studied in the Anderson [1] localization theory; for an introduction to this relatively new area of mathematical physics of disordered quantum systems, one can recommend the recent monograph [5].
It has been shown in [3] that optimal bounds on the decay of bound (i.e., square-summable) eigenstates of multiparticle quantum systems with a nontrivial interaction, subject to a common random potential field, require concentration estimates for the eigenvalues of random Hamiltonians which cannot be derived from the standard, Wegner-type estimates (cf. [7]) used in the single-particle models. More to the point, one needs eigenvalue comparison estimates for pairs of random operators H_1(ω), H_2(ω) which are stochastically correlated in a very strong way: the same family of random variables which affects H_1(ω) also affects H_2(ω), and vice versa. As a result, no stricto sensu stochastic decoupling is possible in such models. However, the eigenvalue comparison analysis for H_1(ω) and H_2(ω) becomes much simpler once the random field generating the potential in a finite domain (e.g., a finite lattice subset Q) is decomposed into the sample mean ξ_Q(ω) over Q and the fluctuations η_{Q,x}(ω) relative to this mean (1.14). The reason is that, after conditioning on the random fluctuation field η_{Q,x}(ω), the random potential V(x, ω) becomes a sum of a nonrandom background potential η_{Q,x}(ω), ignored in the principal estimates, and of a random but spatially constant potential ξ_Q(ω) 1_Q(x). As an operator of multiplication by a constant, the latter commutes with all other operators involved, and this considerably simplifies the spectral analysis. See the details in [3]. Naturally, the crucial issue is the regularity of the conditional distribution of the sample mean ξ_Q(ω) given the fluctuations η_{Q,x}(ω). Our results show that this conditional distribution is Hölder continuous, and such regularity suffices for the methods of the spectral theory of random operators; cf., e.g., [5].

Main results
Our goal is to analyze the case where the common probability distribution of the IID random variables X_j, 1 ≤ j ≤ n, is absolutely continuous, with probability density ρ, and the support S = supp ρ of the density is a finite union of intervals. With the help of the well-known randomization procedure (cf. [6]), namely, by making use of Corollary 3, one can reduce the analysis of densities supported by a union of intervals to the case of a single supporting interval, and this is what we shall do first. The class of densities ρ supported on an interval [a, b] which do not vanish on its interior (a, b) and have there a bounded logarithmic derivative was studied in [4], so we focus on the case where ρ vanishes at one or both endpoints of the support. To cover all possible situations in one argument, one can always decompose the supporting interval into subintervals on each of which ρ vanishes at no more than one endpoint. In the next section, we perform such an analysis for a density vanishing at exactly one of the endpoints of the supporting interval. The reader will see, however, that the probabilistic bounds stemming from our analysis become slightly better for densities vanishing at both edges of the support (cf. Remark 2).

Theorem 1. Assume that the common probability distribution of the IID random variables X_1, ..., X_n admits a probability density ρ, with ∥ρ∥_∞ = ρ̄ < +∞, satisfying the following conditions: (i) supp ρ is a bounded interval: supp ρ = [a, a + ℓ] (2.3); (ii) ρ vanishes at a and admits the upper bound (2.4), together with the regularity condition (2.5) on its logarithmic derivative near a. Then for any A > 1 and α ∈ (0, 1) there exist constants C′, C′′ ∈ (0, +∞), depending upon A, α, ℓ and the density ρ, such that for any 0 ≤ s ≤ C′′ n^{−1/((A−1)α)} and any F_η-measurable random variable λ, one has, with I_s(ω) := [λ(ω), λ(ω) + s]:

P{ ξ_n(ω) ∈ I_s(ω) } ≤ C′ n^{1/2} s^{1−Aα} + C′′ n^2 s^{(2+γ)α}.   (2.6)

The explicit form of the RHS in (2.6) shows that (V2) gives rise to the property (RCM).

Optimization of the Hölder exponents
In applications to the eigenvalue concentration estimates for random Hamiltonians, one often has s ≤ e^{−n^β}, β > 0, so that s decays, as n → ∞, much faster than any power-law function n ↦ n^{−B}, 0 < B < ∞. From this perspective, a pre-factor polynomial in n is essentially negligible compared to the powers of s, and the exponent a figuring in (RCM) is of much greater importance. To balance the contributions from the two terms in the RHS of (2.6), let us find the optimal value (if it exists) α = α_γ as a solution of the equation 1 − Aα = (2 + γ)α, i.e., α_γ = (A + 2 + γ)^{−1}, resulting in a simpler probability bound with C′(n), C′′(n) polynomial in n. Comparing with the optimal bound for the probability densities with regular (bounded) logarithmic derivative from [4], we see that the above exponent 2/3 can be improved by a judicious choice of A, viz.
1 < A < A_γ = 1 + γ/2, (2.10) and the closer A is to 1, the better. For example, with γ = 1 (linear decay of the density at the edges) and A = 4/3, we obtain the exponent a = 9/13 > 2/3, for s small enough. Furthermore, the optimal value a = (2 + γ)/(A + 2 + γ) for the exponent a figuring in (RCM) approaches 1 as γ → +∞. In view of Remark 2, the final bound becomes even stronger whenever ρ vanishes at both edges of its support.
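The balancing step above can be checked symbolically; the sketch below assumes, consistently with the two terms s^{1−Aα} and s^{(2+γ)α} of the bound, that one solves 1 − Aα = (2 + γ)α:

```python
from fractions import Fraction

def optimal_exponents(A, gamma):
    """Balance s**(1 - A*alpha) against s**((2 + gamma)*alpha):
    solve 1 - A*alpha = (2 + gamma)*alpha exactly, in rational arithmetic."""
    alpha = Fraction(1, 1) / (A + 2 + gamma)
    a = (2 + gamma) * alpha           # common exponent of s after balancing
    return alpha, a

# gamma = 1 (linear decay at the edge), A = 4/3:
alpha, a = optimal_exponents(Fraction(4, 3), 1)
assert a == Fraction(9, 13)           # beats the exponent 2/3 of the regular case

# gamma = 0 with A at its lower limit 1 recovers the exponent 2/3 of [4]:
_, a0 = optimal_exponents(Fraction(1, 1), 0)
assert a0 == Fraction(2, 3)

# a -> 1 as gamma -> +infinity (for fixed A):
_, a_big = optimal_exponents(Fraction(4, 3), 1000)
assert Fraction(9, 10) < a_big < 1
```

Note that 1 − Aα = (2 + γ)/(A + 2 + γ) as well, so both terms indeed carry the same exponent a after balancing.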
This corroborates the intuitive idea that vanishing of the density at some edges of its support should enhance the regularity bound for the conditional distribution of the sample mean.

Extension to the case of random variables with non-identical expectations
By a simple change of variables, the assertion of Theorem 1 can be adapted to independent random variables with different expectations.
Corollary 1. Suppose that the random variables X_1, ..., X_n fulfill the hypotheses of Theorem 1 (hence, they have the property (RCM)). Pick any real numbers a_1, ..., a_n. Then the shifted random variables X_i + a_i, i = 1, ..., n, also have the property (RCM).
Indeed, arbitrary shifts X_i → X_i + a_i result only in a translation of the sample mean ξ_n, even after conditioning on F_η, while the upper bounds on the logarithmic derivative of the conditional density of ξ_n given F_η remain unaffected. A direct inspection of the proof of Theorem 1 in Section 5 shows that this suffices for the main assertion to hold true.
Proof. Let F^{(2)} be the sigma-algebra generated by X^{(2)}. The conditional distribution of X given X^{(2)} differs from that of X^{(1)} (i.e., from the measure μ_1) only by a shift (nonrandom, conditionally on F^{(2)}). Thus we have (2.12), where the last inequality follows from Theorem 1, since μ_1 fulfills (V2). □
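Schematically, and under the assumption (cf. (V3)) that X_i = X_i^{(1)} + X_i^{(2)} with the two samples independent, the chain of (in)equalities in (2.12) can be sketched as:

```latex
\mathbb{P}\bigl\{\, \xi_n \in I_s(\omega) \,\bigr\}
 \;=\; \mathbb{E}\Bigl[\, \mathbb{P}\bigl\{\, \xi_n^{(1)} \in I_s(\omega) - \xi_n^{(2)}
        \,\bigm|\, \mathcal{F}^{(2)} \,\bigr\} \,\Bigr]
 \;\le\; \sup_{\lambda'} \mathbb{P}\bigl\{\, \xi_n^{(1)} \in [\lambda', \lambda' + s] \,\bigr\},
% where the supremum is over admissible measurable shifts \lambda', and the last
% quantity is controlled by Theorem 1 applied to the sample X^{(1)} with marginal \mu_1.
```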

Basic geometrical objects and notations
Let a real number ℓ > 0 and an integer n ≥ 2 be given. Consider a sample of n IID random variables with uniform distribution Unif([0, ℓ]), and introduce again the sample mean ξ = ξ_n and the "fluctuations" η_i relative to the mean:

ξ_n = n^{−1}(X_1 + · · · + X_n),   η_i = X_i − ξ_n,  i = 1, ..., n.   (3.1)

Further, consider the n-dimensional Euclidean space of real linear combinations of the random variables X_i. Clearly, the variables η_i : R^n → R are invariant under the group of translations (X_1, ..., X_n) → (X_1 + t, ..., X_n + t), t ∈ R, (3.2) and so are their differences. Then the space R^n is stratified into a union of affine lines parallel to the vector (1, ..., 1). Denote by X(Y) the intersection of such a line with the cube [0, ℓ]^n, and equip each nonempty interval X(Y) ⊂ R^n with the structure of a probability space inherited from R^n: • if |X(Y)| = 0 (an interval reduced to a single point), then we introduce the trivial sigma-algebra and the counting measure;

• if |X(Y)| = r > 0, then we use the inherited structure of an interval of a one-dimensional affine line and the normalized measure with constant density r^{−1} with respect to the inherited Lebesgue measure on X(Y).
The transformation X → (ξ_n, η_1, ..., η_{n−1}) is nondegenerate, but not orthogonal. We will have to work with the metric on X(Y) induced by the standard Riemannian metric in the ambient space R^n; to this end, introduce an orthogonal coordinate transformation in R^n, X → (ξ̂_n, η̂_1, ..., η̂_{n−1}), such that ξ̂_n = n^{1/2} ξ_n; (3.6) the exact form of η̂_j, j = 1, ..., n − 1, is of no importance for our analysis, provided that the transformation is orthogonal.
Remark 1. For later use, note that, due to (3.6), each of the re-scaled variables n^{1/2} X_i can serve as the normalized length parameter on the elements X(Y). Along an element X(Y), one can simultaneously parameterize ξ̂ and the variables X_i by setting ξ̂(t) = c_0 + t, X_j(t) = c_j + n^{−1/2} t, with arbitrarily chosen constants c_j. Here, ξ̂_n is a natural length parameter on X(Y), since the transformation X → (ξ̂_n, η̂_1, ..., η̂_{n−1}) is orthogonal.
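To spell out why ξ̂_n moves at unit speed along a fiber: moving by arclength t in the unit direction u = n^{−1/2}(1, ..., 1) changes each coordinate by n^{−1/2} t, and

```latex
\hat\xi_n(t) \;=\; n^{-1/2}\sum_{j=1}^{n} X_j(t)
           \;=\; n^{-1/2}\sum_{j=1}^{n}\bigl(c_j + n^{-1/2}\,t\bigr)
           \;=\; \underbrace{n^{-1/2}\sum_{j=1}^{n} c_j}_{=:\,c_0} \;+\; t,
% so t \mapsto \hat\xi_n(t) is an isometry between the fiber X(Y) and an interval of R.
```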

Probability of short intervals
In this section, we assume that supp ρ = [0, ℓ] and the logarithmic derivative ρ′/ρ is well-defined on the open interval (0, ℓ). In addition, we assume a power-law decay of ρ at the edge 0:

ρ(t) ≤ C t^γ, t ∈ (0, ℓ), for some C, γ ∈ (0, +∞).   (4.1)

Later, we will complement this upper bound with a certain regularity of the edge decay (cf. (2.5)). By the change of variable t → ℓ − t, the obtained bounds will apply to densities vanishing at the right edge of the support.
In the following preparatory lemma, we use only the boundedness of ρ and the decay condition (4.1); smoothness of the density is irrelevant for this intermediate result.
Lemma 1. Assume that the random variables X_1, ..., X_n, n ≥ 2, are IID and admit a bounded density ρ supported by an interval [0, ℓ], ℓ > 0, with ∥ρ∥_∞ = ρ̄ < ∞. Furthermore, assume that ρ vanishes at 0 and satisfies (4.1). Then for all t ∈ (0, ℓ/2] one has

P{ |X(Y(X))| < t } ≤ (ρ̄ C / (1 + γ)) n(n − 1) t^{2+γ}.

Denote X̄(X) := max_i X_i and X̲(X) := min_i X_i. While X̄(X) and X̲(X) vary along the elements X(Y), their difference X̄(X) − X̲(X) does not; it is uniquely determined by X(Y). According to Remark 1, each variable n^{1/2} X_i, i = 1, ..., n, restricted to X(Y), can serve as a length parameter on X(Y), compatible with the metric induced by the Euclidean distance in the ambient space R^n. Thus the range of each n^{1/2} X_i restricted to X(Y) is an interval of length |X(Y)|. One can increase (resp., decrease), e.g., the value of X_1, as long as all {X_i, 1 ≤ i ≤ n} are strictly smaller than ℓ (resp., strictly positive). Therefore, the maximum increment of X_1 (indeed, of any X_i) along X(Y) is given by ℓ − X̄(X), and its maximum decrement equals X̲(X), so the range of the normalized length parameter n^{1/2} X_1 along X(Y(X)) is an interval of length

|X(Y(X))| = n^{1/2} ( ℓ − X̄(X) + X̲(X) ).   (4.4)
Both X̲(X) and ℓ − X̄(X) are non-negative, so one has the implication: if |X(Y(X))| < t, then, by (4.4), ℓ − X̄(X) + X̲(X) < n^{−1/2} t ≤ t, whence X̲(X) < t and ℓ − X̄(X) < t. Therefore, introducing for any i ≠ j the events

A_ij(t) := { X_j(ω) < t } ∩ { ℓ − X_i(ω) < t },   (4.8)

we see that the union ∪_{i≠j} A_ij(t) contains all the samples X with |X(Y)| < t.
By hypothesis, for any i ≠ j, the random variables X_i, X_j are independent, so

P{ A_ij(t) } = P{ X_j < t } · P{ ℓ − X_i < t }.   (4.9)

Since ∥ρ∥_∞ = ρ̄, we have P{ ℓ − X_i < t } ≤ ρ̄ t, (4.10) and by (4.1),

P{ X_j < t } ≤ ∫_0^t C u^γ du = C t^{1+γ} / (1 + γ).

Consequently,

P{ |X(Y(X))| < t } ≤ Σ_{i≠j} P{ A_ij(t) } ≤ (ρ̄ C / (1 + γ)) n(n − 1) t^{2+γ}.

This completes the proof. □
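The chain of estimates in the proof can be tested numerically on an illustrative density (an assumption for this sketch): ρ(u) = (1 + γ)u^γ on [0, 1], for which F(u) = u^{1+γ}, ρ̄ = 1 + γ, and (4.1) holds with C = 1 + γ. The exact probability of the event {min_i X_i < t} ∩ {max_i X_i > 1 − t} is computed by inclusion-exclusion and compared with the pairwise union bound and with the final bound of the lemma:

```python
g = 1.0                       # gamma: linear vanishing of rho at the edge 0
n = 6                         # sample size (arbitrary illustrative choice)
rho_bar = 1.0 + g             # sup of rho(u) = (1+g) * u**g on [0, 1]
C = 1.0 + g                   # constant in (4.1): rho(u) <= C * u**g
F = lambda u: u ** (1.0 + g)  # CDF of the illustrative density

for t in [0.05, 0.1, 0.2, 0.4]:
    # exact P{ min X_i < t, max X_i > 1 - t } by inclusion-exclusion:
    exact = 1.0 - (1.0 - F(t)) ** n - F(1.0 - t) ** n + (F(1.0 - t) - F(t)) ** n
    # pairwise union bound, as in (4.9):
    union = n * (n - 1) * F(t) * (1.0 - F(1.0 - t))
    # final bound of the lemma: (rho_bar * C / (1+g)) * n(n-1) * t**(2+g)
    lemma = rho_bar * C / (1.0 + g) * n * (n - 1) * t ** (2.0 + g)
    assert exact <= union + 1e-12 <= lemma + 1e-12
print("exact <= union bound <= lemma-type bound, as claimed")
```

The first inequality reflects the inclusion of the event in the union of the A_ij(t), and the second combines the edge bound F(t) ≤ C t^{1+γ}/(1+γ) with 1 − F(1 − t) ≤ ρ̄ t.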
Remark 2. As was said in Section 2, in the case where ρ vanishes at both edges of its support, 0 and ℓ, the probabilistic bounds for short intervals become stronger. For example, if ρ(t) ≤ C t^γ and ρ(t) ≤ C′ (ℓ − t)^{γ′}, we can replace (4.10) by the stronger bound P{ ℓ − X_j < t } ≤ C′ t^{1+γ′}, and the resulting bound is of order n(n − 1) t^{2+γ+γ′}. (4.14)

Regularity bounds for densities with singular logarithmic derivative

In order to assess the conditional regularity of the sample mean on typical (viz., not too short) linear elements X(Y), we need to complement the upper bound (4.1) with some assumption regarding the regularity of the decay of ρ(t) as t ↘ 0. Having in mind first of all applications to realistic physical models of disorder, we could restrict ourselves to the case where ρ(t) ∼ C t^γ, but it actually turns out that one can treat a more general situation where the logarithmic derivative fulfills the condition (2.5), or even a condition of the same form with the singularity t^{−B}, for some B ∈ (0, +∞). For brevity, we consider only the case B = 1, covering all possible rates γ > 0 of power-law decay at the edge of the support.
Proof of Theorem 1. Without loss of generality, it suffices to prove the claim in the particular case where supp ρ = [0, ℓ], which we assume below. Fix some A > 1 and α ∈ (0, 1). Further, fix an element X(Y) with |X(Y)| ≥ s^α; the probability to observe an element X(Y) with |X(Y)| < s^α is upper-bounded in (5.15), with the help of Lemma 1.
Next, introduce a length parameter t on X(Y), so we can identify X(Y) with an interval J̃ ⊂ R. As t runs through J̃, the i-th coordinate of the point x(t) ∈ X(Y) runs through an interval of the same length (cf. Remark 1). Recall that it is not the sample mean ξ_n but its rescaled counterpart ξ̂_n = n^{1/2} ξ_n which gives a normalized length parameter on X(Y), so we re-write the key probability in terms of the rescaled interval Ĩ_s(ω) = n^{1/2} I_s(ω), so that |Ĩ_s(ω)| = s√n. Denote by p(t) the product density ρ(x_1(t)) · · · ρ(x_n(t)) at x(t) = (x_1(t), ..., x_n(t)) ∈ X(Y). The function t → p(t) gives the conditional density induced on X(Y), up to a normalizing factor 1/Z_Y, with Z_Y = ∫_{J̃} p(t) dt. To upper-bound the maximum of the conditional density, we have to lower-bound the integral Z_Y. Note that for t small enough, the ratio in the RHS of (5.5) is close to 1; to see this, use the hypothesis (2.5) on the logarithmic derivative. Recall that we assumed |X(Y)| ≥ s^α, and by hypothesis of the Theorem, s is small enough, so the conditional probability density on the segment X(Y) is uniformly bounded by an s-dependent constant:

∫_{Ĩ_s(ω)} (p(t) / Z_Y) dt ≤ C_5 s^{−Aα} · |Ĩ_s(ω)| = C_5 n^{1/2} s^{1−Aα}.   (5.14)

Owing to Lemma 1 (applied with t = s^α), we know that

P{ |X(Y)| < s^α } ≤ C n(n − 1) s^{(2+γ)α}.   (5.15)

Collecting (5.14) and (5.15), the claim follows. □

Open problems
It seems a very natural conjecture that the property (RCM) holds true for any probability measure with a bounded, compactly supported probability density. However, operating without any additional assumption on the regularity of the bounded probability density (apart from its measurability and boundedness) would require new analytic ideas.
A more challenging problem concerns the validity of (RCM) for more general probability measures with Hölder continuous PDF. The author's discussions with a number of experts in probability and functional analysis seem to indicate that the question of the regularity of the conditional distribution of the sample mean on typical fibers {(x_1, ..., x_n) : η_i = c_i, i = 1, ..., n − 1} ⊂ R^n (6.1) is far from obvious when R^n is endowed with the product measure μ^{⊗n} of a measure μ on R which has a Hölder continuous PDF but may be purely singular continuous; and this question arises in various problems in the mathematical physics of disordered media.


Conclusion
We have proved that the regularity properties of the sample mean ξ_n, conditional on the sample "fluctuations" relative to ξ_n (generalizing the well-known property of Gaussian samples, and proved earlier for a class of marginal densities with bounded logarithmic derivative), hold true for a much larger class of smooth densities which can vanish at some points in a sufficiently regular way. Moreover, we have shown that vanishing of the marginal density can only result in stronger probabilistic concentration bounds for the values of the sample mean. Intuitively, this seems quite natural, but technically, it was not quite clear from prior works based on global regularity of the logarithmic derivative of the density.
From the point of view of applications, the new result covers a large number of models where the probability distribution comes from physical laws or, more generally, from explicit calculations with elementary functions which are smooth in their domains of definition and usually feature a power-law decay at certain points. In particular, this covers many models of disorder in the mathematical physics of multiparticle quantum systems, where it provides a crucial ingredient for the analysis of eigenvalue concentration in the presence of disorder and of a nontrivial interaction between particles (cf. [5] and references therein).