Welfare Analysis and Policy Implications of Bundling Decisions by Firms

This paper models an original approach in which the monopolist firm no longer uses pure bundling with weights 1:1. The decisions of the firm take place in a two-step optimisation process. In the first stage, using portfolio optimisation, it decides whether to bundle and also sets the optimal weight of each good. In the second stage, it sets the profit maximising bundle price on the basis of the chosen weights. We compare the profit, consumer surplus and welfare implications of our pure bundling model to those of the usual 1:1 pure bundling model, and comment on the implications for competition policy making.


Introduction
The use of bundling as an instrument of second degree price discrimination is well grounded in the theoretical literature of industrial organisation (Adams and Yellen [1]; McAfee, McMillan and Whinston [2]; Stigler [3]). Pure bundling, in contrast to mixed bundling, only allows the two goods to be sold together. In both the original contribution of Adams and Yellen and that of McAfee et al. the consumer surplus and profit implications of both pure and mixed bundling are analysed within a two-good model. This analysis assumes a uniform distribution of the reservation prices describing consumers' valuations of these goods. Schmalensee [4], on the other hand, analyses pure bundling assuming that consumers' reservation values follow a bivariate normal distribution.
Bundling can be used to extract consumers' surplus. It is profit enhancing because it reduces the effective dispersion of reservation prices. In Schmalensee's paper bundling is based on the conventional approach of assigning equal weights to the goods. However, if symmetry (i.e. σ_1 = σ_2) does not hold, pure bundling is less likely to be profit or welfare enhancing. This is used as an argument for mixed bundling. But, as our paper shows, a lack of symmetry may not be a problem if the firm can optimise the weights used in pure bundling.
Moreover, it should be stressed that the manipulation of the mean valuation is as important as that of the dispersion. This is largely neglected in the literature, which mainly focuses on dispersion reduction alone, as an instrument for extracting consumers' surplus. In our paper we study a change in the spread of valuations which is not mean preserving.
This paper is a continuation of our research on the theory of bundling in the context of alternative forms of transactions by firms (see Dassiou et al. [5,6], Dassiou and Glycopantis [7][8][9]). In Dassiou and Glycopantis [7,8], we have already examined the use of mixed bundling as a mechanism for price discrimination leading to an enhancement of profits. It can also lead to an increase in trade if practiced by a price setting monopsonist. In common with Adams and Yellen, we used the assumption of a uniform distribution to describe the valuations of consumers for the goods.
In the present paper we consider pure bundling. We assume that consumers' valuations follow the Gaussian distribution. We generalise Schmalensee's approach and show that the weights of the two goods in the bundle can be chosen optimally. This can resolve the problem of a lack of symmetry in standard deviations without having to resort to mixed bundling. Weight manipulation may temper the desire to reduce dispersion. It may also lead to an increase in the weight of a good which, despite its relatively higher dispersion, has a substantially higher mean valuation.
An additional motive behind bundling is the desire of the firm to deal with risk aversion. This has been considered by De Graba [10], who introduces the idea of price discrimination but without bundling. If the seller cannot observe the buyers' valuations, then a larger, "group" purchaser is a riskier source of profits than a small, "single unit" buyer. Hence the firm will be willing to sacrifice some profit in order to increase the probability of making a sale to a large purchaser.
Additionally, Dana [11][12][13] draws on the theory of yield (revenue) management. He argues that price dispersion occurs because of the stochasticity of demand, and that firms set different prices to smooth out demand and reduce uncertainty. The work of Dana is discussed at some length in Gaggero and Piga [14] within the context of pricing strategies pursued by airlines.
From the work of Schmalensee we know that dispersion reduction leads to higher profits through better extraction of consumer surplus. In contrast, the works of Dana and De Graba offer an alternative interpretation of the existence of price differentials, as a "defensive mechanism" for responding to uncertainty rather than an "aggressive mechanism" for extracting consumers' surplus. The desire to reduce dispersion for profit enhancement may be either tempered or strengthened by the desire to increase the average demand for the bundle.
What is missing in the bundling literature is an integrative framework which combines both mechanisms to endogenously determine an optimal bundling strategy. To achieve this we set up a two-step decision process. First we formulate a utility function of the firm which depends on absolute risk aversion, and then calculate the weights. This deals with the "defensive" aspect of bundling by using a portfolio optimisation approach to determine the relative weights of the component goods. Then, in the second step, by inserting these optimal weights into the profit function we calculate the optimal bundle price. Our analysis has distinct merits, as it explicitly considers risk aversion as well as the dispersions and the means of the valuations of the goods. We also define the conditions under which the optimal weights in our framework are equal (this case is referred to as 1:1 bundling henceforth).
We then compare our optimal pure bundling to Schmalensee's equal weights (1:1) case in terms of its profit, consumer surplus and welfare implications. In terms of policy implications, we also show that the desire to increase the bundle mean and/or to deal with risk aversion may ultimately lead to the choice of weights that increase rather than decrease dispersion. This will in turn lead to a further increase in consumer surplus when the latter is an increasing function of dispersion.
Section 2 looks at the unbundled sales paradigm and derives the comparative statics results for profits, consumer surplus and welfare with respect to the mean and the dispersion of the normal distribution which describes the tastes of consumers for each good. In Section 3 we set out a two-step process of optimal pure bundling: the firm determines the optimal weights in the bundle and then the bundle price. Comparisons are made with Schmalensee's approach of fixed weights. In Section 4 we discuss our conclusions. In the mathematical appendix we provide proofs of results stated in the text.

Unbundled sales
We derive here comparative static results for the case of no bundling, for each good separately. Schmalensee also looks briefly at the no-bundling case. However, he does not separate the effects of the mean, dispersion and cost, which in our case is essential for the later analysis of pure bundling.
The two separate demand functions when the two goods are sold unbundled can be written as Q_1(P_1) and Q_2(P_2), i.e. the demand for each good is independent of the price of the other good. Either demand function can be written as:

$$Q(P) = \int_P^{\infty} g(x)\,dx, \qquad (1)$$

where g(x) is the underlying distribution of the buyers' valuations, so that Q(P), the share of buyers whose valuation is at least P, is one minus the corresponding cumulative distribution. g(x) is assumed to be a univariate normal distribution with mean µ and standard deviation σ; it gives the density function of buyers. As is standard in the literature, we assume that each consumer, if he/she purchases a good, has a unit demand. Let f(t) be the standard normal density function, $f(t) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{t^2}{2}\right)$, and define:

$$F(z) = \int_{-\infty}^{z} f(t)\,dt, \qquad (2)$$

where F is the cumulative distribution function. F is everywhere strictly increasing and it can be shown (see Schmalensee, p. S214) that:
The proofs of (3), (4) and (5) are straightforward. H(·) is the hazard function; the proof of (6) is obtained by integrating (5) by parts.
The demand for a single good under unbundled sales is thus given by:

$$Q(P) = 1 - F\left(\frac{P-\mu}{\sigma}\right).$$

Strictly speaking, since the normal distribution allows negative values while valuations are typically non-negative, some parameter restrictions should be applied when using the normal distribution to represent these valuations. This is simple to achieve by assuming that µ/σ > 3, in which case valuations are positive more than 99.86% of the time and negative valuations are thus virtually eliminated. Of course this inequality has implications for the right as well as for the left tail: valuations will fall within the region 0 < P < 2µ more than 99.73% of the time.
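The two shares can be reproduced from the standard normal CDF. A minimal Python sketch (standard library only), evaluated at the boundary case µ/σ = 3; since F is increasing, any larger µ/σ gives larger shares. The helper name `std_normal_cdf` is ours, not the paper's.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF F(z), written via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


ratio = 3.0  # boundary case mu / sigma = 3

# Pr(x > 0) = Pr(z > -mu/sigma) = F(mu/sigma)
share_positive = std_normal_cdf(ratio)
# Pr(0 < x < 2 mu) = Pr(|z| < mu/sigma) = 2 F(mu/sigma) - 1
share_in_band = 2.0 * std_normal_cdf(ratio) - 1.0

print(f"Pr(x > 0)       = {share_positive:.5f}")
print(f"Pr(0 < x < 2mu) = {share_in_band:.5f}")
```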
While we do follow Schmalensee's analysis, we do not re-define variables by standardising the mark-up or the mean, i.e. by subtracting costs and dividing by the standard deviation. This is because we wish to examine separately the impact on profits and on consumer surplus of a change in the mean and of a change in the standard deviation.
For P > µ (µ > 0) the demand curve is strictly convex and for P < µ it is strictly concave, given that $Q''(P) = \frac{P-\mu}{\sigma^2}\,g(P)$ is positive if P > µ and negative if P < µ. In order to understand the relationship between the concavity or convexity of demand and risk averse or risk loving behaviour respectively, we can think of the curvature of the demand curve as being measured by

$$A = -\frac{Q''(P)}{Q'(P)} = \frac{P-\mu}{\sigma^2}.$$

A < 0 plays a role similar to that of the absolute risk aversion parameter applied to a utility function. Since the demand curve is downward sloping, A has the same sign as Q''(P), which is negative for P < µ.
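That the curvature measure A equals (P − µ)/σ² can be checked by differentiating the normal demand curve numerically. A minimal sketch, assuming illustrative values µ = 10 and σ = 2 that are not taken from the paper; the function names are ours.

```python
import math


def demand(p: float, mu: float, sigma: float) -> float:
    """Q(P) = 1 - F((P - mu)/sigma): share of buyers with valuation at least P."""
    return 0.5 * math.erfc((p - mu) / (sigma * math.sqrt(2.0)))


def curvature(p: float, mu: float, sigma: float, h: float = 1e-3) -> float:
    """A = -Q''(P)/Q'(P), both derivatives taken by central finite differences."""
    q_up, q_mid, q_down = demand(p + h, mu, sigma), demand(p, mu, sigma), demand(p - h, mu, sigma)
    slope = (q_up - q_down) / (2.0 * h)
    second = (q_up - 2.0 * q_mid + q_down) / (h * h)
    return -second / slope


mu, sigma = 10.0, 2.0  # illustrative parameters
for p in (8.0, 10.0, 12.0):
    numeric = curvature(p, mu, sigma)
    closed_form = (p - mu) / sigma ** 2
    print(f"P = {p:4.1f}  A (numeric) = {numeric:+.5f}  (P - mu)/sigma^2 = {closed_form:+.5f}")
```

The sign flips at P = µ, which is where demand switches from concave to convex.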
Cowan [15] refers to the analogous nature of the curvature of the demand function and the concept of risk aversion. Kimball [16] further shows that the curvature of the slope of demand is analogous to the concept of relative prudence applied to utility functions. Moreover, the existence of risk aversion in our model would provide an additional source of concavity to that of Q''(P) < 0, as the utility function would be a monotonically increasing concave function of demand, which in turn is monotonically decreasing and concave. Therefore an absolute risk aversion measure α (α > 0) would exceed the absolute value of A in the former case. We will return to this discussion in Section 3.
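The profit function also follows the normal distribution, as it is a linear function of the demand function. It is given by

$$\Pi(P) = (P - C)\,Q(P) = (P - C)\left[1 - F\left(\frac{P-\mu}{\sigma}\right)\right],$$

where C is the constant unit cost of producing the good considered. The FOC for profit maximisation with respect to P can be expressed as:

$$Q(P) + (P-C)\,Q'(P) = 1 - F\left(\frac{P-\mu}{\sigma}\right) - \frac{P-C}{\sigma}\, f\left(\frac{P-\mu}{\sigma}\right) = 0,$$

while the SOC is:

$$2Q'(P) + (P-C)\,Q''(P) < 0 \quad\Longleftrightarrow\quad (P-C)(P-\mu) < 2\sigma^2.$$

Substituting (6) into (8) we obtain:

$$(P^*-C)(P^*-\mu) < \sigma^2.$$

The above means that the SOC is always satisfied at P*. The above inequality implies a range of values:

$$\frac{\mu + C - \sqrt{(\mu - C)^2 + 4\sigma^2}}{2} < P^* < \frac{\mu + C + \sqrt{(\mu - C)^2 + 4\sigma^2}}{2}.$$

The range of relevant values for a non-negative profit can be reduced to (see footnote 3):

$$C \le P^* < \frac{\mu + C + \sqrt{(\mu - C)^2 + 4\sigma^2}}{2}.$$

For any pair of values of µ and σ the above condition is satisfied for P* in this interval. The inclusion of P* = µ in particular means that the interval includes the origin of the standard normal distribution F. Hence the demand, and through it the profit function, are not globally concave.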
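A minimal numerical sketch of the single-good problem: it locates P* by bisection on the FOC and checks that the optimum satisfies (P* − C)(P* − µ) < σ², a condition stronger than the SOC. The parameter values are illustrative assumptions, and bisection is simply our choice of root finder, not a method taken from the paper.

```python
import math


def std_pdf(z: float) -> float:
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_sf(z: float) -> float:
    """Survival function 1 - F(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def foc(p: float, mu: float, sigma: float, c: float) -> float:
    """dPi/dP = Q(P) + (P - C) Q'(P), with Q'(P) = -f((P - mu)/sigma) / sigma."""
    z = (p - mu) / sigma
    return std_sf(z) - (p - c) * std_pdf(z) / sigma


def optimal_price(mu: float, sigma: float, c: float, iters: int = 200) -> float:
    """Bisection on the FOC: it is positive at P = C and negative far above max(mu, C)."""
    lo, hi = c, max(mu, c) + 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if foc(mid, mu, sigma, c) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


mu, sigma, c = 10.0, 2.0, 4.0  # illustrative parameters
p_star = optimal_price(mu, sigma, c)
profit = (p_star - c) * std_sf((p_star - mu) / sigma)
print(f"P* = {p_star:.4f}, profit = {profit:.4f}")
print("(P*-C)(P*-mu) < sigma^2:", (p_star - c) * (p_star - mu) < sigma ** 2)
```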
From implicit differentiation of the FOC with respect to µ it is easy to show that

$$\frac{dP^*}{d\mu} = \frac{\sigma^2 - (P^*-C)(P^*-\mu)}{2\sigma^2 - (P^*-C)(P^*-\mu)}.$$

Invoking relation (10) we obtain that 0 < dP*/dµ < 1. It is also fairly easy to show that

$$\frac{dP^*}{dC} = \frac{-\sigma^2}{(P^*-C)(P^*-\mu) - 2\sigma^2},$$

which, again by (10), also lies strictly between 0 and 1. It is important to establish conditions for the sign of the change in the optimal price with respect to the dispersion. This, together with the sign of relation (13), is used repeatedly below in calculating the various comparative static results. Using straightforward but tedious calculations we derive the theorem below.
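The two closed-form slopes can be compared with finite differences of the re-solved optimal price. A minimal sketch under illustrative parameter values of our own; the bisection solver is repeated so that the block is self-contained.

```python
import math


def std_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_sf(z):
    """Standard normal survival function 1 - F(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def optimal_price(mu, sigma, c, iters=200):
    """Bisection on the FOC 1 - F(z) - (P - C) f(z) / sigma = 0, with z = (P - mu)/sigma."""
    lo, hi = c, max(mu, c) + 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        z = (mid - mu) / sigma
        if std_sf(z) - (mid - c) * std_pdf(z) / sigma > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def closed_form_slopes(mu, sigma, c):
    """dP*/dmu and dP*/dC from implicit differentiation of the FOC."""
    p = optimal_price(mu, sigma, c)
    x = (p - c) * (p - mu)
    return (sigma**2 - x) / (2 * sigma**2 - x), sigma**2 / (2 * sigma**2 - x)


def numeric_slopes(mu, sigma, c, h=1e-4):
    """The same two derivatives by central differences of the re-solved price."""
    d_mu = (optimal_price(mu + h, sigma, c) - optimal_price(mu - h, sigma, c)) / (2 * h)
    d_c = (optimal_price(mu, sigma, c + h) - optimal_price(mu, sigma, c - h)) / (2 * h)
    return d_mu, d_c


for mu, sigma, c in [(10.0, 2.0, 4.0), (10.0, 1.0, 9.0)]:  # illustrative parameters
    print(closed_form_slopes(mu, sigma, c), numeric_slopes(mu, sigma, c))
```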

2
< P* < µ (a necessary condition for this inequality to hold is that µ > C; otherwise P* > µ), the numerator in (14) is, for this range of values, a decreasing function of the optimal price and turns from positive to negative (see footnote 4). Hence dP*/dσ increases from negative to positive values as the optimal price increases. If the price is significantly below the mean, a higher dispersion reduces the optimal price.
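The sign pattern of dP*/dσ can be checked numerically by re-solving the FOC at slightly different values of σ. A minimal sketch with two illustrative parameter sets of our own choosing, one where P* lies well below µ and one where P* exceeds µ; finite differences stand in for the closed form (14).

```python
import math


def std_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_sf(z):
    """Standard normal survival function 1 - F(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def optimal_price(mu, sigma, c, iters=200):
    """Bisection on the FOC 1 - F(z) - (P - C) f(z) / sigma = 0, with z = (P - mu)/sigma."""
    lo, hi = c, max(mu, c) + 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        z = (mid - mu) / sigma
        if std_sf(z) - (mid - c) * std_pdf(z) / sigma > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def price_slope_sigma(mu, sigma, c, h=1e-4):
    """dP*/dsigma by central differences of the re-solved optimal price."""
    return (optimal_price(mu, sigma + h, c) - optimal_price(mu, sigma - h, c)) / (2 * h)


cases = {
    "P* well below mu": (10.0, 1.0, 2.0),
    "P* above mu": (10.0, 1.0, 9.0),
}
for label, (mu, sigma, c) in cases.items():
    p = optimal_price(mu, sigma, c)
    print(f"{label}: P* = {p:.3f}, dP*/dsigma = {price_slope_sigma(mu, sigma, c):+.3f}")
```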
We can now calculate the impact of µ and σ on the optimal profit function.
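Using the envelope theorem we obtain:

$$\frac{d\Pi^*}{d\mu} = (P^*-C)\,\frac{\partial Q}{\partial \mu}\bigg|_{P=P^*} = \frac{P^*-C}{\sigma}\, f\left(\frac{P^*-\mu}{\sigma}\right) = Q(P^*) = 1 - F\left(\frac{P^*-\mu}{\sigma}\right), \qquad (16)$$

where the third equality uses the FOC. Invoking the properties of the cumulative distribution function we obtain that 0 < dΠ*/dµ < 1. Moreover,

$$\frac{d^2\Pi^*}{d\mu^2} = \frac{1}{\sigma}\, f\left(\frac{P^*-\mu}{\sigma}\right)\left(1 - \frac{dP^*}{d\mu}\right) > 0. \qquad (17)$$

Relations (16) and (17) imply that Π* is an increasing, convex function of µ.
3 As (µ + C − √((µ − C)² + 4σ²))/2 < C.
4 The value of (P* − C)(P* − µ) for this range of values is negative and an increasing function of P* (e.g. approaching zero as P* approaches µ), while the expression in this same range of values is negative and a decreasing function.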
Similarly, employing the envelope theorem we obtain:

$$\frac{d\Pi^*}{d\sigma} = (P^*-C)\,\frac{\partial Q}{\partial \sigma}\bigg|_{P=P^*} = \frac{P^*-\mu}{\sigma}\, Q(P^*).$$

It therefore follows that Π* is an increasing function of σ for P* > µ and a decreasing function for P* < µ. Hence decreasing the dispersion of valuations is not always profit enhancing, as is commonly assumed.
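Both envelope-theorem results can be checked by differentiating the maximised profit numerically: dΠ*/dµ should equal Q(P*) and dΠ*/dσ should equal ((P* − µ)/σ)Q(P*). A minimal sketch under illustrative parameters of our own, for which P* < µ.

```python
import math


def std_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_sf(z):
    """Standard normal survival function 1 - F(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def optimal_price(mu, sigma, c, iters=200):
    """Bisection on the FOC 1 - F(z) - (P - C) f(z) / sigma = 0, with z = (P - mu)/sigma."""
    lo, hi = c, max(mu, c) + 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        z = (mid - mu) / sigma
        if std_sf(z) - (mid - c) * std_pdf(z) / sigma > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_profit(mu, sigma, c):
    """Pi* = (P* - C) Q(P*) at the solved optimal price."""
    p = optimal_price(mu, sigma, c)
    return (p - c) * std_sf((p - mu) / sigma)


mu, sigma, c, h = 10.0, 2.0, 4.0, 1e-4  # illustrative parameters
p_star = optimal_price(mu, sigma, c)
q_star = std_sf((p_star - mu) / sigma)
dpi_dmu = (max_profit(mu + h, sigma, c) - max_profit(mu - h, sigma, c)) / (2 * h)
dpi_dsigma = (max_profit(mu, sigma + h, c) - max_profit(mu, sigma - h, c)) / (2 * h)
print(f"dPi*/dmu    = {dpi_dmu:.6f}  vs Q(P*) = {q_star:.6f}")
print(f"dPi*/dsigma = {dpi_dsigma:.6f}  vs (P*-mu)/sigma*Q(P*) = {(p_star - mu) / sigma * q_star:.6f}")
```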
We next examine the consumer surplus,

$$CS(P) = \int_P^{\infty} (x - P)\, g(x)\,dx = \int_P^{\infty} Q(x)\,dx.$$

It is straightforward to show that for P = P*, using (5) and the FOC, CS can be expressed as:

$$CS\big|_{P=P^*} = \frac{1}{\sigma}\, f\left(\frac{P^*-\mu}{\sigma}\right)\left[\sigma^2 - (P^*-\mu)(P^*-C)\right].$$

Given (10), the consumer surplus is strictly positive for σ > 0. In other words, the existence of dispersion in demand gives consumers the opportunity to capture some of the total surplus from the transactions in this good. It follows that for σ > 0 the value of P that maximises the welfare function differs from that of profit maximisation. We examine immediately below under what circumstances CS|_{P=P*} is an increasing function of dispersion.
Note that through routine calculations we obtain:

$$\frac{\partial CS}{\partial P} = -Q(P) < 0.$$

Therefore the consumer surplus is a strictly decreasing function of price. Splitting between a direct and an indirect effect, the impact of dispersion on consumer surplus at P = P* is equal to the additive expression:

$$\frac{dCS^*}{d\sigma} = \frac{\partial CS}{\partial \sigma}\bigg|_{P=P^*} + \frac{\partial CS}{\partial P}\bigg|_{P=P^*}\frac{dP^*}{d\sigma} = f\left(\frac{P^*-\mu}{\sigma}\right) - Q(P^*)\,\frac{dP^*}{d\sigma}.$$

This means that the direct effect of dispersion on consumer surplus is always positive. For P* sufficiently smaller than µ (in the way detailed in Theorem 1), the consumer surplus is always an increasing function of σ, as both effects are positive: since dP*/dσ < 0, the indirect effect is positive and reinforces the direct effect. Denoting CS|_{P=P*} by CS*, this means that CS* is an increasing function of σ for P* < µ.
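The closed forms for the consumer surplus can be cross-checked numerically. The sketch below assumes CS(P) = σf(z) − (P − µ)Q(P) with z = (P − µ)/σ, which follows from integrating (x − P)g(x) over x ≥ P. It compares this with Simpson's rule applied to ∫Q(x)dx, with the P* expression above, and with the direct effect ∂CS/∂σ = f(z). Parameters are illustrative.

```python
import math


def std_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_sf(z):
    """Standard normal survival function 1 - F(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def optimal_price(mu, sigma, c, iters=200):
    """Bisection on the FOC 1 - F(z) - (P - C) f(z) / sigma = 0, with z = (P - mu)/sigma."""
    lo, hi = c, max(mu, c) + 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        z = (mid - mu) / sigma
        if std_sf(z) - (mid - c) * std_pdf(z) / sigma > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def consumer_surplus(p, mu, sigma):
    """Closed form CS(P) = sigma f(z) - (P - mu) Q(P), with z = (P - mu)/sigma."""
    z = (p - mu) / sigma
    return sigma * std_pdf(z) - (p - mu) * std_sf(z)


def consumer_surplus_numeric(p, mu, sigma, n=2000):
    """Composite Simpson rule for the integral of Q(x) over [P, P + 12 sigma] (n even)."""
    a, b = p, p + 12.0 * sigma
    step = (b - a) / n
    total = std_sf((a - mu) / sigma) + std_sf((b - mu) / sigma)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * std_sf((a + i * step - mu) / sigma)
    return total * step / 3.0


mu, sigma, c, h = 10.0, 2.0, 4.0, 1e-4  # illustrative parameters
p_star = optimal_price(mu, sigma, c)
z_star = (p_star - mu) / sigma
cs_star = consumer_surplus(p_star, mu, sigma)
cs_at_optimum = std_pdf(z_star) * (sigma**2 - (p_star - mu) * (p_star - c)) / sigma
direct_effect = (consumer_surplus(p_star, mu, sigma + h) - consumer_surplus(p_star, mu, sigma - h)) / (2 * h)
print(cs_star, consumer_surplus_numeric(p_star, mu, sigma), cs_at_optimum)
print(direct_effect, std_pdf(z_star))
```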
The above property of CS * combined with the fact that Π * is a decreasing function of σ for P * < µ, means that the firm dislikes risk in the form of dispersion because the concavity of the demand function implies that capturing consumer surplus becomes more difficult as the dispersion in valuations increases. Therefore for A < 0 the concavity of the demand function elicits risk averse behaviour. A acts as a measure of the degree of this dispersion sensitivity. As we show below, the desire to manipulate dispersion may drive the firm into using bundling.
If, on the other hand, P* > µ, then the indirect effect is negative, as dP*/dσ > 0, and hence the overall impact depends on which effect dominates.
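We now obtain comparative static results for the welfare function as a way of consolidating the overall impact of the mean and the dispersion on both the producer and the consumers:

$$W(P) = \Pi(P) + CS(P) = \int_P^{\infty} (x - C)\, g(x)\,dx.$$

So,

$$\frac{\partial W}{\partial P} = \left[Q(P) + (P-C)\,Q'(P)\right] - Q(P) = (P - C)\,Q'(P).$$

It is easy to check from the above that for P = P* the terms that correspond to the profit component become zero, which implies:

$$\frac{\partial W}{\partial P}\bigg|_{P=P^*} = -Q(P^*) < 0.$$

In other words, a lower price than P* will give a higher welfare. Moreover, holding P fixed,

$$\frac{\partial W}{\partial \mu} = Q(P) + \frac{P-C}{\sigma}\, f\left(\frac{P-\mu}{\sigma}\right), \qquad \frac{\partial CS}{\partial \mu} = Q(P).$$

Advances in Economics and Business 1(1): 6-21, 2013

Using the FOC, at P = P* the above two expressions become 2Q(P*) and Q(P*) respectively, and we also have that:

$$\frac{dW^*}{d\mu} = Q(P^*)\left(2 - \frac{dP^*}{d\mu}\right) > 0, \qquad \frac{dCS^*}{d\mu} = Q(P^*)\left(1 - \frac{dP^*}{d\mu}\right) > 0.$$

So both W* and CS* are increasing functions of the mean valuation. Correspondingly, for the impact of dispersion on welfare, using (19) and (14) we obtain:

$$\frac{dW^*}{d\sigma} = f\left(\frac{P^*-\mu}{\sigma}\right)\frac{2\sigma^2 - (P^*-C)^2}{2\sigma^2 - (P^*-C)(P^*-\mu)}.$$

The derivation of the above expression is given in the mathematical appendix. It shows that for −√2 < (P* − C)/σ < √2 welfare is an increasing function of dispersion, and a decreasing function otherwise (Schmalensee, p. S217).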
Hence if the profit maximising price is within (C − √2σ, C + √2σ), welfare is an increasing function of dispersion. Setting (P* − C)/σ = √2 in the FOC and using the normal distribution tables we derive that (P* − µ)/σ ≈ −0.147. This results in (µ − C)/σ ≈ 1.561. Because the FOC, in standardised form, depends only on (µ − C)/σ, and 0 < dP*/dµ < 1, a higher (µ − C)/σ raises (P* − C)/σ and lowers (P* − µ)/σ. Hence (P* − C)/σ > √2 holds exactly when (µ − C)/σ > 1.561, i.e. when P* < µ − 0.147σ. We can therefore conclude that welfare is a decreasing function of dispersion for P* < µ − 0.147σ and an increasing function for P* > µ − 0.147σ. The region below µ − 0.147σ covers roughly 44% of the distribution.
In the range P* < µ − 0.147σ, i.e. for (µ − C)/σ > 1.561, we have dΠ*/dσ < 0 and, as (P* − C)/σ > √2, also dW*/dσ < 0, so both profits and welfare are decreasing functions of dispersion. This means that a reduction in dispersion will, in this case, be beneficial to society as a whole, not just to the company. If instead µ − 0.147σ < P* < µ, profits are still a decreasing function of dispersion, while welfare and consumer surplus are increasing functions of it. In other words, the desire of a profit maximising firm to decrease dispersion will then damage both consumer surplus and overall welfare. The above comparative static results are important for the bundling analysis and conclusions below.
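The threshold used in this argument solves the FOC with (P* − C)/σ = √2, i.e. 1 − F(z) = √2 f(z). A minimal sketch that finds this root by bisection and then checks the sign of dW*/dσ by finite differences on either side of (µ − C)/σ ≈ 1.561. It assumes W* = (µ − C)Q(P*) + σf(z*), obtained by integrating (x − C)g(x) over x ≥ P*; the parameter values are illustrative. The root comes out at about −0.146, close to the table-based value used in the text.

```python
import math


def std_pdf(z):
    """Standard normal density f(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_sf(z):
    """Standard normal survival function 1 - F(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def optimal_price(mu, sigma, c, iters=200):
    """Bisection on the FOC 1 - F(z) - (P - C) f(z) / sigma = 0, with z = (P - mu)/sigma."""
    lo, hi = c, max(mu, c) + 10.0 * sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        z = (mid - mu) / sigma
        if std_sf(z) - (mid - c) * std_pdf(z) / sigma > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold_z(iters=200):
    """Root of 1 - F(z) = sqrt(2) f(z): the FOC when (P* - C)/sigma = sqrt(2)."""
    lo, hi = -3.0, 3.0  # the difference is positive at z = -3 and negative at z = 3
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if std_sf(mid) - math.sqrt(2.0) * std_pdf(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_welfare(mu, sigma, c):
    """W* = Pi* + CS* = (mu - C) Q(P*) + sigma f(z*), with z* = (P* - mu)/sigma."""
    p = optimal_price(mu, sigma, c)
    z = (p - mu) / sigma
    return (mu - c) * std_sf(z) + sigma * std_pdf(z)


def welfare_slope(mu, sigma, c, h=1e-4):
    """dW*/dsigma by central differences."""
    return (max_welfare(mu, sigma + h, c) - max_welfare(mu, sigma - h, c)) / (2 * h)


z0 = threshold_z()
print(f"(P* - mu)/sigma = {z0:.4f}, (mu - C)/sigma = {math.sqrt(2.0) - z0:.4f}")
for c in (7.0, 9.0):  # (mu - C)/sigma = 3 and 1, with mu = 10 and sigma = 1
    print(f"(mu - C)/sigma = {10.0 - c:.0f}: dW*/dsigma = {welfare_slope(10.0, 1.0, c):+.4f}")
```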

Pure Optimal Bundling
We analyse here how the firm determines its portfolio of goods in the bundle. We derive comparative static results for the optimal composition of goods in the bundle. On the basis of this choice the firm then chooses the optimal bundle price. The comparative statics results derived in Section 2 are used in comparing the consumer surplus and welfare implications of our results to those of Schmalensee's fixed weights approach.
There is only limited literature on the topic of bundle composition manipulation by a monopoly. A recent exception is the paper by Crampes and Hollander [17] on pay-TV distributors deciding on the composition of programme content, such as how many channels should be devoted to each type of programming (sports, documentaries, movies, news and current affairs, etc.).
An original idea in their paper is the introduction of a weight determination process: for example, how many type 1 (Good 1) and how many type 2 (Good 2) channels should be in the bundle, as well as the offering of alternative bundles, some with full coverage and some with partial coverage. However, the focus of their paper is different. There is no calculation of the optimal bundle composition; instead they study when it is optimal to offer two alternative bundles as opposed to a single bundle.

Anam and Chiang [18] show that third degree price discrimination ameliorates the problem of risk by allowing the risk averse company to diversify its risk between different goods. They show that the conventional direction of third degree price discrimination may be reversed, because a good with more elastic demand but more risk may be charged a higher price. Expected profit is sacrificed if this is more than compensated for by the corresponding decrease in profit risk exposure.
Eckel and Smith [19] in a model of price discrimination use a convex cost function of expected quantities. By manipulating the contribution of each demand group on aggregate demand (through setting different prices for different goods) the firm can influence the mean and variance, and through it, the expected cost.
We also employ a portfolio optimisation approach. Unlike Eckel and Smith, we do not restrict our portfolio to mean preserving combinations. Portfolio manipulation may lead to a bundle combination of the two goods that increases the bundle mean (µ_B > µ_1 + µ_2). It has already been shown in Section 2 that a higher mean enhances both profits and consumer surplus. In the subsection below we show that if weights are determined through portfolio optimisation, it is possible for pure bundling to take a form such that both the mean and the dispersion are increased relative to conventional pure bundling. This ameliorates (or even eliminates) the policy concern that bundling may damage the interests of consumers, because in this case the dispersion of consumers' valuations of the bundle may in some instances end up larger than that of separate sales, σ_1 + σ_2.
In our analysis we adopt the following approach. With respect to determining the optimal composition in the bundle we assume risk aversion from the point of view of the firm. The unique optimal allocation supports the observable result that the firm offers one bundle, rather than two bundles, for example each half of the time. Having calculated the optimal bundle composition we proceed to the calculation of the optimal profits. The focus is now the size of the profits as such. This is what is reported in the annual profits of companies and this is the criterion by which comparison of the performance of the firms is made. Namely having incorporated risk aversion in the construction of the optimal bundle, which is an internal calculation, the firm is then risk neutral in the calculation of the bundle price which will maximise profits.

Constructing the bundle
We start our analysis with a packaged good for which we assume, in a way analogous to the previous section, that each consumer if he/she purchases the package has a one unit demand for it. A unit of the packaged good consists of a given total number of individual units of Good 1 and Good 2. These goods have a similar dimension, for example the time involved in consuming them. Although throughout the one unit of the composite good is fixed, the number of units of each constituent good in the bundle can vary. The relative weights of the two goods in the package can be anything between 0 and 1 and the cost of such participation in the bundle is equal to the relative weight of the good times the cost of producing a package that consists only of this good. Means and standard deviations are also perfectly divisible into fractions of less than one.
For example, imagine a mobile phone services package that offers both text messages and phone calls (all measured in minutes) for a given monthly fee (the bundle price). If the bundle consists of, say, 300 minutes of phone calls and 600 minutes of text messages, then one third of the bundle is phone calls and the other two thirds are text messages. Hence one package has 900 minutes.
The decision that the monopolist has to make is first to optimally divide this package into texts and phone calls, and second to price it. We set λ as the relative weight of Good 1 (phone calls, λ = 1/3) and 1 − λ as the relative weight of Good 2 (texts) in the package. In the setting of the Crampes and Hollander paper, one may think in similar terms of setting a TV bundle composition between movie and sports channels. The decision that each consumer then has to make is whether to buy this package of one unit of the composite good, i.e. the 900 minutes with the relative weights and price offered by the monopolist.
As another example, one can think of a Cadbury's Roses chocolates package, which contains mini chocolates of various types (black, milk, white, with nuts, etc.). One box contains 400g. The decision of how many chocolates (in grams) of each type to include in the package is made by the company. The purchaser of the assortment will eat some types of chocolate, but can discard others for which he does not have a positive valuation. Apart from chocolates, most examples of this type of bundling belong to the family of information goods (e.g. goods that can be digitised), where marginal costs are low and constant.
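As we have already mentioned, by manipulating the relative weights we can affect dispersion and the level of demand. Given bivariate normality, any linear combination of the buyers' pairs of reservation prices will also follow the normal distribution. The underlying distribution of the buyers' valuations for the two goods, x (valuation of Good 1) and y (valuation of Good 2), is

$$f(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\,\frac{(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]\right\},$$

where ρ is the correlation coefficient of the reservation valuations.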
The valuations x and y are in money form, and so is the valuation of the bundle xλ + y(1 − λ). We now assume µ_1, µ_2 > 0 and that the firm will assign weights in order to maximise a utility function which depends on the consumers' valuations of the bundle. The firm will then calculate the bundle price (footnote 6). This is clarified below.
Given the underlying (bivariate normal) distribution of the valuations, the moment generating function of f(x, y) is given by

$$M_{X,Y}(t_1, t_2) = \exp\left[\mu_1 t_1 + \mu_2 t_2 + \tfrac{1}{2}\left(\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2\right)\right].$$

The firm's objective, before it sets the bundle price P_B, is to maximise its expected utility, given the bivariate normal distribution f(x, y) of the buyers' reservation prices (valuations). Optimisation is split into two stages, as the firm first decides whether it is worthwhile to bundle in the first place (hence it endogenises the bundling decision, as set out in Theorem 2 below) and, if so, at what relative weights. This is determined by the characteristics of the valuations distribution (means, variances and correlation) as well as by the degree of risk aversion of the firm. We use the term α to express risk aversion; it reflects the firm's sensitivity to dispersion and other risk aversion factors. We assume that α is constant and positive, as in the case of constant absolute risk aversion. The coefficient α corresponds to the normalisation k = 2λ, which we adopt as it simplifies our calculations. Hence,

$$\max_k \; E\left[-\exp\{-\alpha(xk + y(2-k))\}\right].$$

This utility function of the firm depends on the valuations of the bundle by all consumers. Obviously, in determining the bundle the firm will also take into account the costs of producing the goods in the bundle, and this is done below. Using the moment generating function with t_1 = −αk and t_2 = −α(2 − k), we can re-write the optimisation function as

$$\max_k \; -\exp\left\{-\alpha\left[k\mu_1 + (2-k)\mu_2\right] + \frac{\alpha^2}{2}\left[k^2\sigma_1^2 + 2\rho k(2-k)\sigma_1\sigma_2 + (2-k)^2\sigma_2^2\right]\right\}.$$

Applying a monotonic transformation to the above, the problem simplifies into:

$$\max_k \; k\mu_1 + (2-k)\mu_2 - \frac{\alpha}{2}\left[k^2\sigma_1^2 + 2\rho k(2-k)\sigma_1\sigma_2 + (2-k)^2\sigma_2^2\right].$$

The combination of the parameters above is such that K is positive. The firm wishes to create an optimal bundle, where the weights of the two participating goods are such that the net return to (the expected utility from) the bundle is maximised, given the valuations of the consumers and the production costs.
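The moment-generating-function step can be checked by simulation: for jointly normal (x, y), E[−exp(−α(kx + (2 − k)y))] should equal −M(−αk, −α(2 − k)). A minimal Monte Carlo sketch in Python; the parameter values are illustrative assumptions, and the fixed seed only makes the run reproducible.

```python
import math
import random


def cara_expected_utility_mc(k, alpha, mu1, mu2, s1, s2, rho, n=100_000, seed=12345):
    """Monte Carlo estimate of E[-exp(-alpha (k x + (2 - k) y))] for bivariate normal (x, y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mu1 + s1 * z1
        # y is built from z1 and z2 so that corr(x, y) = rho
        y = mu2 + s2 * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        total += -math.exp(-alpha * (k * x + (2.0 - k) * y))
    return total / n


def cara_expected_utility_mgf(k, alpha, mu1, mu2, s1, s2, rho):
    """Closed form -M(t1, t2) with t1 = -alpha k and t2 = -alpha (2 - k)."""
    t1, t2 = -alpha * k, -alpha * (2.0 - k)
    exponent = mu1 * t1 + mu2 * t2 + 0.5 * (s1**2 * t1**2 + 2 * rho * s1 * s2 * t1 * t2 + s2**2 * t2**2)
    return -math.exp(exponent)


params = dict(alpha=0.5, mu1=3.0, mu2=2.5, s1=1.0, s2=0.6, rho=0.3)  # illustrative
for k in (0.5, 1.0, 1.5):
    mc = cara_expected_utility_mc(k, **params)
    exact = cara_expected_utility_mgf(k, **params)
    print(f"k = {k}: Monte Carlo {mc:.5f}, MGF {exact:.5f}")
```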
Adjusting the means so that they are set net of their corresponding costs, the firm maximises with respect to the relative weight λ the following revenue certainty equivalent function:

$$k(\mu_1 - C_1) + (2-k)(\mu_2 - C_2) - \frac{\alpha}{2}\left[k^2\sigma_1^2 + 2\rho k(2-k)\sigma_1\sigma_2 + (2-k)^2\sigma_2^2\right].$$

6 If P_B is the bundle price then the demand function can be written as $Q_B(P_B) = 1 - F\left(\frac{P_B - \mu_B}{\sigma_B}\right)$, where µ_B and σ_B are the mean and the standard deviation of the bundle valuation xk + y(2 − k).
The firm's utility function has a constant absolute risk aversion functional form −exp[−α(xk + y(2 − k))]. If α = 0 then the utility function is of the linear form xk + y(2 − k) (Kreps, 1990, p. 85). Hence, expected value maximisation will be of the form

$$\max_k \; k(\mu_1 - C_1) + (2-k)(\mu_2 - C_2).$$

The above expression indicates that in this case the firm has zero dispersion sensitivity, corresponding to the case of a risk neutral firm. So, in maximising, the firm will need to assign all the weight to either good 1 or good 2, depending on which of the two goods has the higher mean (or net mean). Hence the relative weights (λ, 1 − λ) will be either (0, 1) or (1, 0), and there will be no bundling when the means differ.
8 There is a further interesting case to consider which results in an equal weights outcome: the firm's utility function has the functional form k(2 − k)xy with ρ = 0, which denotes expected value maximisation of:

$$k(2-k)\,E[xy] = k(2-k)\,\mu_1\mu_2.$$

Maximising the above expression with respect to λ, since µ_1, µ_2 > 0, gives relative weights (λ, 1 − λ) = (1/2, 1/2), which corresponds to Schmalensee's type of conventional bundling. Hence this combination is optimal when ρ = 0 and the utility function of the firm is multiplicative. In the analysis that follows, we derive the specific circumstances under which this combination is optimal within our framework.
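A minimal sketch of the first-stage problem: it maximises the certainty equivalent above over k ∈ [0, 2] and compares a closed-form solution of the first-order condition with a brute-force grid search. The closed form, k* = [(a_1 − a_2)/α + 2σ_2(σ_2 − ρσ_1)] / (σ_1² − 2ρσ_1σ_2 + σ_2²) with a_i = µ_i − C_i, is our own derivation from the certainty-equivalent expression rather than a formula quoted from the paper. The names `a1`, `a2` and the parameter values are illustrative assumptions.

```python
def certainty_equivalent(k, a1, a2, s1, s2, rho, alpha):
    """k a1 + (2 - k) a2 - (alpha/2) Var[k x + (2 - k) y], with a_i = mu_i - C_i."""
    var = k**2 * s1**2 + 2 * rho * k * (2 - k) * s1 * s2 + (2 - k) ** 2 * s2**2
    return k * a1 + (2 - k) * a2 - 0.5 * alpha * var


def optimal_k(a1, a2, s1, s2, rho, alpha):
    """FOC root in k, clipped to [0, 2]; valid because the CE is concave in k for |rho| < 1."""
    denom = s1**2 - 2 * rho * s1 * s2 + s2**2
    k = ((a1 - a2) / alpha + 2 * s2 * (s2 - rho * s1)) / denom
    return min(2.0, max(0.0, k))


params = dict(a1=6.0, a2=5.0, s1=2.0, s2=1.0, rho=0.2, alpha=0.5)  # illustrative
k_star = optimal_k(**params)
grid_best = max((i / 10000 for i in range(20001)), key=lambda k: certainty_equivalent(k, **params))
print(f"k* = {k_star:.4f} (lambda = {k_star / 2:.4f}), grid search: {grid_best:.4f}")
```

In this example good 1 has the higher net mean but also the higher dispersion, and it ends up with the smaller weight.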
If (µ_i − C_i) − (µ_j − C_j) ≥ 2ασ_i(σ_i − ρσ_j), good j will receive a zero weight within the bundle. This implies either the deletion of an entire product line from the bundle or, less drastically, that the firm needs to consider the use of mixed bundling. Hence the decision whether to bundle or not, as well as how to balance the goods within the bundle, are both endogenous decisions.
Obviously, for ρ > 0 a necessary (though not sufficient) requirement for a strictly positive share of the good with the lower mean net of cost, say j, is that the dispersion of tastes for good j multiplied by the correlation coefficient is smaller than the dispersion for the other good, i.e. σ_i > ρσ_j. This requirement becomes increasingly binding as ρ increases. On the other hand, if ρ < 0 it is no longer a requirement.
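From Theorem 2 it follows that:
Corollary 1. Suppose that both goods receive strictly positive weights in the bundle.
(i) If α(σ_i² − σ_j²) < (µ_i − C_i) − (µ_j − C_j), good i has a greater share in the bundle than good j (k* > 1).
(ii) The Schmalensee format of equal weights becomes optimal when (µ_i − C_i) − (µ_j − C_j) = α(σ_i² − σ_j²).
(iii) If α(σ_i² − σ_j²) > (µ_i − C_i) − (µ_j − C_j), good j has a greater share in the bundle than good i (k* < 1).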
We use relations (32) describing the optimal split of the two-unit bundle. Suppose the weights of both goods are strictly positive. Then, given the common negative denominator, if µ_i − C_i > µ_j − C_j, in order for i to be given a greater weight than j we require that:

$$\alpha(\sigma_i^2 - \sigma_j^2) < (\mu_i - C_i) - (\mu_j - C_j).$$

This completes the proof of (i). If the good which is superior (higher) in terms of its net mean valuation is also inferior (higher) in terms of its dispersion, and the difference in the net means is exactly offset by the difference in the variances multiplied by the risk aversion parameter, then each good will receive an equal weight. This proves part (ii), which shows that the Schmalensee format is a special case of our approach.
Part (iii) of the corollary means that the good with the higher net mean in consumer valuations will be assigned a lower weight if the difference in the net means is less than the differences in the variances multiplied by the risk aversion parameter. The proof of this follows from reversing the last inequalities above.
Part (i) in Corollary 1 will always be satisfied if σ i < σ j , as then the LHS of the above inequality is negative. Clearly that is so because in this case good i is superior to good j both in terms of the net of cost mean, as well as in terms of the variance criterion.
However, good i may still be given a greater weight than j in the bundle even when σ_i > σ_j. Hence pure bundling in our model is not always dispersion reducing, as the latter is not always optimal. If the difference between the net means on the RHS of the inequality discussed in Corollary 1 is sufficiently large to exceed the LHS, then good i will have a greater weight in the bundle than j despite the fact that σ_i > σ_j. This may cause the dispersion of the bundle to exceed the sum of the dispersions of the two stand-alone goods. We shall return to this point later.
Hence, combining Theorem 2 and part (i) of Corollary 1, we obtain the following result. For an optimal bundling decision in which the good with the higher net mean is also given a larger, but strictly less than one (λ < 1), relative weight, the condition is

$$\alpha(\sigma_i^2 - \sigma_j^2) < (\mu_i - C_i) - (\mu_j - C_j) < 2\alpha\sigma_i(\sigma_i - \rho\sigma_j).$$

Going back to the case where σ_i < σ_j, it is still possible for good j to feature in the bundle as long as σ_i > ρσ_j, which will always be (more easily) satisfied for a negative (low positive) value of ρ. Hence low correlation between the two goods makes bundling more desirable. This result is shared with the Schmalensee paper, in which pure bundling is less profitable the closer ρ is to 1; here, however, the weight of each good is also determined, among other things, by the correlation coefficient. We show this more formally below, by first differentiating k* with respect to ρ.
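Corollary 1 and the condition above can be illustrated by sweeping the net-mean gap (µ_i − C_i) − (µ_j − C_j) for fixed dispersions. A minimal sketch, assuming the closed-form first-order-condition solution for k* (our own derivation from the certainty-equivalent function, with k the weight of good i) and illustrative parameter values in which good i has the higher dispersion.

```python
def optimal_k_interior(a_i, a_j, s_i, s_j, rho, alpha):
    """Unclipped FOC root for the weight k of good i (k = 2 * lambda), a = mu - C."""
    denom = s_i**2 - 2 * rho * s_i * s_j + s_j**2
    return ((a_i - a_j) / alpha + 2 * s_j * (s_j - rho * s_i)) / denom


s_i, s_j, rho, alpha, a_j = 2.0, 1.0, 0.2, 0.5, 5.0  # illustrative
gap_equal = alpha * (s_i**2 - s_j**2)              # Corollary 1 (ii): equal weights
gap_corner = 2 * alpha * s_i * (s_i - rho * s_j)  # weight of good j reaches zero

for gap in (0.5 * gap_equal, gap_equal, 0.5 * (gap_equal + gap_corner), gap_corner):
    k = optimal_k_interior(a_j + gap, a_j, s_i, s_j, rho, alpha)
    print(f"net-mean gap {gap:.3f}: k* = {k:.4f}")
```

Gaps strictly between `gap_equal` and `gap_corner` give 1 < k* < 2, i.e. good i receives the larger but not the whole weight.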
As already mentioned, the decision whether to bundle in the first place is endogenous, while the weights of the goods in the bundle are also endogenously determined by their means, dispersion, the degree of correlation, and finally the degree of risk aversion as set out in relations (35). We derive below the comparative static results on the impact of each of these parameters.
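Differentiating (32) with respect to α gives

$$\frac{dk^*}{d\alpha} = -\frac{(\mu_i - C_i) - (\mu_j - C_j)}{\alpha^2\left(\sigma_i^2 - 2\rho\sigma_i\sigma_j + \sigma_j^2\right)}.$$

The relation immediately above implies that the weight of the good with the higher net mean, µ_i − C_i, is a decreasing function of risk aversion, α. Also, the higher the degree of risk aversion, the lower will be the absolute value of its impact on that weight.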
This means that risk aversion encourages bundling by enhancing, all other things being equal, the contribution to the bundle of the good with the lower net mean.
Implicitly differentiating (31) with respect to $\rho$, for given means, costs, variances and $\alpha$, shows that the sign of $dk^*/d\rho$ is that of $k^* - 1$: for $k^* > 1$, $dk^*/d\rho > 0$, while for $k^* < 1$, $dk^*/d\rho < 0$. Hence a lower $\rho$ boosts the share of the good with the lower relative weight in the bundle and improves its odds of having a non-zero participation in the bundle. Consequently, low positive correlation values promote bundling, and negative values of $\rho$ do so even more.
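Under the assumed mean-variance objective $k m_1 + (2-k) m_2 - \frac{\alpha}{2}\sigma_B^2$ (an assumption, since (31) is not reproduced here), the numerator and the denominator of $k^*$ both have derivative $-2\sigma_1\sigma_2$ with respect to $\rho$, which gives $dk^*/d\rho = 2\sigma_1\sigma_2(k^* - 1)/D$ with $D = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2 > 0$. Its sign is that of $k^* - 1$, as stated in the text, and it does not depend on how $\alpha$ is scaled in the objective. The sketch compares this expression with a central finite difference at interior values of $k^*$; all function names are hypothetical.

```python
def denom(s1, s2, rho):
    # Variance of X1 - X2; positive unless rho = 1 and s1 = s2.
    return s1**2 + s2**2 - 2 * rho * s1 * s2


def optimal_k(m1, m2, s1, s2, rho, alpha):
    """Interior weight of good 1 under the ASSUMED objective
    k*m1 + (2 - k)*m2 - (alpha / 2) * Var(bundle); no clipping to [0, 2]."""
    return ((m1 - m2) / alpha + 2 * s2 * (s2 - rho * s1)) / denom(s1, s2, rho)


def dk_drho(m1, m2, s1, s2, rho, alpha):
    """Analytic derivative: 2 * s1 * s2 * (k* - 1) / D."""
    k = optimal_k(m1, m2, s1, s2, rho, alpha)
    return 2 * s1 * s2 * (k - 1) / denom(s1, s2, rho)


def dk_drho_numeric(m1, m2, s1, s2, rho, alpha, h=1e-6):
    """Central finite difference in rho, used as a check."""
    up = optimal_k(m1, m2, s1, s2, rho + h, alpha)
    down = optimal_k(m1, m2, s1, s2, rho - h, alpha)
    return (up - down) / (2 * h)


above_one = (1.5, 1.0, 1.0, 1.5, 0.2, 2.0)  # m1, m2, s1, s2, rho, alpha: k* ~ 1.566
below_one = (1.0, 1.5, 1.5, 1.0, 0.2, 2.0)  # goods swapped: k* ~ 0.434
for params in (above_one, below_one):
    print(optimal_k(*params), dk_drho(*params), dk_drho_numeric(*params))
```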
By implicitly differentiating (31) with respect to the dispersion of good 1, for given means, costs, $\alpha$ and $\rho$ and with $d\sigma_2 = 0$, we find that the share of a good in the bundle is inversely related to its own dispersion. It follows that the share of a good in the bundle is directly related to the dispersion of the other good. Implicitly differentiating (31) for given costs, variances, $\alpha$ and $\rho$, setting $d\mu_2 = 0$ and dividing by $d\mu_1$, shows that the optimal weight of each good is directly related to its own mean and, as can easily be shown, inversely related to the mean of the other good.
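A finite-difference sketch of these comparative statics, again assuming the mean-variance objective $k m_1 + (2-k) m_2 - \frac{\alpha}{2}\sigma_B^2$ with net means $m_i = \mu_i - C_i$ (an assumption, since (31) is not reproduced here). The helpers `optimal_k` and `partial` are hypothetical, and the signs are checked only at the illustrative parameter values below, not for every parameter combination.

```python
def optimal_k(m1, m2, s1, s2, rho, alpha):
    """Interior weight of good 1 under the ASSUMED mean-variance objective (no clipping)."""
    d = s1**2 + s2**2 - 2 * rho * s1 * s2
    return ((m1 - m2) / alpha + 2 * s2 * (s2 - rho * s1)) / d


def partial(f, args, i, h=1e-6):
    """Central finite difference of f with respect to its i-th positional argument."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


base = (1.5, 1.0, 1.0, 1.5, 0.2, 2.0)  # m1, m2, s1, s2, rho, alpha (illustrative)
names = ["m1", "m2", "s1", "s2"]
effects = {name: partial(optimal_k, base, i) for i, name in enumerate(names)}
for name, value in effects.items():
    print(f"dk*/d{name} = {value:+.4f}")
```

At these values the weight of good 1 rises with its own mean and with the other good's dispersion, and falls with its own dispersion and with the other good's mean; the mean effect equals $1/(\alpha D)$ under the assumed objective.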
Finally, implicitly differentiating (31) for given means, variances, $\alpha$ and $\rho$, setting $dC_2 = 0$ and dividing by $dC_1$, shows that the optimal weight of each good in the bundle is inversely related to its own cost and, as can easily be shown, directly related to the cost of the other good in the bundle.
11 The proof is as follows. Assume that the net mean of good 1 is greater than the net mean of good 2, and distinguish two cases: (i) for $k^* \ge 1$, $\sigma_1 \ge \rho\sigma_2$ (otherwise $2 - k^* < 0$); (ii) for $k^* < 1$, $\sigma_1 \ge \sigma_2$. Hence in both cases $\rho\sigma_2 - \sigma_1 < 0$.
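The cost effect can be sketched the same way, keeping gross means and costs separate. Under the assumed mean-variance objective $k(\mu_1 - C_1) + (2-k)(\mu_2 - C_2) - \frac{\alpha}{2}\sigma_B^2$ (an assumption, not the paper's relation (31) quoted verbatim), costs enter only through the net means, so $dk^*/dC_1 = -1/(\alpha D) = -dk^*/dC_2$ with $D = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$. The names `optimal_k`, `base` and `k_with` are hypothetical, and the parameter values are illustrative.

```python
def optimal_k(mu1, c1, mu2, c2, s1, s2, rho, alpha):
    """Interior weight of good 1 under the ASSUMED objective
    k*(mu1 - c1) + (2 - k)*(mu2 - c2) - (alpha / 2) * Var(bundle)."""
    m1, m2 = mu1 - c1, mu2 - c2  # net means
    d = s1**2 + s2**2 - 2 * rho * s1 * s2
    return ((m1 - m2) / alpha + 2 * s2 * (s2 - rho * s1)) / d


base = dict(mu1=5.0, c1=3.5, mu2=4.0, c2=3.0, s1=1.0, s2=1.5, rho=0.2, alpha=2.0)


def k_with(**change):
    """k* at the base parameters, with some of them overridden."""
    return optimal_k(**{**base, **change})


h = 1e-6
dk_dc1 = (k_with(c1=base["c1"] + h) - k_with(c1=base["c1"] - h)) / (2 * h)
dk_dc2 = (k_with(c2=base["c2"] + h) - k_with(c2=base["c2"] - h)) / (2 * h)
# Analytic value under the assumed objective: dk*/dC1 = -1 / (alpha * D) = -dk*/dC2.
d = base["s1"]**2 + base["s2"]**2 - 2 * base["rho"] * base["s1"] * base["s2"]
print(dk_dc1, dk_dc2, -1 / (base["alpha"] * d))
```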

Derivation of the profit maximising price
Having derived the optimal weights for the two goods, we now derive the profit-maximising bundle price. The firm offers the bundle at this price as a take-it-or-leave-it option (pure bundling).
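The mean and the cost of the bundle are respectively defined as¹² $\mu_B = k\mu_1 + (2-k)\mu_2$ and $C_B = kC_1 + (2-k)C_2$. The standard deviation of the bundle can be written as follows:
$$\sigma_B = \sqrt{k^2\sigma_1^2 + (2-k)^2\sigma_2^2 + 2\rho k(2-k)\sigma_1\sigma_2},$$
where $\rho$ is the correlation coefficient of the joint reservation price distribution, and
$$\theta = \frac{k\sigma_1}{k\sigma_1 + (2-k)\sigma_2}$$
is the share of Good 1 in the weighted sum of dispersions. Hence $\sigma_B = \delta\,(k\sigma_1 + (2-k)\sigma_2)$, where
$$\delta = \sqrt{\theta^2 + (1-\theta)^2 + 2\rho\theta(1-\theta)} = \sqrt{1 - 2(1-\rho)\theta(1-\theta)}.$$
Since $0 \le \delta \le 1$, it follows that $\sigma_B \le k\sigma_1 + (2-k)\sigma_2$, with strict inequality whenever $\rho < 1$ and both goods have a positive weight. For any given $\rho < 1$, the value of $\delta$ is minimised at $\theta = \frac{1}{2}$, that is, when the weights satisfy $\frac{\sigma_1}{\sigma_2} = \frac{2-k}{k}$. Such weight-setting is clearly not possible in the Schmalensee paper. A value of $\theta = \frac{1}{2}$ implies that $k\sigma_1 + (2-k)\sigma_2 < \sigma_1 + \sigma_2$ (unless $\sigma_1 = \sigma_2$). In other words, the good with the higher dispersion receives the lower weight in the bundle, so that each good contributes equally to the weighted sum of the dispersions.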
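The decomposition $\sigma_B = \delta\,(k\sigma_1 + (2-k)\sigma_2)$ can be checked numerically. The sketch below (helper names `bundle_sd`, `theta` and `delta` are hypothetical; parameter values illustrative) computes $\theta$ and $\delta$ for several weights, confirms the identity, and locates the weight $k = 2\sigma_2/(\sigma_1 + \sigma_2)$ at which $\theta = \frac{1}{2}$ and $\delta$ reaches its minimum $\sqrt{(1+\rho)/2}$.

```python
import math


def bundle_sd(k, s1, s2, rho):
    """sigma_B for weights k and 2 - k."""
    return math.sqrt(k**2 * s1**2 + (2 - k)**2 * s2**2 + 2 * rho * k * (2 - k) * s1 * s2)


def theta(k, s1, s2):
    """Share of good 1 in the weighted sum of dispersions k*s1 + (2 - k)*s2."""
    return k * s1 / (k * s1 + (2 - k) * s2)


def delta(th, rho):
    """Ratio sigma_B / (k*s1 + (2 - k)*s2), expressed through theta."""
    return math.sqrt(1 - 2 * (1 - rho) * th * (1 - th))


s1, s2, rho = 2.0, 1.0, 0.3
ks = [0.5, 2 / 3, 1.0, 1.5]
for k in ks:
    weighted_sum = k * s1 + (2 - k) * s2
    th = theta(k, s1, s2)
    print(f"k={k:.3f} theta={th:.3f} delta={delta(th, rho):.4f} "
          f"sigma_B={bundle_sd(k, s1, s2, rho):.4f} weighted_sum={weighted_sum:.4f}")

# Weight that makes theta = 1/2: k*s1 = (2 - k)*s2, i.e. k = 2*s2 / (s1 + s2).
k_half = 2 * s2 / (s1 + s2)
print(k_half, theta(k_half, s1, s2), delta(0.5, rho))
```

Here $\sigma_1 > \sigma_2$, so $k_{1/2} = 2/3 < 1$: the more dispersed good receives the lower weight, and the weighted sum $8/3$ is below $\sigma_1 + \sigma_2 = 3$.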
Relations (32) imply that a good will receive a higher relative weight in the bundle if its net mean more than makes up for a relatively higher dispersion, as seen in part (i) of Corollary 1. In that case $k^*\sigma_1 + (2-k^*)\sigma_2 > \sigma_1 + \sigma_2$ and $\delta_{k^*} > \delta_{k=2-k=1}$, so $\sigma_B$, as defined in relation (38), is larger for $k = k^*$ than for equal weights, $k = 2 - k = 1$. (This is discussed further as Case 3 in the next subsection.) In fact, $\sigma_B$ may even exceed $\sigma_1 + \sigma_2$ if $k^*\sigma_1 + (2-k^*)\sigma_2$ and $\delta_{k^*}$ are sufficiently larger than $\sigma_1 + \sigma_2$ and $\delta_{k=2-k=1}$, respectively.
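A numerical illustration of this possibility, assuming the mean-variance objective $k m_1 + (2-k) m_2 - \frac{\alpha}{2}\sigma_B^2$ for the first stage (an assumption; relations (31) and (32) are not reproduced in this section). Under that assumption $k^* > 1$ exactly when $m_1 - m_2 > \alpha(\sigma_1^2 - \sigma_2^2)$, which mirrors the Case 3 description in the Conclusions. The parameter values are purely illustrative, and `optimal_k` and `bundle_sd` are hypothetical helpers.

```python
import math


def optimal_k(m1, m2, s1, s2, rho, alpha):
    """Weight of good 1 under the ASSUMED objective
    k*m1 + (2 - k)*m2 - (alpha / 2) * sigma_B**2, clipped to [0, 2]."""
    d = s1**2 + s2**2 - 2 * rho * s1 * s2
    k = ((m1 - m2) / alpha + 2 * s2 * (s2 - rho * s1)) / d
    return min(max(k, 0.0), 2.0)


def bundle_sd(k, s1, s2, rho):
    """sigma_B for weights k and 2 - k."""
    return math.sqrt(k**2 * s1**2 + (2 - k)**2 * s2**2 + 2 * rho * k * (2 - k) * s1 * s2)


s1, s2, rho, alpha = 2.0, 1.0, 0.5, 1.0  # good 1 is the more dispersed good...
m1, m2 = 6.7, 1.0                         # ...but has a much higher net mean

k_star = optimal_k(m1, m2, s1, s2, rho, alpha)
case3_threshold_met = m1 - m2 > alpha * (s1**2 - s2**2)
sd_equal = bundle_sd(1.0, s1, s2, rho)    # 1:1 weights, as in Schmalensee
sd_opt = bundle_sd(k_star, s1, s2, rho)   # optimised weights
print(k_star, case3_threshold_met)
print(sd_equal, sd_opt, s1 + s2)
```

In this example $k^* \approx 1.9$, and $\sigma_B$ rises from about 2.65 under 1:1 weights to about 3.85, exceeding $\sigma_1 + \sigma_2 = 3$.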
Optimising the weights alters the bundle mean, $\mu_B$, relative to the fixed-weights case. It also alters $\sigma_B$ in two ways: through $\delta$ and through the weighted sum of the two dispersions. Using (39) and inserting $k^*$ from (32), we obtain the optimal share of Good 1 in the dispersions, $\theta^*$. In our framework $\theta^*$ is a function of $\rho$; differentiating with respect to $\rho$, we find that $\theta^*$ is an increasing (decreasing) function of $\rho$ when $\sigma_2 > \sigma_1$ ($\sigma_2 < \sigma_1$). If the net means of the two goods are equal, then $\theta^* = \frac{\sigma_2 - \rho\sigma_1}{(1-\rho)(\sigma_1 + \sigma_2)}$, which guarantees a positive weight for good 1 as long as $\sigma_2 > \rho\sigma_1$.
We now define the profit functions for the composite good (bundle), using the definition in relation (7). $\Pi^*_{B,k^*}$ denotes the optimal profit when the weights are $k^*$ and $2 - k^*$; equivalently, $\Pi^*_{B,k=2-k}$ denotes the optimal profit when the weights are set at 1:1, as, for example, in the Schmalensee paper. Using relation (16) in Section 2, it is easy to derive the derivative of the optimal profit with respect to the bundle mean, $\partial\Pi^*_{B,k^*}/\partial\mu_{B,k^*}$. This analysis of changes in the profit and welfare functions differs markedly from, and is more satisfactory than, an analysis in which the mean valuation and the cost of the bundle are constrained to move in the same proportion and in opposite directions (as in Dassiou and Glycopantis [2012]). By allowing separate and independent changes in the bundle mean and cost, we capture the full effect of changes in profit and welfare. We also derive the derivative of the optimal profit with respect to the bundle variance.
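The equal-net-means expression for $\theta^*$ can be checked numerically. The sketch assumes the mean-variance objective $k m_1 + (2-k) m_2 - \frac{\alpha}{2}\sigma_B^2$, under which equal net means give $k^* = 2\sigma_2(\sigma_2 - \rho\sigma_1)/(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)$ independently of $\alpha$; this is our assumption, not a relation quoted from the paper. Substituting this $k^*$ into $\theta = k\sigma_1/(k\sigma_1 + (2-k)\sigma_2)$ reproduces the closed form in the text, and with $\sigma_2 > \sigma_1$ the resulting $\theta^*$ rises with $\rho$. Function names are hypothetical.

```python
def optimal_k_equal_means(s1, s2, rho):
    """k* with equal net means under the ASSUMED mean-variance objective (alpha cancels)."""
    return 2 * s2 * (s2 - rho * s1) / (s1**2 + s2**2 - 2 * rho * s1 * s2)


def theta(k, s1, s2):
    """Share of good 1 in the weighted sum of dispersions."""
    return k * s1 / (k * s1 + (2 - k) * s2)


def theta_star_closed_form(s1, s2, rho):
    """Closed form quoted in the text for equal net means."""
    return (s2 - rho * s1) / ((1 - rho) * (s1 + s2))


s1, s2 = 1.0, 2.0              # sigma_2 > sigma_1
rhos = [-0.5, 0.0, 0.3, 0.45]  # keep rho * s2 < s1 so that 0 <= k* <= 2
from_k = [theta(optimal_k_equal_means(s1, s2, r), s1, s2) for r in rhos]
closed = [theta_star_closed_form(s1, s2, r) for r in rhos]
for r, a, b in zip(rhos, from_k, closed):
    print(f"rho={r:+.2f}  theta(k*)={a:.4f}  closed form={b:.4f}")
```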

Consumer and welfare implications of optimal bundling
For the welfare function, we obtain the corresponding relations. The derivative with respect to the variance (derived as relation (46) in the mathematical appendix) is shown to be smaller in absolute value than the rate of change of welfare with respect to the mean.
In order to analyse the consumer and welfare implications of our pure bundling approach, and to compare them with those of the Schmalensee approach, we consider three cases.
Case 1: $\mu_i - C_i > \mu_j - C_j$ and $\sigma_i < \sigma_j$. The firm faces no dilemma, as the good with the larger net mean also has the lower dispersion. Hence it assigns a larger weight to this good.
In this case our optimal packaging approach is only rarely welfare inferior to the conventional approach. This occurs only when the ratio of the absolute value of the change in the bundle variance to the change in the net bundle mean exceeds the ratio of the marginal rate of change of welfare with respect to the mean to the marginal rate of change of welfare with respect to the variance, provided that the latter is positive.
This means that, for $\mu_i - C_i > \mu_j - C_j$ and $\sigma_i < \sigma_j$, good i receives the larger weight, which raises the net bundle mean and lowers the bundle variance relative to equal weights. Clearly, in this case, since profit is an increasing function of the net mean and a decreasing function of the dispersion, we have $\Pi^*_{B,k^*} > \Pi^*_{B,k=2-k}$. Using the small increments formula, we can approximate the difference between welfare in our model and welfare in Schmalensee's model, denoted $\Delta W_{B,k^*}$, which allows us to determine the sign of the change in welfare from using our model instead of the equal-weights model.¹³ Here we define $\Delta_{nm} = (\mu_{B,k^*} - C_{B,k^*}) - (\mu_{B,k=2-k} - C_{B,k=2-k})$ as the difference between the net bundle mean of our model and Schmalensee's pure bundle net mean, and $\Delta_{var} = \sigma^2_{B,k^*} - \sigma^2_{B,k=2-k}$ as the difference between the bundle variance in our model and that in Schmalensee's model. To ensure the accuracy of the approximation when the changes in net means and variances are large, we multiply by $\nu$, $0 \le \nu \le \epsilon$, where $\epsilon$ is arbitrarily small.
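The small increments formula and the Case 1 condition stated just before it can be sketched as a first-order approximation, $\Delta W \approx \frac{\partial W}{\partial \mu_B}\Delta_{nm} + \frac{\partial W}{\partial \sigma_B^2}\Delta_{var}$. Relation (43) itself is not reproduced in this section, so this linear form (and the omission of the scaling factor $\nu$) is our assumption, and the derivative values below are hypothetical numbers, not results from the paper. With $\Delta_{nm} > 0$, $\Delta_{var} < 0$ and $\partial W/\partial\sigma_B^2 > 0$, the approximation is negative exactly when $|\Delta_{var}|/\Delta_{nm}$ exceeds the ratio of the two derivatives; the function names are hypothetical.

```python
def welfare_change(dw_dmu, dw_dvar, d_nm, d_var):
    """First-order (small increments) approximation of W(optimised) - W(equal weights)."""
    return dw_dmu * d_nm + dw_dvar * d_var


def optimal_weights_welfare_inferior(dw_dmu, dw_dvar, d_nm, d_var):
    """Case 1 test: is the optimised bundle welfare inferior to the 1:1 bundle?"""
    if not (d_nm > 0 and d_var < 0 and dw_dvar > 0):
        raise ValueError("stated only for d_nm > 0, d_var < 0 and dW/dvar > 0")
    return abs(d_var) / d_nm > dw_dmu / dw_dvar


# Hypothetical derivative values and differences, for illustration only.
dw_dmu, dw_dvar, d_nm = 1.0, 0.5, 0.2
for d_var in (-0.3, -0.5):
    print(d_var, welfare_change(dw_dmu, dw_dvar, d_nm, d_var),
          optimal_weights_welfare_inferior(dw_dmu, dw_dvar, d_nm, d_var))
```

With these numbers the derivative ratio is 2, so a variance reduction of 0.3 (ratio 1.5) leaves the optimised bundle welfare superior, while a reduction of 0.5 (ratio 2.5) makes it welfare inferior.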
Considering the impact on welfare using (29), if $(P^*_{B,k^*} - C_{B,k^*})^2 - 2\sigma^2_{B,k^*} > 0$, we derive that welfare is decreasing in the bundle variance. This means that a decrease in the bundle dispersion increases welfare, an effect further reinforced by the increase in the bundle mean. Hence, in this case, our bundling is superior to Schmalensee's in terms of both profits and total welfare. On the other hand, if $(P^*_{B,k^*} - C_{B,k^*})^2 - 2\sigma^2_{B,k^*} < 0$,¹⁴ welfare is increasing in the bundle variance. The rise in the net mean still increases welfare, but this must be weighed against the reduction in welfare caused by the decrease in the bundle dispersion. Hence, in this case, it is ambiguous whether the reduction in welfare is smaller or larger than in the Schmalensee case. In other words, while the dispersion is reduced further, the bundle mean is increased, and the latter may more or less than offset the negative impact on welfare of the reduction in dispersion. We attempt to clarify this ambiguity below using expressions (44) and (45), whose derivation is rather involved and is given in the appendix. Given the conditions within Case 1, $\Delta_{var} < 0$, $\Delta_{nm} > 0$ and $k^* > 2 - k^*$. Using (45), it is easy to show Lemma 1.
13 We call the left-hand side of the small increments formula in (43) $\Delta W_{B,k^*}$ since we are interested in the change in welfare from bundling as determined in our model relative to welfare in Schmalensee's case, and for this calculation we want to use the specific derivatives we have calculated for the optimal welfare function.
Lemma 1: $\ldots > 1 + \frac{1}{\alpha}$. Lemma 1 will be used to assess whether welfare has increased or decreased relative to the Schmalensee case.

Conclusions
This paper discusses a model of pure bundling in which a monopolistic firm does not restrict itself to fixed weights for the two goods. It is therefore a far more realistic reflection of bundling in practice than the conventional approach. We contrast the implications of an optimal choice of weights for profit, consumer and welfare outcomes with those of a conventional model of pure bundling that assumes equal weights (1:1).
The decision whether to bundle or not is endogenous, and so are the weights chosen. Because the weights are chosen through an optimisation process, bundling is no longer mean preserving. Hence both the means (net of costs) and the dispersions of the consumers' valuations of the two goods matter in determining the weights within the bundle. This optimisation also introduces the impact of risk aversion into the bundling decision.
We consider three different cases of the profit and welfare implications of our bundling approach. These depend on the values of the net means and dispersions, and the degree of risk aversion.
In Case 1 the inequalities in the net means and the dispersions of the two goods are in opposite directions. Our findings show that the firm focuses more sharply on reducing dispersion through bundling, which may reduce welfare. This adverse impact is, however, tempered by the fact that such bundling also increases the net bundle mean. Hence, in this case, our optimal bundling will typically inflict less damage on welfare than bundling under the Schmalensee approach.
In Case 2 the inequalities in the net means and the dispersions are in the same direction, and the difference in the net means is smaller than the difference in the variances times the degree of risk aversion. This will induce the firm to offer a bundle in which the good with both a higher net mean and a higher dispersion is assigned a lower weight. This means that the firm will bundle in a way that is both mean and variance reducing, which has adverse effects on consumer surplus as well as welfare.
In Case 3 the inequalities in the net means and the dispersions are in the same direction; however, this time the difference in the net means is larger than the difference in the variances times the degree of risk aversion. The firm will then bundle in a way that is both mean and variance increasing, leading to beneficial effects in terms of both consumer surplus and welfare. The optimal weights are such that $\sigma_B|_{k^*,\,2-k^*} > \sigma_B|_{k=2-k=1}$, as the good with the higher dispersion is assigned a larger weight. Such a weight may even lead to $\sigma_B|_{k^*,\,2-k^*} > \sigma_1 + \sigma_2$. This means that pure optimal bundling will in this case increase rather than decrease the dispersion, not only relative to conventional bundling but also relative to no bundling. Such a typically welfare-enhancing case is clearly not possible under pure bundling in the Schmalensee paper.

Mathematical Appendix
Below we provide proofs of certain expressions and show that

Explanation of Expression (29):
Substituting relations (14) and (21) into the second term on the right-hand side of (22), and adding (19) to (22), we get:
Explanation of Expression (44):
Replacing $k^*$ by its value, we obtain:
Explanation of Expression (45):