2.3 Bayes rule, conditioning and the Gibbs sampler
2.4 Application to normal means
Like some of the innovations studied as the basis for interrelating these separate works, occasionally there arises an innovation that is so rich and powerful that it seems surprising that it is not more frequently exploited. In modern mathematical statistics, and in the history of mathematical statistics prior to Bayes' theorem, there has probably not been an innovation with such a lasting impact on the way scientists view problems in empirical science and on the stock of methods collected for solving these problems. The advent of Markov chain Monte Carlo (MCMC) methods is one such innovation. The purpose of this chapter is to introduce MCMC briefly to the reader in a user-friendly manner.
Although there was a significant lag before MCMC methods were noticed by the profession (the original papers date to Metropolis and co-authors in the 1950s and to Hastings in 1970), MCMC has had a profound impact since the seminal statistical contributions appeared (Gelfand and Smith 1990). Since the early 1990s there have been significant advances in the biological sciences, the humanities, and particularly the medical sciences. However, economics and agricultural economics have been less fervent in their acceptance of these new techniques. Whether this state of affairs arises from a disproportionately small number of Bayesians working in these fields is open to debate, but the situation is clearly changing. Moreover, the advent of MCMC has stimulated considerable new entry into applied Bayesian statistics.
This collection of papers is related by its application of MCMC to solving two important problems for economic development policy. These problems are the identification of the resources and their quantities that are necessary to effect entry among representative non-participating households at two sites close to Addis Ababa from which data were collected in the 1997 production year.
Broadly defined, the project's objectives were to identify the factors that precipitate the emergence of new milk markets when the presence of relatively high transactions costs was considered a major impediment. This objective is important. One of the major impediments to economic development throughout sub-Saharan Africa (SSA) is a lack of density of market participation (Stiglitz 1989). Over time, and with varying personnel involved, these objectives became more refined and focused on the end products that are contained in this collection.
But the final product can never be better than the quality of the original outlay, nor can it improve upon the necessary inputs in a new venture. The essential inputs into any empirical exercise are the data. The data used in these various projects are very rich and it is important to keep this in mind as one reads through the diverse set of applications contained in this compendium. The data are due, primarily, to the efforts laid out by Charles Nicholson in the 1997 survey period while he was a postdoctoral fellow at the International Livestock Research Institute (ILRI), supported by a grant from The Rockefeller Foundation. As well as showcasing some of the important contributions of MCMC, the works collected here showcase the quality of the data collected by an ILRI scientist.
To derive policy prescriptions for identifying opportunities for expanding market participation, the independent works collected here provide classic examples of the scientific value of innovation, specifically, the routine application of MCMC to quality data. We introduce MCMC to the reader by selecting the crucial component of all the models developed and expanded upon in this collection and demonstrating its operation with limited available information. The crucial component in all the projects is the normal-linear model. Each of the models developed further here is a simple extension of it in the number of unknown quantities, in the forms of the distributions that they entail, and in the complexity of the relationships between the components that the normal-linear model underlies. Hence, if the technique operates well in this limited-information context, the reader can be satisfied that it will do much better with improved sets of information (most of the empirical models here are based on data sets ranging in size from 204 to 1428 observations). The success of these procedures rests on the crucial issue of convergence to the true distribution, which the next section demonstrates in the simplest framework possible.
Consider the problem of locating the mean of a normal distribution, from data y ≡ (y1, y2, ..., yN)' that are independent and identically distributed as Normal(μ, σ). This is a two-parameter problem in the unknown quantities θ ≡ (μ, σ)'. The conventional Bayesian approach to this problem is to set up a prior probability distribution for the unknown quantities, π(θ), combine this prior with the observed data likelihood, ℓ(θ|y), and, through Bayes' rule, derive the joint posterior distribution for the unknown quantities of interest, after observing the data:
π(θ|y) ∝ ℓ(θ|y) π(θ),
where the symbol '∝' means 'is proportional to'. In other words, net of an unimportant constant (that simply scales the posterior probability density function so that it has mass equal to one and thereby constitutes a true density), the posterior measure on the left-hand side is a joint probability density function for the unknown quantities θ and is the target density in the exercise. This is simply Bayes' theorem.
MCMC pertains to the analysis of the joint posterior density and, particularly, the derivation of the marginal probability distribution functions, π(μ|y) and π(σ|y), which are the end products of any Bayesian investigation.
When the functional form of π(θ|y) permits integration of each of the components of the joint density, marginal probability distributions can be derived easily. But when π(θ|y) is intractable, meaning that the necessary integrations cannot be carried out in closed form, a number of numerical avenues open for empirical analysis. One of these alternative approaches is MCMC, and two of its special cases, data augmentation and Gibbs sampling, provide the basis for all of the estimation conducted in this collection. Here, we restrict attention to the Gibbs sampler.
The Gibbs sampler becomes a candidate for evaluating the joint posterior distribution when each of the full conditional distributions comprising the joint posterior has a well-known form that is easy to sample from.
This point is worth re-emphasising. The application of the Gibbs sampler requires two conditions to be met. First, we require that the marginal probability density functions for the component distributions are not available in closed form; without this condition there would be no need to make any numerical approximation to the posterior. Second, we require that each of the full conditional distributions comprising the joint posterior has a well-known form. The problems that follow this chapter are linked in their fulfilment of these two criteria and, hence, we can apply the Gibbs sampler in order to derive inferences about unknown quantities of interest.
The problem of locating the mean among normal data is a problem with a tractable posterior for which no MCMC approximation is required, but, in view of its familiarity, it is useful for demonstration.
In the normal-data example, the component conditional distributions do have well-known forms. A standard, non-informative prior, π(θ) ∝ σ⁻¹, leads to a joint posterior which, in turn, results in component conditional distributions that have, respectively, normal and inverted-gamma forms. Precisely, the posterior distribution for the mean, μ, conditional on the standard deviation, σ, is normal, and the posterior distribution for the standard deviation conditional on the mean has an inverted-gamma form. Under relatively mild regularity conditions (Gelfand and Smith 1990) that are satisfied by each of the models developed in this compendium, the draws that alternate in sequence between μ, conditional on σ, on the one hand, and σ, conditional on μ, on the other, form a Markov chain with highly desirable convergence properties. Specifically, under the stated regularity conditions these chains converge to the target probability distributions that we seek.
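The conditional forms described above can be written out explicitly. The following is a worked sketch rather than the authors' own derivation: it assumes that Normal(μ, σ) denotes a normal distribution with standard deviation σ (so the variance is σ²), and it writes the inverted-gamma density in terms of two parameters, ν and s², that are introduced here for illustration.

```latex
% Likelihood for y_i ~ iid normal with mean \mu and standard deviation \sigma,
% combined with the prior \pi(\theta) \propto \sigma^{-1}:
\pi(\mu, \sigma \mid y) \;\propto\; \sigma^{-(N+1)}
  \exp\!\left\{ -\frac{1}{2\sigma^{2}} \sum_{i=1}^{N} (y_i - \mu)^{2} \right\}

% Full conditional for the mean: hold \sigma fixed and complete the square in \mu
% (the second argument of Normal below is a variance):
\mu \mid \sigma, y \;\sim\; \mathrm{Normal}\!\left( \bar{y},\; \sigma^{2}/N \right),
\qquad \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i

% Full conditional for the standard deviation: hold \mu fixed
% (inverted-gamma form with parameters \nu and s^2):
\pi(\sigma \mid \mu, y) \;\propto\; \sigma^{-(\nu+1)}
  \exp\!\left\{ -\frac{\nu s^{2}}{2\sigma^{2}} \right\},
\qquad \nu = N, \quad s^{2} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^{2}
```

One practical consequence of the last line is that, conditional on μ, the ratio ∑(yi − μ)²/σ² follows a chi-square distribution with N degrees of freedom, which gives a simple way to draw σ.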
It is essentially these observations, and generalisations of them, that are employed repeatedly throughout this paper.
The interested reader is directed to some slightly stronger (perhaps, more persuasive) mathematical arguments in Casella and George (1993), and in Chib and Greenberg (1995). Below, we give two examples of a special case of the normal-linear model, which is used repeatedly as a basis for investigation in each of the discrete- and truncated-distribution problems arising in examining market participation.
Suppose that the data vector has length N = 10, and that the mean and variance are, respectively, zero and one, so that the data are independent and identically distributed standard normal. Figure 1 presents examples of convergence in distribution by presenting the results of the first 50 draws in the Gibbs sequence, when the sequence is given the highly unrealistic starting values (μ₀, σ₀) = (1000, 1000). Note that we have only 10 observations from which to draw inferences. However, the convergence in distribution is quite striking. The draws oscillate for the first few iterations until the target distributions are located and thereupon simulate draws from the target conditional distributions, namely a normal distribution, N(∑i yi/N, σ²/N), for the mean, with posterior mean equal to the data mean, and an inverted-gamma distribution for the scale parameter, σ, with parameters ν = N and s² = ∑i (yi − μ)²/N.

Figure 1. Convergence in the Gibbs sample.
There are three features worth re-emphasising. First, we obtain convergence with very limited information; here we have only 10 sample observations. Second, we obtain convergence even with very unrealistic starting values. Third, convergence to the target densities is extremely rapid.
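The first example can be reproduced in outline with a few lines of code. The sketch below is not the authors' implementation: it assumes the full conditionals take the normal and inverted-gamma forms described earlier, it draws σ through the equivalent chi-square relationship, and the function name `gibbs_normal`, the seeds and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random


def gibbs_normal(y, mu0, sigma0, n_draws, seed=0):
    """Gibbs sampler for iid normal data under the prior proportional to 1/sigma.

    Hypothetical helper: alternates between mu | sigma, y and sigma | mu, y.
    """
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, sigma = mu0, sigma0
    draws = []
    for _ in range(n_draws):
        # mu | sigma, y is normal with mean ybar and standard deviation sigma / sqrt(n).
        mu = rng.gauss(ybar, sigma / math.sqrt(n))
        # sigma | mu, y: sum((y_i - mu)^2) / sigma^2 is chi-square with n degrees
        # of freedom; a chi-square(n) draw is a gamma draw with shape n/2, scale 2.
        ss = sum((yi - mu) ** 2 for yi in y)
        sigma = math.sqrt(ss / rng.gammavariate(n / 2.0, 2.0))
        draws.append((mu, sigma))
    return draws


# Ten simulated standard-normal observations (assumed data, not the survey data).
data_rng = random.Random(1)
y = [data_rng.gauss(0.0, 1.0) for _ in range(10)]

# Deliberately unrealistic starting values, as in the text.
draws = gibbs_normal(y, mu0=1000.0, sigma0=1000.0, n_draws=50)

for t in (0, 1, 2, 5, 10, 49):
    print(t + 1, round(draws[t][0], 3), round(draws[t][1], 3))
```

Printing a handful of iterations is a rough substitute for the trace plots in Figure 1: the early draws should be far from the data, and the later ones should settle near the sample mean and a scale of order one.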
In the previous example, the marginal distributions of interest can be obtained exactly and hence, no Gibbs sampling is actually required to simulate a draw from the target density. Now, consider an identical set-up but with the addition that data are censored at the a priori unknown mean. In this case, due to the evaluation of integrals implied by the censoring, Gibbs sampling is required to simulate a draw from the joint posterior. However, the same early convergence in distributions emerges (Figure 2).

Figure 2. Convergence in the Gibbs sample.
This example is important, for three reasons. First, it re-emphasises the important point that convergence in distribution occurs quite quickly, even in a limited information environment. Second, it confirms the assertion that the convergence in the standard normal-means model was not due in any way to the particular simplicity of that model. Third, the example shows, in perhaps the simplest setting, that the Gibbs sampling procedure works to good effect when the data in question are censored. This latter point is important when it is recognised that the bulk of the models visited in this collection contain censored data, in particular, censored observations on household marketable surplus.
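A censored-data sampler of this general kind can be sketched by adding a data-augmentation step to the Gibbs sequence. The sketch below makes a simplifying assumption that differs from the example in the text: the censoring point `c` is taken as known and fixed at zero (as in a standard Tobit set-up), whereas the text censors at the a priori unknown mean. The names `draw_below` and `gibbs_censored`, the seeds and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_below(mu, sigma, c, rng):
    """Draw from a normal(mu, sd=sigma) truncated to (-inf, c] by inverse-CDF sampling."""
    p_max = STD_NORMAL.cdf((c - mu) / sigma)
    # Clamp the uniform draw away from 0 and 1 so inv_cdf stays defined
    # during extreme early iterations.
    u = min(max(rng.random() * p_max, 1e-12), 1.0 - 1e-12)
    return min(c, mu + sigma * STD_NORMAL.inv_cdf(u))


def gibbs_censored(y, censored, c, mu0, sigma0, n_draws, seed=0):
    """Gibbs sampler with data augmentation for normal data censored from below at c."""
    rng = random.Random(seed)
    n = len(y)
    mu, sigma = mu0, sigma0
    draws = []
    for _ in range(n_draws):
        # Step 1: impute a latent value for every censored observation.
        z = [draw_below(mu, sigma, c, rng) if cen else yi
             for yi, cen in zip(y, censored)]
        # Step 2: mu | sigma, z is normal around the completed-data mean.
        mu = rng.gauss(sum(z) / n, sigma / math.sqrt(n))
        # Step 3: sigma | mu, z via the chi-square(n) relationship.
        ss = sum((zi - mu) ** 2 for zi in z)
        sigma = math.sqrt(ss / rng.gammavariate(n / 2.0, 2.0))
        draws.append((mu, sigma))
    return draws


# Simulated latent standard-normal data, recorded as c whenever they fall at or below c.
data_rng = random.Random(2)
latent = [data_rng.gauss(0.0, 1.0) for _ in range(10)]
c = 0.0
censored = [v <= c for v in latent]
y = [max(v, c) for v in latent]

draws = gibbs_censored(y, censored, c, mu0=1000.0, sigma0=1000.0, n_draws=50)
```

Once the latent values are imputed, steps 2 and 3 are the same draws as in the uncensored model, which is why the normal and inverted-gamma forms carry over to the censored applications.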
In summary, these two examples are not intended to provide overwhelming evidence of the usefulness of the procedures encountered in the remainder of this collection, nor of their power in small samples. Rather, the exercises are intended to give a flavour of the power of the technique under relatively unfavourable sampling circumstances.
The normal-inverted-gamma form, it is worth stressing, is an obvious example to choose because it also provides the basis for all of the applications that follow. Most of these applications possess somewhat more complicated posterior forms. However, all but two of these forms appear frequently in the literature and have been studied, repeatedly, in a context similar to the examples just presented. If these simple examples are not persuasive, we hope that the many applications that we now visit will convince the reader of the power of the methodology in analysing empirical problems of considerable policy importance.
The remainder of this work is organised as follows. Section 3 introduces the data used in the various applications, provides the motivation for their collection and presents a summary of statistical reports. Section 4 applies a standard probit procedure to the binary participation data. Section 5 applies a single-equation Tobit procedure to the marketable surplus data. Sections 6 to 9 consider various extensions of the basic Tobit and probit set-ups. Section 6 considers the impact of production decisions; Section 7 considers a count-data problem in crossbred cow adoption; Section 8 introduces fixed costs; and Section 9 considers two-step participation and selling decisions. Conclusions are offered in Section 10.