Bayesian Approaches to Testing a Point ("Null") Hypothesis

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

12.1.2.2 Why HDI and not equal-tailed interval?

I have advocated using the HDI as the summary credible interval for the posterior distribution, also used in the decision rule along with a ROPE. The reason for using the HDI is that it is very intuitively meaningful: All the values inside the HDI have higher probability density (i.e., credibility) than any value outside the HDI. The HDI therefore includes the most credible values.

Some other authors and software use an equal-tailed interval (ETI) instead of an HDI. A 95% ETI has 2.5% of the distribution on either side of its limits. It indicates the 2.5th percentile and the 97.5th percentile. One reason for using an ETI is that it is easy to compute.
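For instance, given a sample from the posterior (as produced by MCMC), the ETI is a one-liner. A minimal R sketch, using a simulated sample as a stand-in for a real posterior sample:

```r
# 95% ETI of a (simulated) posterior sample: just the 2.5th and 97.5th percentiles.
set.seed(42)
postSample <- rgamma(50000, shape = 3, rate = 10)  # stand-in for an MCMC sample
quantile(postSample, probs = c(0.025, 0.975))
```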

In symmetric distributions, the ETI and HDI are the same, but not in skewed distributions. Figure 12.2 shows an example of a skewed distribution with its 95% HDI and 95% ETI marked. (It is a gamma distribution, so its HDI and ETI are easily computed to high accuracy.) Notice on the right there is a region, marked by an arrow, that is outside the HDI but inside the ETI. On the left there is another region, marked by an arrow, that is inside the HDI but outside the ETI. The ETI has the strange property that parameter values in the region marked by the right arrow are included in the ETI, even though they have lower credibility than parameter values in the region marked by the left arrow that are excluded from the ETI. This property seems undesirable as a summary of the credible values in a distribution.

Figure 12.2. A skewed distribution has a different 95% highest density interval (HDI) than 95% equal-tailed interval (ETI).
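The contrast in Figure 12.2 is easy to reproduce numerically for any gamma distribution. Below is a minimal R sketch; the shape and rate are arbitrary choices for illustration (the excerpt does not state the parameters used in the figure), so the numbers will differ from the figure, but the qualitative HDI-versus-ETI discrepancy is the same:

```r
# A right-skewed gamma distribution (parameters chosen only for illustration).
shape <- 2 ; rate <- 0.1
credMass <- 0.95

# 95% ETI: simply the 2.5th and 97.5th percentiles.
eti <- qgamma(c(0.025, 0.975), shape, rate)

# 95% HDI: the narrowest 95% interval, found by minimizing the interval width
# over the probability mass allowed below the lower limit.
intervalWidth <- function(lowTail) {
  qgamma(credMass + lowTail, shape, rate) - qgamma(lowTail, shape, rate)
}
lowTail <- optimize(intervalWidth, interval = c(0, 1 - credMass))$minimum
hdi <- qgamma(c(lowTail, credMass + lowTail), shape, rate)

round(rbind(ETI = eti, HDI = hdi), 2)  # the HDI sits to the left of the ETI and is narrower
```

In such a right-skewed case, the zone between the HDI's upper limit and the ETI's upper limit corresponds to the region marked by the right arrow, and the zone between the two lower limits corresponds to the region marked by the left arrow.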

The strange property of the ETI also leads to weirdness when using it as a decision tool. If a null value and ROPE were in the region marked by the right arrow, it would be rejected by the HDI, but not by the ETI. Which decision makes more sense? I think the decision by HDI makes more sense, because it is saying that the values outside its limits have low credibility. But the decision by ETI says that values in this region are not rejected, even though they have low credibility. The complementary conflict happens in the region marked by the left arrow. If a null value and ROPE overlap that region, the decision by HDI would be not to reject, but the decision by ETI would be to reject. Again, I think the decision by HDI makes more sense, because these values have high credibility, even though they are in the extreme tail of the distribution.

Proponents of using the ETI point out that the ETI limits are invariant under nonlinear transformations of the parameter. The ETI limits of the transformed parameter are just the transformed limits of the original scale. This is not the case for HDI limits (in general). This property is handy when the parameters are arbitrarily scaled in abstract model derivations, or in some applied models for which parameters might be nonlinearly transformed for different purposes. But in most applications, the parameters are meaningfully defined on the canonical scale of the data, and the HDI has meaning relative to that scale. Nevertheless, it is important to recognize that if the scale of the parameter is nonlinearly transformed, the HDI limits will change relative to the percentiles of the distribution.
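This difference in behavior under transformation can also be checked numerically. A minimal R sketch using a gamma distribution with arbitrary parameters and a log transformation (my own illustration, not code from the book):

```r
shape <- 2 ; rate <- 0.1
credMass <- 0.95

# ETI limits map through a monotone transformation such as log():
etiOrig <- qgamma(c(0.025, 0.975), shape, rate)
log(etiOrig)  # identical to the 95% ETI of log(x), because percentiles transform directly

# HDI limits do not map through, in general. Width minimization on each scale:
hdiFromICDF <- function(icdf) {
  w <- function(p) icdf(credMass + p) - icdf(p)
  p <- optimize(w, c(0, 1 - credMass))$minimum
  icdf(c(p, credMass + p))
}
log(hdiFromICDF(function(p) qgamma(p, shape, rate)))  # original-scale HDI, then log-transformed
hdiFromICDF(function(p) log(qgamma(p, shape, rate)))  # HDI computed on the log scale: limits differ
```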

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B978012405888000012X

Tools in the Trunk

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

25.2.2 HDI of unimodal distribution is shortest interval

The algorithms for computing the HDI of an MCMC sample or of a mathematical function rely on a crucial property: For a unimodal probability distribution on a single variable, the HDI of mass M is the narrowest possible interval of that mass. Figure 25.1 illustrates why this is true. Consider the 90% HDI as shown. We construct another interval of 90% mass by moving the limits of the HDI to the right, such that each limit is moved to a point that covers 4%, as marked in gray in Figure 25.1. The new interval must also cover 90%, because the 4% lost on the left is replaced by the 4% gained on the right.

Figure 25.1. For a unimodal distribution, the HDI is the narrowest interval of that mass. This figure shows the 90% HDI and another interval that has 90% mass.

Consider the grey regions in Figure 25.1. Their left edges have the same height, because the left edges are defined by the HDI. Their areas are the same, because, by definition, the areas are both 4%. Notice, however, that the left grey area is narrower than the right grey area, because the left area falls at a point where the distribution is increasing, but the right area falls at a point where the distribution is decreasing. Consequently, the distance between the right edges of the two grey zones must be greater than the HDI width. (The exact widths are marked in Figure 25.1.) This argument applies for any size of grey zone, going right or left of the HDI, and for any HDI mass. The argument relies on unimodality, however.
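The same point can be checked numerically with any skewed unimodal distribution: every 90% interval other than the HDI is wider, and the width grows as the interval slides toward a tail. A minimal R sketch with an arbitrary gamma distribution (my illustration, not the book's code):

```r
# Width of each 90% interval, indexed by the probability mass below its lower limit.
shape <- 3 ; rate <- 1
lowTailMass <- seq(0, 0.10, by = 0.01)
width <- qgamma(lowTailMass + 0.90, shape, rate) - qgamma(lowTailMass, shape, rate)
round(cbind(lowTailMass, width), 3)
# The minimum width identifies the HDI; sliding the interval toward either tail
# (changing lowTailMass) only makes the interval wider, as argued above.
```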

Given the argument and diagram in Figure 25.1, it is not too difficult to believe the converse: For a unimodal distribution on one variable, for any mass M, the interval containing mass M that has the narrowest width is the HDI for that mass. The algorithms described below are based on this property of HDIs. The algorithms find the HDI by searching among candidate intervals of mass M. The shortest one found is declared to be the HDI. It is an approximation, of course. See Chen and Shao (1999) for more details, and Chen, He, Shao, and Xu (2003) for dealing with the unusual situation of multimodal distributions.
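A compact version of such a search over candidate intervals, applied to an MCMC-style sample, can be written as follows. This is my own sketch of the idea described above, not the book's own utility code, and the gamma sample at the end is just an arbitrary skewed example:

```r
# Find the HDI of mass credMass by scanning every candidate interval that spans
# that fraction of the sorted sample and keeping the shortest one.
hdiOfSample <- function(sampleVec, credMass = 0.95) {
  sorted <- sort(sampleVec)
  n      <- length(sorted)
  nSpan  <- ceiling(credMass * n)   # points spanned by each candidate interval
  nCands <- n - nSpan + 1           # number of candidate intervals
  widths <- sorted[nSpan:n] - sorted[1:nCands]
  best   <- which.min(widths)
  c(lower = sorted[best], upper = sorted[best + nSpan - 1])
}

set.seed(1)
hdiOfSample(rgamma(50000, shape = 3, rate = 1))  # HDI of a skewed example sample
```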

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124058880000258

Null Hypothesis Significance Testing

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

11.3.2 Bayesian HDI

A concept in Bayesian inference that is somewhat analogous to the NHST CI is the HDI, which was introduced in Section 4.3.4, p. 87. The 95% HDI consists of those values of θ that have at least some minimal level of posterior credibility, such that the total probability of all such θ values is 95%.

Let's consider the HDI when we flip a coin and observe z = 7 and N = 24. Suppose we have a prior informed by the fact that the coin appears to be authentic, which we express here, for illustrative purposes, as a beta(θ|11,11) distribution. The right side of Figure 11.7 shows that the 95% HDI goes from θ = 0.254 to θ = 0.531. These limits span the 95% most credible values of the bias. Moreover, the posterior density shows exactly how credible each bias is. In particular, we can see that θ = 0.5 is within the 95% HDI. Rules for making discrete decisions are discussed in Chapter 12.
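These numbers can be checked directly, because a beta prior with binomial data gives a beta posterior: beta(θ|11,11) combined with z = 7 heads in N = 24 flips yields beta(θ|18,28). A minimal R sketch that recovers the HDI limits by width minimization (my own check, not code from the book):

```r
z <- 7 ; N <- 24     # observed heads and flips
a <- 11 ; b <- 11    # beta prior
postA <- a + z       # conjugate updating: posterior is beta(18, 28)
postB <- b + N - z

credMass <- 0.95
intervalWidth <- function(lowTail) {
  qbeta(credMass + lowTail, postA, postB) - qbeta(lowTail, postA, postB)
}
lowTail <- optimize(intervalWidth, c(0, 1 - credMass))$minimum
round(qbeta(c(lowTail, credMass + lowTail), postA, postB), 3)
# approximately 0.254 and 0.531; note that theta = 0.5 falls inside this interval
```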

There are at least three advantages of the HDI over an NHST CI. First, the HDI has a direct interpretation in terms of the credibilities of values of θ. The HDI is explicitly about p(θ|D), which is exactly what we want to know. The NHST CI, on the other hand, has no direct relationship with what we want to know; there's no clear relationship between the probability of rejecting the value θ and the credibility of θ. Second, the HDI has no dependence on the sampling and testing intentions of the experimenter, because the likelihood function has no dependence on the sampling and testing intentions of the experimenter. 8 The NHST confidence interval, in contrast, tells us about probabilities of data relative to imaginary possibilities generated from the experimenter's intentions.

Third, the HDI is responsive to the analyst's prior beliefs, as it should be. The Bayesian analysis indicates how much the new data should alter our beliefs. The prior beliefs are overt and publicly decided. The NHST analysis, on the contrary, does not incorporate prior knowledge.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124058880000118

What is This Stuff Called Probability?

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

4.3.4 Highest density interval (HDI)

Another way of summarizing a distribution, which we will use often, is the highest density interval, abbreviated HDI. 6 The HDI indicates which points of a distribution are most credible, and which cover most of the distribution. Thus, the HDI summarizes the distribution by specifying an interval that spans most of the distribution, say 95% of it, such that every point inside the interval has higher credibility than any point outside the interval.

Figure 4.5 shows examples of HDIs. The upper panel shows a normal distribution with mean of zero and standard deviation of one. Because this normal distribution is symmetric around zero, the 95% HDI extends from −1.96 to +1.96. The area under the curve between these limits, shaded in grey in Figure 4.5, has area of 0.95. Moreover, any x within those limits has higher probability density than any x outside those limits.

Figure 4.5. Examples of 95% highest density intervals (HDIs). For each example, all the x values inside the interval have higher density than any x value outside the interval, and the total mass of the points inside the interval is 95%. The 95% area is shaded, and it includes the zone below the horizontal arrow. The horizontal arrow indicates the width of the 95% HDI, with its ends annotated by (rounded) x values. The height of the horizontal arrow marks the minimal density exceeded by all x values within the 95% HDI.

The middle panel of Figure 4.5 shows a 95% HDI for a skewed distribution. By definition, the area under the curve between the 95% HDI limits, shaded in gray in the figure, has area of 0.95, and the probability density of any x within those limits is higher than that of any x outside those limits. Importantly, notice that the area in the left tail, less than the left HDI limit, is larger than the area in the right tail, greater than the right HDI limit. In other words, the HDI does not necessarily produce equal-area tails outside the HDI. (For those of you who have previously encountered the idea of equal-tailed credible intervals, you can look ahead to Figure 12.2, p. 342, for an explanation of how HDIs differ from equal-tailed intervals.)

The lower panel of Figure 4.5 shows a fanciful bimodal probability density function. In many realistic applications, multimodal distributions such as this do not arise, but this example is useful for clarifying the definition of an HDI. In this case, the HDI is split into two subintervals, one for each mode of the distribution. Nevertheless, the defining characteristics are the same as before: The region under the curve within the 95% HDI limits, shaded in grey in the figure, has total area of 0.95, and any x within those limits has higher probability density than any x outside those limits.

The formal definition of an HDI is just a mathematical expression of the two essential characteristics. The 95% HDI includes all those values of x for which the density is at least as large as some value W, such that the integral over all those x values is 95%. Formally, the values of x in the 95% HDI are those such that p(x) > W, where W satisfies $\int_{x\,:\,p(x)>W} p(x)\,dx = 0.95$.
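The definition translates almost literally into a short computation: lower the waterline W until the x values whose density exceeds W accumulate 95% of the mass. A minimal R sketch on a grid, using the standard normal density of the upper panel of Figure 4.5:

```r
# Grid approximation of the density and of the integral in the definition.
x  <- seq(-5, 5, length.out = 10001)
dx <- x[2] - x[1]
px <- dnorm(x)   # standard normal density, as in the upper panel

# Accumulate mass from the highest densities downward until 95% is covered;
# the density at which that happens is the waterline W.
ord     <- order(px, decreasing = TRUE)
cumMass <- cumsum(px[ord]) * dx
W       <- px[ord][which(cumMass >= 0.95)[1]]

inHDI <- px >= W
range(x[inHDI])      # approximately -1.96 and +1.96, the 95% HDI limits
sum(px[inHDI]) * dx  # approximately 0.95
```

The same grid approach works for the bimodal example in the lower panel; the set of x values with density above W then comes out as two disjoint subintervals.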

When the distribution refers to credibility of values, then the width of the HDI is another way of measuring uncertainty of beliefs. If the HDI is wide, then beliefs are uncertain. If the HDI is narrow, then beliefs are relatively certain. As will be discussed at length in Chapter 13, sometimes the goal of research is to obtain data that achieve a reasonably high degree of certainty about a particular parameter value. The desired degree of certainty can be measured as the width of the 95% HDI. For example, if μ is a measure of how much a drug decreases blood pressure, the researcher may want to have an estimate with a 95% HDI width no larger than 5 units on the blood-pressure scale. As another example, if θ is a measure of a population's preference for candidate A over candidate B, the researcher may want to have an estimate with a 95% HDI width no larger than 10 percentage points.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124058880000040

Goals, Power, and Sample Size

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

13.1.1 Goals and obstacles

There are many possible goals for an experimental or observational study. For example, we might want to show that the rate of recovery for patients who take a drug is higher than the rate of recovery for patients who take a placebo. This goal involves showing that a null value (zero difference) is not tenable. We might want to confirm a specific effect size predicted by a quantitative theory, such as the curvature of light around a massive object predicted by general relativity. This goal involves showing that a specific value is tenable. We might want merely to measure accurately whatever effect is present, for example when measuring voter preferences in a political poll. This goal involves establishing a minimal degree of precision.

Any goal of research can be formally expressed in various ways. In this chapter I will focus on the following goals formalized in terms of the highest density interval (HDI):

Goal: Reject a null value of a parameter.

- Formal expression: Show that a region of practical equivalence (ROPE) around the null value excludes the posterior 95% HDI.

Goal: Affirm a predicted value of a parameter.

- Formal expression: Show that a ROPE around the predicted value includes the posterior 95% HDI.

Goal: Achieve precision in the estimate of a parameter.

- Formal expression: Show that the posterior 95% HDI has width less than a specified maximum.

There are other mathematical formalizations of the various goals, and they will be mentioned later. This chapter focuses on the HDI because of its natural interpretation for purposes of parameter estimation and measurement of precision.
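To make the three formal expressions concrete, here is a minimal R sketch of the corresponding decision rules. The HDI limits, the ROPE, and the maximum acceptable width are all hypothetical numbers invented for illustration:

```r
hdi  <- c(0.51, 0.62)   # hypothetical posterior 95% HDI
rope <- c(0.45, 0.55)   # hypothetical ROPE around a null value of 0.50
maxWidth <- 0.20        # hypothetical precision goal

rejectNull  <- hdi[1] > rope[2] || hdi[2] < rope[1]   # HDI falls entirely outside the ROPE
affirmValue <- hdi[1] >= rope[1] && hdi[2] <= rope[2] # HDI falls entirely inside the ROPE
precise     <- (hdi[2] - hdi[1]) < maxWidth           # HDI narrower than the specified maximum

c(reject = rejectNull, affirm = affirmValue, precise = precise)
```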

If we knew the benefits of achieving our goal, and the costs of pursuing it, and if we knew the penalties for making a mistake while interpreting the data, then we could express the results of the research in terms of the long-run expected payoff. When we know the costs and benefits, we can conduct a full decision-theoretic treatment of the situation, and plan the research and data interpretation accordingly (e.g., Chaloner & Verdinelli, 1995; Lindley, 1997). In our applications we do not have access to those costs and benefits, unfortunately. Therefore we rely on goals such as those outlined above.

The crucial obstacle to the goals of research is that a random sample is only a probabilistic representation of the population from which it came. Even if a coin is actually fair, a random sample of flips will rarely show exactly 50% heads. And even if a coin is not fair, it might come up heads 5 times in 10 flips. Drugs that actually work no better than a placebo might happen to cure more patients in a particular random sample. And drugs that truly are effective might happen to show little difference from a placebo in another particular random sample of patients. Thus, a random sample is a fickle indicator of the true state of the underlying world. Whether the goal is showing that a suspected value is or isn't credible, or achieving a desired degree of precision, random variation is the researcher's bane. Noise is the nemesis.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124058880000131

Metric Predicted Variable with Multiple Metric Predictors

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

18.1.5 Informative priors, sparse data, and correlated predictors

The examples in this book tend to use mildly informed priors (e.g., using information about the rough magnitude and range of the data). But a benefit of Bayesian analysis is the potential for cumulative scientific progress by using priors that have been informed by previous research.

Informed priors can be especially useful when the amount of data is small compared to the parameter space. A strongly informed prior essentially reduces the scope of the credible parameter space, so that a small amount of new data implies a narrow zone of credible parameter values. For example, suppose we flip a coin once and observe a head. If the prior distribution on the underlying probability of heads is vague, then the single datum leaves us with a wide, uncertain posterior distribution. But suppose we have prior knowledge that the coin is manufactured by a toy company that creates trick coins that either always come up heads or always come up tails. This knowledge constitutes a strongly informed prior distribution on the underlying probability of heads, with a spike of 50% mass at zero (always tails) and a spike of 50% mass at one (always heads). With this strong prior, the single datum yields a posterior distribution with complete certainty: 100% mass at one (always heads).
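The trick-coin example is just discrete Bayes' rule, and it is worth seeing the arithmetic written out. A minimal R sketch of the calculation described above (my own illustration, not code from the book):

```r
theta <- c(0, 1)      # trick coins: always tails (theta = 0) or always heads (theta = 1)
prior <- c(0.5, 0.5)  # strongly informed prior: 50% mass on each spike

likelihood <- dbinom(1, size = 1, prob = theta)   # one flip, observed heads

posterior <- prior * likelihood / sum(prior * likelihood)
rbind(theta, prior, posterior)   # all posterior mass (100%) lands on theta = 1
```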

As another case of using strong prior information with sparse data, recall the linear regression of weight on height for 30 people in Figure 17.3 (p. 481). The marginal posterior distribution on the slope has a mode of about 4.5 and a fairly broad 95% HDI that extends from about 2.0 to 7.0. Furthermore, the joint posterior distribution on the slope and intercept shows a strong trade-off, illustrated in the scatter plot of the MCMC chain in Figure 17.3. For example, if the slope is about 1.0, then credible intercepts would have to be about +100, but if the slope is about 8.0, then credible intercepts would have to be about −400. Now, suppose that we have strong prior knowledge about the intercept, namely, that a person who has zero height has zero weight. This "knowledge" might seem to be a logical truism, but really it does not make much sense because the example is referring to adults, none of whom have zero height. But we will ignore reality for this illustration and suppose that we know the intercept must be at zero. From the trade-off in credible intercepts and slopes, an intercept of zero implies that the slope must be very nearly 2.0. Thus, instead of a wide posterior distribution on the slopes that is centered near 4.5, the strong prior on the intercept implies a very narrow posterior distribution on the slopes that is centered near 2.0.

In the context of multiple linear regression, sparse data can lead to usefully precise posteriors on regression coefficients if some of the regression coefficients have informed priors and the predictors are strongly correlated. To understand this idea, it is important to recall that when predictors are correlated, their regression coefficients are also (anti-)correlated. For instance, recall the SAT data from Figure 18.3 (p. 514) in which spending-per-pupil and percentage-taking-the-test are correlated. Consequently, the posterior estimates of the regression coefficients had a negative correlation, as shown in Figure 18.5 (p. 518). The correlation of credible regression coefficients implies that a strong belief about the value of one regression coefficient constrains the value of the other coefficient. Look carefully at the scatter plot of the two slopes shown in Figure 18.5. It can be seen that if we believe that the slope on percentage-taking-the-exam is −3.2, then credible values of the slope on spending-per-pupil must be around 15, with an HDI extending roughly from 10 to 20. Notice that this HDI is smaller than the marginal HDI on spending-per-pupil, which goes from roughly 4 to 21. Thus, constraining the possibilities of one slope also constrains credible values of the other slope, because estimates of the two slopes are correlated.

Figure 18.3. The data (Guber, 1999) are plotted as dots, and the grid shows the best fitting plane. "SATT" is the average total SAT score in a state. "%Take" is the percentage of students in the state who took the SAT. "Spend" is the spending per student, in thousands of dollars.
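The kind of conditioning described above can be carried out directly on the joint posterior sample (the MCMC chain). The sketch below uses a synthetic negatively correlated sample as a stand-in for the real chain from the SAT analysis, with made-up numbers chosen only to loosely echo Figure 18.5; the column names are likewise hypothetical:

```r
# Synthetic stand-in for the joint posterior sample of the two slopes.
set.seed(1)
n <- 20000
beta_PrcntTake <- rnorm(n, mean = -2.9, sd = 0.6)
beta_Spend     <- 12 - 7 * (beta_PrcntTake + 2.9) + rnorm(n, sd = 2.5)
mcmcChain      <- data.frame(beta_PrcntTake, beta_Spend)

# Condition on a strong belief that the slope on percentage-taking is near -3.2:
conditioned <- subset(mcmcChain, abs(beta_PrcntTake + 3.2) < 0.1)

# Spread of the spending-per-pupil slope, marginal versus conditional
# (equal-tailed intervals are used here just as a quick summary of spread):
quantile(mcmcChain$beta_Spend,   c(0.025, 0.975))
quantile(conditioned$beta_Spend, c(0.025, 0.975))
```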

That influence of one slope estimate on another can be used for inferential advantage when we have prior knowledge about one of the slopes. If some previous or auxiliary research informs the prior of one regression coefficient, that constraint can propagate to the estimates of regression coefficients on other predictors that are correlated with the first. This is especially useful when the sample size is small, and a merely mildly informed prior would not yield a very precise posterior. Of course, the informed prior on the first coefficient must be cogently justified. This might not be easy, especially in the context of multiple linear regression, where the inclusion of additional predictors can profoundly change the estimates of the regression coefficients when the predictors are correlated. A robustness check also may be useful, to show how strong the prior must be to draw strong conclusions. If the information used for the prior is compelling, then this technique can be very useful for leveraging novel implications from small samples. An accessible discussion and example from political science is provided by Western and Jackman (1994), and a mathematical discussion is provided by Leamer (1978, p. 175+).

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124058880000180

What's in This Book (Read This First!)

John K. Kruschke, in Doing Bayesian Data Analysis (Second Edition), 2015

1.3 What's new in the second edition?

The basic progression of topics remains the same as the first edition, but all the details have been changed, from cover to cover. The book and its programs have been completely rewritten. Here are just a few highlights of the changes:

There are all new programs in JAGS and Stan. The new programs are designed to be much easier to use than the scripts in the first edition. In particular, there are now compact high-level scripts that make it easy to run the programs on your own data. This new programming was a major undertaking by itself.

The introductory Chapter 2, regarding the basic ideas of how Bayesian inference reallocates credibility across possibilities, is completely rewritten and greatly expanded.

There are completely new chapters on the programming languages R (Chapter 3), JAGS (Chapter 8), and Stan (Chapter 14). The lengthy new chapter on R includes explanations of data files and structures such as lists and data frames, along with several utility functions. (It also has a new poem that I am particularly pleased with.) The new chapter on JAGS includes explanation of the RunJAGS package which executes JAGS on parallel computer cores. The new chapter on Stan provides a novel explanation of the concepts of Hamiltonian Monte Carlo. The chapter on Stan also explains conceptual differences in program flow between Stan and JAGS.

Chapter 5 on Bayes' rule is greatly revised, with a new emphasis on how Bayes' rule re-allocates credibility across parameter values from prior to posterior. The material on model comparison has been removed from all the early chapters and integrated into a compact presentation in Chapter 10.

What were two separate chapters on the Metropolis algorithm and Gibbs sampling have been consolidated into a single chapter on MCMC methods (as Chapter 7).

There is extensive new material on MCMC convergence diagnostics in Chapters 7 and 8. There are explanations of autocorrelation and effective sample size. There is also exploration of the stability of the estimates of the highest density interval (HDI) limits. New computer programs display the diagnostics, as well.

Chapter 9 on hierarchical models includes extensive new and unique material on the crucial concept of shrinkage, along with new examples.

All the material on model comparison, which was spread across various chapters in the first edition, is now consolidated into a single focused chapter (Chapter 10) that emphasizes its conceptualization as a case of hierarchical modeling.

Chapter 11 on null hypothesis significance testing is extensively revised. It has new material for introducing the concept of sampling distribution. It has new illustrations of sampling distributions for various stopping rules, and for multiple tests.

Chapter 12, regarding Bayesian approaches to null value assessment, has new material about the region of practical equivalence (ROPE), new examples of accepting the null value by Bayes factors, and new explanation of the Bayes factor in terms of the Savage-Dickey method.

Chapter 13, regarding statistical power and sample size, has an extensive new section on sequential testing, and recommends making the research goal be precision of estimation instead of rejecting or accepting a particular value.

Chapter 15, which introduces the generalized linear model, is fully revised, with more complete tables showing combinations of predicted and predictor variable types.

Chapter 16, regarding estimation of means, now includes extensive discussion of comparing two groups, along with explicit estimation of effect size.

Chapter 17, regarding regression on a single metric predictor, now includes extensive examples of robust regression in JAGS and Stan. New examples of hierarchical regression, including quadratic trend, graphically illustrate shrinkage in estimates of individual slopes and curvatures. The use of weighted data is also illustrated.

Chapter 18, on multiple linear regression, includes a new section on Bayesian variable selection, in which various candidate predictors are probabilistically included in the regression model.

Chapter 19, on one-factor ANOVA-like analysis, has all new examples, including a completely worked out example analogous to analysis of covariance (ANCOVA), and a new example involving heterogeneous variances.

Chapter 20, on multi-factor ANOVA-like analysis, has all new examples, including a completely worked out example of a split-plot design that involves a combination of a within-subjects factor and a between-subjects factor.

Chapter 21, on logistic regression, is expanded to include examples of robust logistic regression, and examples with nominal predictors.

There is a completely new chapter (Chapter 22) on multinomial logistic regression. This chapter fills in a case of the generalized linear model (namely, a nominal predicted variable) that was missing from the first edition.

Chapter 23, regarding ordinal data, is greatly expanded. New examples illustrate single-group and two-group analyses, and demonstrate how interpretations differ from treating ordinal data as if they were metric.

There is a new section (25.4) that explains how to model censored data in JAGS.

Many exercises are new or revised.

Oh, and did I mention that the cover is different? The correspondence of the doggies to Bayes' rule is now made explicit: The folded ears of the posterior doggie are a compromise between the perky ears and floppy ears of the likelihood and prior doggies. The marginal likelihood is not usually computed in MCMC methods, so the doggie in the denominator gets sleepy with nothing much to do. I hope that what's between the covers of this book is as friendly and engaging as the doggies on the cover.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124058880000015