Planning, Criticism, and Revision

SUMMARY This paper presents a more complete theory of data analysis which allows for changes in the state of mind of the observer and also for approximations that limit planning costs. Discussion is included on the form that criticism should take, and the extent to which planned responses to the data can legitimately be revised after the data are reviewed. The proper role of diagnostics is discussed. Some diagnostic statistics are genuinely criticisms, but many are pre-test diagnostics that play a role in a complex multi-step method of estimation. A third category is elicitation diagnostics, which ask data-dependent questions about the prior distribution.

E. E. LEAMER

Hendry and Starr (1987) claim that 'precisely how one should construct models is primarily a matter of research efficiency; in principle, no method of construction is invalid since nothing precludes an investigator from thinking of or chancing upon robust relationships prior to or during data analysis'. To Leamer (1985) it seems like 'a combination of backward and forward stepwise (better known as unwise) regression'. Bruce Hill's (1980, 1988) position seems not far from Hendry's. Hill (1988), echoing Keynes (1921, Ch. 25), observes that from the Bayesian perspective 'once a model has been formulated, whether pre- or post-data, the likelihood function for the parameters of that model, conditional upon the truth of that model, does not in any way depend upon the circumstances under which that model was discovered' (Hill, 1988, p. 11). The Bayesian problem with data-instigated models, which was recognized by Keynes (1921, Ch. 25), is only how to form a prior distribution that is not 'contaminated' by the data. Hill (1988, p. 13) acknowledges that 'the difficulties are primarily psychological', but apologizes that 'the force of a Bayesian analysis of data must depend upon an agreement among scientists that specific prior distributions and likelihood functions are pertinent to the problem, and can be considered on their own merits, even after the data has [sic] been observed'.
In apparent contrast to Hill and Hendry, I believe that the use of diagnostic statistics does present a challenge to statistical theory, classical or Bayesian. Traditional statistical theory deals with the evaluation of planned responses to hypothetical data sets. Indeed it is impossible to compute sampling properties without a set of plans indicating the response to the data for every conceivable data set. The use of a diagnostic statistic to criticize a model is an advance announcement that the planned responses are not fully committed and may be revised when the actual data are observed.
However, very few if any of the diagnostics that are traditionally employed in the econometrics literature are criticisms in my sense of precipitating an unplanned, unpredictable response to the data. Many are 'pre-test' diagnostics that play a part in a complex multi-step method of estimation of a very general model. A statistic is a pre-test diagnostic if both the general model and the response to the data can be fully defined and programmed before the data are observed. The proper evaluation of pre-test diagnostics involves either the study of the sampling properties of these complex procedures or the search for a prior distribution that could partially justify them.
But not all responses can or should be planned. The actual response to real data can differ from the planned response to hypothetical data for at least two reasons. The first reason is that the desired response to the data depends on the state of mind of the observer, which can change with changes in mood and expertise. Second, even if there were no variability in the state of mind, a complete set of plans applicable to every conceivable data set is very costly to formulate. Plans accordingly will be formulated only for data sets that are regarded as probable, and responses to improbable data sets will be formulated only if and when they are observed. Contrast for example the scatter of observations in Figures 1 and 2. Though the statistics and the R2 values are the same, the messages seem very different, and the plan of regressing y on x seems not very wise for the scatter in Figure 2. One would not sensibly have planned for this possibility since it seems so remote, but once these data are observed the original plan to run a regression seems highly inappropriate, and cries out for revision.

This paper presents a more complete theory of data analysis which allows for changes in the state of mind of the observer, and also for approximations that limit the planning costs. Discussion is included on the form that criticism should take, and the extent to which planned responses can legitimately be revised. The proper role of diagnostics is discussed. I argue that diagnostic statistics fall into three categories:

1. Pre-test diagnostics which select between a pair of alternative estimates.
2. Elicitation diagnostics which indicate if the inferences are sensitive to the choice of prior distribution, and which call for a more accurate measurement of the state of mind.
3. Criticisms which suggest a 'fundamental' change in the model and/or prior distributions.
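The contrast drawn between Figures 1 and 2 (identical summary statistics, very different messages) can be reproduced with Anscombe's (1973) well-known quartet; the sketch below uses Anscombe's data rather than the data behind the paper's figures:

```python
# Anscombe (1973): two data sets with (nearly) identical regression
# summaries but very different scatters. Data set 1 is plausibly linear;
# data set 2 traces a parabola, for which the planned linear regression
# seems clearly inappropriate once the scatter is inspected.
import numpy as np

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for y in (y1, y2):
    slope, intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    print(f"slope={slope:.2f}  intercept={intercept:.2f}  R2={r2:.2f}")
# Both fits report slope ~0.50, intercept ~3.00 and R2 ~0.67,
# yet a plot of (x, y2) shows pure curvature.
```

No summary statistic computed by the planned regression distinguishes the two scatters; only inspection of the data does.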
It is possible to have a theory of pre-test diagnostics and a theory of elicitation diagnostics, but I know of no approach that allows a theory of criticism. Selecting the form that genuine criticism should take is by its very nature an unsolvable problem, since the solution must be based on the corrective action that a successful criticism should precipitate, yet knowledge of that action means that criticism in my sense has not taken place. Furthermore, in the absence of a completely appropriate theory of criticism, the adjustments to the inferences that are required to correct for both successful and unsuccessful criticisms must remain to some extent ad hoc, though I will argue that there are adjustments that are sensible.

CRITICISM AND REVISION
Essentially all of statistical theory is concerned with the evaluation of planned responses to hypothetical data sets. A Bayesian approach allows these idealized plans to depend on the state of mind of the observer, but even the Bayesian theory ignores the possibility that this state of mind responds to influences other than previous data sets. Actual responses, Bayesian or not, may be quite different from these idealized plans, because the state of mind of the observer may change substantially for reasons not adequately captured in the Bayesian model. The model of data analysis that is presented here allows for changes in the state of mind of the observer. This model of data analysis is broad enough to include 'exploratory' data analysis as well as 'confirmatory' data analysis. Confirmatory data analysis is characterized by a substantial commitment to the original planned responses. Exploratory data analysis has weak plans, if any, and may use displays and diagnostics to suggest the 'model' on which a response might be formulated. The problem with exploratory data analysis is that there is a substantial tendency to overfit, to see patterns in the data set that are not really there. A theory of exploratory data analysis indicates how the overfitting problem can be avoided.
Incidentally, the word 'response' is here used in a general sense. A response to a data set is sometimes a decision (for example: Act as if a hypothesis were true) but more often is a judgement (for example: The data suggest that the value of β is close to zero). The response to the data depends on the state of mind of the observer. You and I may see the same observations but draw very different conclusions from them. For that matter, you may analyse a data set today and draw very different conclusions than you did last week.
The three determinants of the state of mind are former observations, 'mood' and 'expertise'.
Bayesians have a well-developed theory of how past observations affect the prior distribution, which can be important for interpreting the current data set. The classical problem of pooling different data sets yields essentially the same results. But neither the traditional Bayesian theory nor the classical pooling theory admits the influence of variable factors other than data.
The two non-data components of the state of mind that are considered here are the 'mood', which is defined as those random effects that are impermanent and stationary, and the 'expertise', which is defined to be those random effects that are permanent and nonstationary. Emotions, among other things, can cause changes in mood. Reasoning and flashes of insight can cause changes in expertise. Social interactions are a very important source of changes in mood and expertise. Fads influence mood; fashions may be either permanent or impermanent.
Reference to the problem of estimating or discovering the functional form of a relationship has helped to organize my thoughts about these difficult issues. It seems useful to distinguish six different ways in which a functional form is selected.

Planned responses:
1. Stepwise estimation in which a quadratic term is included if its t-value exceeds some critical level.

3. Visual inspection of the scatter of points to decide which functional form to estimate (linear, quadratic, log-linear, etc.).
Unplanned responses:
4. Visual inspection of the scatter of points without the express intent of considering anything other than a linear form, but the discovery of evidence of curvature that leads to the estimation of a nonlinear function.
5. Discovery of curvature in the scatter of points that stimulates a theoretical insight and alters the level of 'expertise' to such an extent that plans in the future allow for this kind of curvature.
6. Use of the t-statistic on the quadratic term as a diagnostic statistic to suggest an unspecified alteration of the model, such as the inclusion of some new variable.
The line separating (3) from (4) separates settings in which a planned response to the data is carried out from settings in which the original plans are weak and major revisions occur. The former methods will be said to be 'above the line'; the latter are 'below the line'. The former are 'confirmatory'; the latter 'exploratory'.1 The former methods ought to be subjected to the critical scrutiny of traditional statistical theory; the latter may (or may not) be free from that form of scrutiny. 'Diagnostic' statistics could be said to be used in all six cases. In the first three cases the diagnostic statistics are part of a multi-step method of estimation, and the overall method of estimation should be evaluated in the traditional manner. But when a diagnostic statistic is used to stimulate an unpredictable response, the estimation method falls 'below the line' and resists evaluation of any kind.
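Method (1) is fully programmable in the sense just described. A minimal sketch of such a pre-test rule follows; the data-generating processes and the critical value of 2.0 are my illustrative assumptions, not the paper's:

```python
# Pre-test estimation of functional form: fit the quadratic, keep the
# quadratic term only if its t-value exceeds a critical level, and
# otherwise fall back to the linear fit. The entire response to every
# conceivable data set is planned in advance.
import numpy as np

def pretest_fit(x, y, t_crit=2.0):
    """Return ('quadratic' or 'linear', coefficient vector)."""
    X = np.column_stack([np.ones_like(x), x, x**2])
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)
    t_quad = b[2] / np.sqrt(s2 * XtX_inv[2, 2])  # t-value of the quadratic term
    if abs(t_quad) > t_crit:
        return "quadratic", b
    # re-estimate the restricted (linear) model
    bl = np.linalg.lstsq(X[:, :2], y, rcond=None)[0]
    return "linear", bl

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 60)
y_lin = 1.0 + 2.0 * x + rng.normal(0, 0.5, x.size)            # truly linear
y_quad = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(0, 0.5, x.size)
print(pretest_fit(x, y_lin)[0], pretest_fit(x, y_quad)[0])
```

Because the rule is fixed before the data arrive, its sampling properties can in principle be computed, which is exactly what distinguishes it from a criticism.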
More generally, the analysis of data is summarized in the following way.2 The planning of responses occurs 'above the line'. A response may be either a judgement or an action; a response may be either precise or imprecise. 'Confusion' is the state of mind that is present when a judgement is imprecise; 'indecision' is the outcome of an imprecise action. 'Confusion' and 'indecision' are important subjects that have received inadequate attention in the theory of inference, exceptions including Leamer (1987). My concern here, however, is with the separation of the activity of planning from the activities of criticism and revision.

2 The phenomena of planning, criticism, and revision have analogies in computer programming and in human learning. A computer program is designed to work well for normal inputs, and will signal a diagnostic message when inputs are unusual and difficult to process. Then the user may want to consider finding or building a more suitable program. Contracts are usually written contingent on expected events; when unexpected events occur, either party can appeal to the court to have the contract rewritten. The risk of unemployment may be linked to conditions that are unusual, but not so unusual that the contract is rewritten. As humans learn new tasks, the discretionary activities of criticism and revision are frequent but, with practice, become 'programmed' and automatic. Instincts are genetic programs that can be overridden as circumstances require.
It should be clearly understood that both classical and Bayesian inference require complete commitment to the initial plans, and disallow criticism and revision. Classical inference, which refers to sampling properties, requires a complete commitment to the initial plans since sampling properties can be computed only if the response to every conceivable data set is known. A Bayesian treatment also implicitly requires a complete commitment to the initial plans in the sense that the plans are a consequence of the choices of prior and sampling distributions, which choices are made after the data are observed only with discomfort to the data analyst and suspicion by the reader of results.
By its very nature we cannot know the form that criticism should take, but it is clear that both successful and unsuccessful criticisms have implications for drawing conclusions from the data. The phenomenon of criticism, even when it does not lead to a revision, reveals that there is a lack of complete commitment to the assumptions that underlay the original plans. This lack of complete commitment requires some alteration of the plans; for example, the standard errors of the coefficients should be enlarged to reflect the fact that there surely are omitted variables that cause bias in the estimates. When the criticism is successful there is a double-counting problem, because the data are used once to alter the assumptions, and then again to estimate the parameters, as if these were the assumptions that were used from the beginning.
However, the distinction between planned and unplanned responses is not obvious. Responses that are predictable but not explicitly and consciously planned can be said to be implicit plans. Genuine revisions are unpredictable and quite rare. For example, I don't know in advance exactly how a multidimensional stepwise regression program will work in particular settings, but I know it will always do the same thing, and in that sense is predictable. This computer program will be regarded as embodying a complete plan, even though many aspects of the plan do not have my conscious review.
Human intervention is necessary, but not sufficient, to establish that a revision has occurred. It depends on whether the intervention is predictable. For example, my detection of curvature in a scatter of points is predictable, and stimulates the predictable response of including a parameter that allows curvature. In this case the human and the computer combine to carry out the implicit plan. This is essentially the same as a stepwise program that adds a nonlinear term when it is 'significant'.

ELEMENTS OF A THEORY OF DATA ANALYSIS
The model of data analysis that is presented here will use the following notation:

Yt = a sequence of data matrices;
St = current state of mind;
Mt = the mood, a stationary random process;
Et = expertise, a nonstationary random process.

The data are assumed to have been drawn from a distribution that depends on βt, the parameters of interest, and ψt, a vector of 'nuisance' parameters:

F(Yt | βt, ψt)

This notation allows for time-series dependence in the sampling process if the vector ψt includes other data sets such as Yt-1. This assumption of the existence of a data distribution is essentially vacuous if there is sufficient freedom to select the nuisance parameter.
Prior Distribution

The nuisance parameter ψt is assumed to be infinite dimensional to allow for essentially any assumption about the way the data are generated. In order to make inferences about βt, the values that ψt can take on must somehow be limited. Often this is done in practice by restricting all but a few of the components of ψt to take on preselected values. This can be regarded as a special kind of prior distribution which is dogmatic about some of the components of ψt and diffuse about the others. For purposes of discussion it will be assumed more generally that there exists an implicit or explicit prior distribution that indicates the probable regions for (βt, ψt):

G(βt, ψt | St)
This prior distribution depends importantly on the state of mind, S. I will not assume that this distribution can be elicited without error, and it may be impossible to base a data analysis on the 'true' prior distribution.

The State of Mind
The state of mind depends on past observations, on the mood Mt and on the expertise Et of the observer:

St = S(Yt-1, Yt-2, ..., Y1, Mt, Et)

The Mood

The mood is a stationary stochastic process which for purposes of discussion is assumed to depend only on the current observations and a white noise random variable ζt:

Mt = M(Yt, ζt)

The mood may vary with the personal emotional state of the observer and may also be influenced by social interactions (fads and fashions). The mood may be very different for analysis of hypothetical data sets than for analysis of real data sets, since the latter are treated with greater thought and care. The formal elicitation of a prior distribution can alter substantially the mood of the observer. This can increase the amount of care but also cause a high level of commitment to the current model; more on this below.

The Expertise
The level of expertise is a nonstationary stochastic process which for purposes of discussion will be assumed to evolve as

Et = Et-1 + εt

where εt is a white noise random process. Expertise changes with contemplation, study, enlightenment, and training, among other things. The process is nonstationary, since once a level of expertise is attained there is no tendency to return to the former level; indeed the process may be irreversible.
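A small simulation can illustrate the stationary/nonstationary distinction between mood and expertise. The concrete processes below (white noise for the mood, a random walk for the expertise) are my illustrative choices:

```python
# Mood vs. expertise by simulation: the mood is driven by white noise and
# returns to its mean (impermanent, stationary); the expertise accumulates
# its shocks and never gives a gain back (permanent, nonstationary), so
# its variance grows with time.
import numpy as np

rng = np.random.default_rng(42)
n_paths, T = 2000, 400
shocks = rng.normal(0.0, 1.0, (n_paths, T))

mood = shocks                           # M_t: a fresh shock each period
expertise = np.cumsum(shocks, axis=1)   # E_t = E_{t-1} + shock

# Cross-sectional variance across the 2000 simulated observers:
# roughly constant for mood, growing roughly linearly in t for expertise.
print(round(mood[:, 50].var(), 2), round(mood[:, 350].var(), 2))
print(round(expertise[:, 50].var(), 1), round(expertise[:, 350].var(), 1))
```

The growing variance is what makes sampling properties that embed expertise changes so awkward, a point taken up at the end of the paper.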

The Idealized Response
Given the state of mind of an observer, there is an idealized response to the data. This idealized response can be found using either a Bayesian or a classical approach, although the solutions may differ:

ρ(Yt | St)

The traditional theory of data analysis is almost completely a theory of ideal, planned, fully committed responses, not a theory of actual responses. Plans by definition are formulated before the real data are observed. A planned response to hypothetical data will differ from the actual response to real data for at least two reasons. The first is that, at the time the plan is formulated, the future state of mind is uncertain and can be forecast only with error. Secondly, even if there were no variability in the state of mind, a complete set of plans applicable to every conceivable data set is very costly to formulate. For example, it may be infinitely costly to elicit the prior distribution fully and without error. Plans accordingly will be formulated only for data sets that are regarded to be probable. Responses to improbable data sets will be formulated only if and when these improbable data are observed. If a plan is applicable for a range of possible data sets, then it will be said to be a 'wide' plan. A plan will be wide in a setting in which there is a great deal of knowledge about the process that generates the data; that is to say, when there is little variability in the state of mind.
The planned and actual responses that are made can only approximate the ideal response function. The sense in which the response approximates the ideal is most easily discussed from the Bayesian perspective, which can base a data analysis on an approximate prior distribution. This Bayesian perspective will now be used, and a sampling theory treatment will be presented subsequently.

Approximate Prior Distribution
Using the Bayesian approach, the formulation of the idealized response function would require the elicitation of the prior distribution over the infinite dimensional parameter space ψt, a task that would require an unlimited amount of time. Instead, an approximate prior distribution is formulated. First the parameter space ψt is abbreviated, and then a mathematically convenient approximate prior distribution is formed over the abbreviated parameter space:

G*(βt, ψ*t | St)

An equivalent characterization of this approximate prior distribution uses the device of an approximate state of mind:

G(βt, ψ*t | S*t)

where S*t is a state of mind similar to St but one that implies an abbreviated parameter space and a simple data analysis. The planned response is selected before the current data are observed. A Bayesian plan is formulated based on a prediction of the state of mind:

R(Yt) = ρ(Yt | Ŝt)

where Ŝt is a prediction of St given the information available in period t-1. This prediction is selected to imply a relatively simple data analysis. Note that if mood and expertise are stochastic and affect the state of mind, then the plan is also stochastic in the sense that the same response is not always made to the same data set. The function w(Ŝt) is the 'width of the plan'. If the approximate state of mind Ŝt is a known function of the data Yt, then the width of the plan can be characterized in terms of 'the' probability that the plan will be carried out. If this probability is evaluated with respect to the approximate prior distribution, it is likely to underestimate the true probability of a revision. More on this below in the discussion of the choice of significance level for diagnostic statistics.
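The claim that the approximate prior understates the probability of a revision can be illustrated numerically. The two distributions below, a normal approximate prior and fatter-tailed 'true' beliefs for a diagnostic z, are my illustrative assumptions:

```python
# The 'width of the plan' as the probability that the plan is carried out.
# Suppose the plan applies whenever a diagnostic z lies inside (-2, 2).
# Under the convenient approximate prior, z ~ N(0, 1); under the observer's
# true, fatter-tailed beliefs, z ~ Student-t with 3 degrees of freedom.
# The approximate prior then understates the probability of a revision.
import numpy as np

rng = np.random.default_rng(11)
n = 1_000_000
z_approx = rng.normal(0.0, 1.0, n)
z_true = rng.standard_t(3, n)

p_rev_approx = np.mean(np.abs(z_approx) > 2)   # revision probability, approximate prior
p_rev_true = np.mean(np.abs(z_true) > 2)       # revision probability, true beliefs
print(p_rev_approx, p_rev_true)                # the second is roughly three times the first
```

The simple approximate prior, chosen for tractability, assigns too little mass to the 'improbable' data sets for which no response was planned.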

SPECIAL CASES
The following are special cases:

Textbook Classical Theory
Most programs for electronic data analysis cannot alter themselves, and when confronted with the same inputs always produce the same outputs. The absence of memory and randomness means that a computer program cannot have a variable state of mind. In that event the width of the plan is infinite, in the sense that the actual response and the planned response are necessarily the same and equal to the idealized response for one special state of mind S0:

R(Yt) = ρ(Yt | S0),  w(·) = ∞

The textbook classical model of data analysis ignores altogether the fact that a human has to write the computer program and select the inputs. A real data analysis must therefore be viewed as the output of a dual effort by human and electronic computer. No sharp distinction should be made between responses that are carried out completely by electronic computers, and responses that are partly selected by a human computer. For example, I might use stepwise regression to decide if a function is quadratic or not: if the t-value on the quadratic term exceeds some critical value, then the electronic computer will include the quadratic; otherwise, it will be excluded. This is not fundamentally different from deciding to include a quadratic term if something looks 'suspicious' in the scatter of observations (or plot of the residuals).
Given that a human being must be involved in a data analysis, the closest one could come to the ideal classical model is to have the human write the computer program and select the inputs into the electronic computer without reviewing the current data and without influence of the stochastic elements in mood and expertise. An equivalent would be a computer program with memory. In terms of the model, this amounts to selecting a predicted state of mind that does not depend on the mood or the expertise, and a plan with an infinite width:

Ŝt = S(Yt-1, Yt-2, ..., Y1),  w(·) = ∞

Here again, the planned response and the actual response are identical.

Classical Model with Stochastically Selected Inputs
In practice it is unlikely that a human could approximate an electronic computer and make choices that do not depend at all on mood and expertise. For example, stepwise estimation of a quadratic equation will always produce the same estimated model, linear or quadratic, when excited by the same data set. But a human observer of a scatter of observations will sometimes include the quadratic term, but sometimes will not. This makes the response have a random component. It is as if the stepwise computer program were to use a stochastic critical value to determine if the quadratic term should be included or excluded.
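The 'stochastic critical value' can be sketched directly; the threshold process below, a fixed level perturbed by a mood shock, is my illustrative assumption:

```python
# A stochastic plan: the same scatter can elicit different responses on
# different occasions if the human's implicit critical value varies with
# mood. Here a data set has a fixed t-value of 2.1 on its quadratic term,
# and the observer's threshold is 2.0 plus a mood shock.
import numpy as np

rng = np.random.default_rng(5)
t_quad = 2.1                                      # fixed feature of one data set
thresholds = 2.0 + 0.3 * rng.normal(size=1000)    # mood-shifted critical values
include = t_quad > thresholds                     # decision on each occasion
print(include.mean())  # fraction of occasions on which the quadratic is included
```

The same data set sometimes yields a quadratic and sometimes a linear model, which is exactly the randomness a deterministic stepwise program lacks.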
In terms of the elements of the model, this requires only that we allow the past levels of mood and expertise to affect the response function:

Ŝt = S(Yt-1, Yt-2, ..., Y1, Mt-1, Et-1)

Some data analyses appear to have zero widths, when in fact the widths are infinite. For example, stepwise regression is an example of confirmatory data analysis that can be written in a form which makes it appear to have an exploratory component:

R(Yt) = regression of yt on (X1t, X2t) if F > c;  regression of yt on X1t otherwise,

where F is the F-statistic for testing if the variables X2t belong in the equation. One might be tempted to say that the planned response to the data is to include all the variables in the regression, but if the data are 'unusual' in the sense that the F-statistic is small, then the plan is revised and only a subset of variables is included.
In this case of stepwise regression, however, the planned response to the data set is complete and fully carried out. Stepwise regression should be viewed as a form of confirmatory data analysis, and subjected to the same kind of critical scrutiny as other confirmatory analyses.
Either sampling properties should be determined, or the implicit prior distribution should be unearthed. If the Bayesian approach is taken, it is natural to assume that the implicit prior distribution summarizes the notion that the subset of variables X2t might be neglected because they have coefficients close to zero.
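The stepwise rule described above can be written as one fully committed plan. In this sketch the data-generating processes and the critical value c = 4.0 are my illustrative assumptions:

```python
# The stepwise plan as a single planned response: regress y on all of
# (X1, X2) when the F-statistic for the block X2 exceeds a critical
# value c, and on the subset X1 otherwise.
import numpy as np

def ols_rss(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    return b, resid @ resid

def stepwise_response(y, X1, X2, c=4.0):
    X_full = np.hstack([X1, X2])
    b_full, rss_full = ols_rss(y, X_full)
    b_restr, rss_restr = ols_rss(y, X1)
    q = X2.shape[1]                      # number of restrictions tested
    df = y.size - X_full.shape[1]        # residual degrees of freedom
    F = ((rss_restr - rss_full) / q) / (rss_full / df)
    return ("full", b_full) if F > c else ("subset", b_restr)

rng = np.random.default_rng(1)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
X2 = rng.normal(size=(n, 2))
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([3.0, -3.0]) + rng.normal(size=n)
print(stepwise_response(y, X1, X2)[0])   # X2 matters strongly here, so the full model is kept
```

Because both branches are specified in advance, the procedure is confirmatory, and its sampling properties (or its implicit prior) can be studied in the traditional way.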

Apparently Exploratory Data Analysis with Implicit ex-post Plans
The cost of planning can be completely avoided if the width of the plan is genuinely set to zero.
In that event a response is formulated only for the actual data set once it is observed, not for all hypothetical data sets. The distinction between exploratory and confirmatory data analysis might be based on the width of the plan. A fully confirmatory data analysis occurs when the width is infinite. An exploratory data analysis might be said to occur when the width of the plan is zero.3 But the proper distinction between exploratory and confirmatory data analysis cannot be made on the basis of the apparent width of the plan only, since plans may be implicit yet still be said to exist and be subject, in principle, to the same kind of scrutiny as explicit plans.
Consider again the example in which one looks at a scatter of points to decide what functional form should be estimated, not necessarily committed to choosing between the linear or quadratic forms. Is this genuine exploratory data analysis? Not if the subject is merely carrying out an implicit plan. In principle we could find out what the plan is by confronting the subject with a sequence of hypothetical scatters of observations, and asking if the quadratic term should be included. This is analogous to inputting a sequence of data sets into a stepwise regression program to see if the program selects or omits the quadratic term. One difference is that there is variability in the response of the human that is not normally present in the computer.
This makes the plan stochastic. Another big difference is that the response of a human to hypothetical data sets may be very unlike the response to real data sets. Genuinely exploratory data analysis occurs when the approximate state of mind on which the data analysis depends is a function of the current data. This would occur either because the mood and expertise depend on the current data, or because the approximation to the current state of mind depends on the current data.

w(·) = 0
In this case the response function R(Yt) is not equal to an idealized response ρ(Yt | S) for any state of mind S, because the data Yt affect the state of mind St on which the response is based. The basic problem with exploratory data analysis is that the data play two roles in the analysis: one to determine the state of mind (instigate a hypothesis) and the other to select a response given this state of mind. The solution to this problem of double counting is proper policing of the inferences to make the response function conform as closely as possible to an idealized response for some state of mind. More on this below.
From the standpoint of an outside observer, who can see the response but not the logic behind it, it is impossible to distinguish exploratory from confirmatory data analysis with implicit plans. A clear distinction could be made if the plan were required to be fully articulated before the data were observed. It is also possible to test if the response to hypothetical data sets is the same as the response to real data sets. Complications would arise, however, when there are changes in expertise.

Ideal Bayesian Model
Bayesian programs have prior information as inputs. These define the state of mind, which is a function of past observations and of time, in the sense that the prior for analysing Yt depends on past observations and on the process that is observed at time t. Thus:

St = S(Yt-1, Yt-2, ..., Y1, t)

meaning that the true state of mind depends on the sequence of past observations and the process that is currently being analysed, and this state of mind can be perfectly measured. The Bayesian model is like the nonstochastic classical model but with a well-developed theory of the state of mind.
Here is an example. Suppose that a sequence of vectors is observed that are generated by regression functions:

yi = Xi βi + ui,  ui ~ N(0, σ²I),
βi = βi-1 + vi,  vi ~ N(0, Vi),

where the last assumption indicates the relationship between the coefficients of interest and past coefficients. Then we can substitute to obtain

yi = Xi βi-1 + ei,  ei ~ N(0, σ²I + Xi Vi Xi').

The posterior mean of βi can then be written in the notation of the state of mind as

(S1i + Xi'Xi/σ²)^(-1) (s2i + Xi'yi/σ²),

where S1i^(-1) s2i is the prior mean vector and S1i is the prior precision matrix. These components of the state of mind are dependent on past observations.

Practical Bayesian Model: Precommitted Prior Distribution

In practice, however, there is a substantial amount of whimsy in defining the prior distribution, which anyway is only an approximation to the state of mind. A model of practical Bayesian analysis in which the prior distribution is selected before reviewing the data is thus:

R(Yt) = ρ(Yt | m(Ŝt)),

where m is a measurement function. Here the response depends on a measurement of the expected state of mind. The accuracy of the measurement of this state of mind depends on the mood and expertise of the observer.
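The posterior mean formula is the standard conjugate-normal result and is easy to check numerically; the simulated data and the prior settings below are my illustrative choices:

```python
# Posterior mean for the normal regression model with a normal prior:
# with prior precision S1 (so prior mean S1^{-1} s2) and known error
# variance sigma2, the posterior mean is
#     (S1 + X'X/sigma2)^{-1} (s2 + X'y/sigma2).
import numpy as np

def posterior_mean(y, X, S1, s2, sigma2=1.0):
    precision = S1 + X.T @ X / sigma2    # posterior precision matrix
    return np.linalg.solve(precision, s2 + X.T @ y / sigma2)

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(size=n)

S1 = 0.01 * np.eye(2)     # weak prior precision
s2 = S1 @ np.zeros(2)     # prior located at zero
b = posterior_mean(y, X, S1, s2)
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(b, 2), np.round(ols, 2))  # with a weak prior, the posterior mean ~ OLS
```

Raising the prior precision S1 pulls the posterior mean toward the prior location, which is how the state of mind enters the response.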

Bayesian Analysis with an Implicit, ex post Plan
The initial prior distribution that was formed before the data were observed may seem undesirable once the data are observed. The selection of a prior distribution in the light of the data could be based on any of three assumptions regarding the effect of the data on the state of mind and its measurement. These three cases are:

1. Neither the state of mind nor its measurement depends on the current data.
2. The state of mind does not depend, but its measurement does depend, on the current data.
3. The state of mind, and consequently its measurement, depend on the current data.

The second two cases lead to what I would call 'exploratory data analysis'; the first is only apparently exploratory. An example of the first case is offered in the subsequent discussion of elicitation diagnostics, in which I suggest eliciting the prior distribution carefully only if a negative answer is given to the question: 'Is the prior variance greater than c?', where c is selected after the data are observed in such a way that an affirmative answer justifies the approximation that the prior variance is infinite. I argue that the answer to this question is unlikely to be (greatly) affected by the fact that c depends on the data, and consequently does not give rise to the double-counting problem of exploratory data analysis. This case, in which the prior distribution is elicited after the data are observed but is not (substantially) dependent on the data, takes the same form as apparently exploratory data analysis from the classical perspective:

R(Yt) = ρ(Yt | St),  St = S(Yt-1, Yt-2, ..., Y1, ζt, εt)

w(·) = 0
Note that here the response function is an ideal response for a state of mind that is measured after the data are observed. This is not what I call exploratory analysis, even though the width of the plan is zero.

Exploratory Bayesian Analysis
The other two possibilities, in which the measurement of the state of mind depends on the current data, fall under the heading of exploratory data analysis because the response function is not an idealized response for any state of mind. The first model of exploratory data analysis has the true state of mind independent of the current observations, but has the measurement subject to some influence of the current observations:

R(Yt) = ρ(Yt | m(St, Yt)),  St = S(Yt-1, Yt-2, ..., Y1, ζt, εt)

w(·) = 0
The other model has the state of mind, as well as its measurement, dependent on the current data:

R(Yt) = ρ(Yt | m(St, Yt)),  St = S(Yt, Yt-1, ..., Y1, ζt, εt)

w(·) = 0
These two cases allow the data to play two roles in the analysis: once to affect the measured prior distribution and again to affect the inferences given the measured prior distribution. Some kind of adjustment to the inferences is required to correct for the possibility of double counting, and to limit the chance of overfitting (seeing patterns in a random data set).

Mood and expertise cause great problems for classical inference, which attempts to base the choice of response function on its sampling properties. If the response function is, say, an ordinary regression, then its sampling properties such as bias and variance can be derived mathematically. If the response function is more complex, like stepwise regression, but is programmable on an electronic computer, the function can be plotted and its sampling properties can be established by Monte Carlo methods. If a human, subject to changes in mood and accumulation of expertise, shares the choice of response with an electronic computer, it may also be possible to determine the joint response function and to estimate its sampling properties using Monte Carlo experiments in which the human-cum-electronic computer is excited by a sequence of randomly chosen inputs and the corresponding responses are tabulated. Randomness in the human response is clearly allowable with this experimental approach if there is no intersample dependence in the response. If there is intersample dependence, the experimental approach may uncover it, but sampling properties are not clearly defined.
Consider, for example, the problem of estimating the means μ_i of a sequence of populations, each with variance one. The following are three estimators suggested by the Bayesian tradition. Each is a weighted average of the sample mean m_i and a quantity which can be thought to be the location of the prior distribution:

δ_i(1) = w m_i
δ_i(2) = w m_i + (1 − w) e_i
δ_i(3) = w m_i + (1 − w)(e_1 + e_2 + ... + e_i)

where the e_j are independent with mean zero and variance one.
The location of the prior for the first estimator is always zero. The location of the prior for the second estimator is a stationary random variable, suggestive of changes in mood. The location of the prior for the third estimator is a nonstationary random variable, suggestive of changes in expertise. What are the sampling properties of these estimators? We can agree that the first estimator has mean w μ_i and variance w²/n, where n is the sample size. The second estimator might be said to have mean w μ_i and variance w²/n + (1 − w)². Or it could be said that, conditional on e_i, the mean is w μ_i + (1 − w)e_i and the variance is w²/n. The question that must be confronted is whether changes in mood should be embedded in the sampling experiment or not. This question is made more pointed by reference to the third estimator, which might be said to have mean w μ_i and variance w²/n + (1 − w)²n_i, where n_i is the number of means that have been observed, or alternatively could be said to have mean w μ_i + (1 − w) Σ_{j≤i} e_j and variance w²/n. Yet a third alternative is to condition on everything that is given up to the ith mean. Then this third estimator could be said to have mean w μ_i + (1 − w) Σ_{j≤i−1} e_j and variance w²/n + (1 − w)².

The point that this example makes is that the conceptual experiment of repeated sampling that underlies classical inference can be ambiguous. Then the ranking of alternative estimators can also be ambiguous. My own instinct here would be to embed changes in mood in the sampling distribution, but not changes in expertise. If the state of mind on which a data analysis rests does not depend on mood or expertise, then a fully nonrandom response function can be selected before the latest data are observed, and sampling properties of this response function can be straightforwardly established. These sampling properties remain relevant after the data are observed because the data do not affect the sampling properties that the observer considers relevant.

But a computer does not distinguish real from hypothetical situations. Humans do. What would you do if an attractive stranger proposed a rendezvous? Your answer to the hypothetical question may be very different from your response to a real proposal. Or I might ask you: if you observed a particular scatter, would you think the model to be linear or quadratic? Your answer to this hypothetical can be very different from your response to real data for a variety of reasons, one of which is that you treat the real situation with greater care and thought. To use my language, the mood in which you approach a hypothetical data analysis may be very different from the mood in which you approach a real data analysis. Anyway, is it really sensible to try to find the sampling properties of an estimator partly selected by a human? Who is going to sit still for this?
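The ambiguity about the second estimator's sampling properties can be made concrete by simulation. The sketch below is my construction (the weight w = 0.5, n = 25, and the normal 'mood' process are arbitrary choices): the same estimator has variance roughly w²/n + (1 − w)² when the mood e is redrawn each replication, but only about w²/n when the mood is held fixed.

```python
import numpy as np

def simulate_estimator2(mu, w=0.5, n=25, reps=20000, seed=1):
    """Estimator 2: delta = w*m + (1 - w)*e, with sample mean m
    (variance 1/n) and stationary 'mood' e ~ N(0, 1).  Returns the
    Monte Carlo variance computed two ways: unconditionally (e redrawn
    each replication) and conditionally (e fixed across replications)."""
    rng = np.random.default_rng(seed)
    m = mu + rng.normal(size=reps) / np.sqrt(n)   # sample means
    e = rng.normal(size=reps)                     # mood redrawn each time
    uncond = np.var(w * m + (1 - w) * e)          # about w^2/n + (1-w)^2
    e_fixed = rng.normal()                        # one mood, held fixed
    cond = np.var(w * m + (1 - w) * e_fixed)      # about w^2/n
    return uncond, cond
```

Which of the two variances is "the" sampling variance is exactly the question of whether mood belongs inside the conceptual repeated-sampling experiment.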
My conclusion: sampling theory is not very useful for selecting responses to real data except in those cases in which the state of mind is perfectly predictable and a fully committed set of plans can be formulated before the data are observed. These cases may be more prevalent than you might imagine. Many diagnostic statistics precipitate a predictable response and cannot be said to affect the state of mind of the observer in the sense that I have defined. These diagnostics form part of a complex multi-step method of estimation which ought to be scrutinized in the traditional way: either sampling properties should be determined, or the implicit Bayesian prior distribution should be unearthed.

DIAGNOSTIC STATISTICS
The theory outlined in Section 3 allows diagnostic statistics to play three distinct roles:
1. A 'pre-test diagnostic' may be used to select between a pair of alternative models.
2. An 'elicitation diagnostic' indicates if the inferences are sensitive to the choice of prior distribution, and may call for a more accurate measurement of the prior.
3. A 'criticism' may suggest a change in the original model/state of mind.
Each of these is now discussed.

Diagnostics as Part of a Multi-step Planned Response
Suppose that the response to a 'bad' Durbin-Watson statistic is only to correct for first-order serial correlation. Technically, this is the same as stepwise regression in which a variable is added to the model if it is sufficiently correlated with the estimated residuals. A t-statistic on this potential variable could serve as the 'diagnostic', indicating the need to add this variable. This type of diagnostic is just part of a complex method of estimation. It really should not be called a diagnostic at all. The complex method of estimation should be subjected to traditional scrutiny: either sampling properties should be established, or the prior distribution that underlies the estimate should be disclosed.

These pre-test diagnostics are, I believe, the most prevalent in practice. Usually a peculiar value in a diagnostic precipitates a predictable response. A test for non-normality selects a correction for non-normality; a test for serial correlation can lead to a correction for serial correlation; a test for heteroscedasticity selects a heteroscedasticity correction. The particular form of the correction may vary with the mood of the observer, but in principle this variability can be determined by an outside observer. In that sense the response is predictable, though possibly random. If so, this is a complex planned response, but the problems are entirely 'above board' and do not raise the difficult double-counting issues associated with criticism and revision.

Diagnostics that Suggest More Accurate Measurement of the Prior
A Bayesian approach requires the elicitation of a prior distribution, which can be done more efficiently after the data are observed, since there are many prior distributions that are practically equivalent to the diffuse prior, and there are many others that are practically equivalent to the dogmatic prior. A companion paper (Leamer, 1989) presents some elicitation diagnostics for the normal linear regression model. These diagnostics indicate when it is a good approximation either to use a diffuse prior distribution or to use the sharp prior that calls for a subset of variables to be altogether omitted. If the sample size is small, one might as well omit the variables; for large samples one might as well include them and estimate with maximum likelihood. For intermediate sample sizes the prior distribution matters, and needs to be accurately elicited. These elicitation diagnostics do depend on the chi-squared statistic that tests the traditional hypothesis that the coefficients of the doubtful variables are collectively zero, and also on the t-statistic that tests if the omission of the doubtful variables causes bias in the estimate of the issue of interest, but other aspects of the data are also relevant.
These elicitation diagnostics do raise a double-counting problem because they reveal features of the data which may affect the prior distribution that is elicited.My guess is that the measured prior distribution would not be greatly affected by knowledge of these diagnostics, but this is an hypothesis that could be experimentally tested.
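A stylized version of the point about sample size (my toy calculation, not one of the Leamer (1989) diagnostics): with a normal prior centred at zero for a doubtful coefficient, the posterior mean is close to the dogmatic 'omit the variable' approximation when n is small and close to the maximum likelihood answer when n is large; in between, the prior variance matters and must be elicited.

```python
def posterior_mean(m, n, sigma2=1.0, tau2=0.05):
    """Posterior mean of a doubtful coefficient with prior N(0, tau2),
    given a sample estimate m based on n observations of variance sigma2.
    The prior variance tau2 is a hypothetical, arbitrarily chosen value."""
    data_precision = n / sigma2
    prior_precision = 1.0 / tau2
    return m * data_precision / (data_precision + prior_precision)

# Small n: close to the dogmatic approximation (omit the variable, i.e. zero).
# Large n: close to the diffuse-prior (maximum likelihood) answer m.
```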

Diagnostics as Criticisms
Diagnostics may also serve as criticisms of either the model or the prior distribution. The form that criticism should take is not clear-cut whether one takes a Bayesian or a classical perspective. What feature of the data might suggest that the search for a new model would be successful? If one knew the answer to this question in advance, then the response could be planned and criticism would be unnecessary. I am inclined to think that wrong signs, maybe low R² values, and data displays might stimulate me to think of a better model. But I share with Hill (1988) the doubt that any formal theory answers this question; I return to the point under 'Bayesian Criticisms' below.

From the Bayesian perspective, the posterior odds in favour of an alternative hypothesis Ha relative to a null hypothesis H0 depend on both the prior odds ratio P(Ha)/P(H0) and a factor defined as the ratio of the density of the data under the alternative to the density under the null. The null density is large or small only in comparison with the density value for alternatives with adequate prior probability. A data set may come from the tail of the null distribution, but may come even more remotely from the distributions corresponding to sensible alternative hypotheses with reasonably large prior probability! The problem of the alternative is not satisfactorily resolved by attempting to specify a distribution of the data given the vague alternative that 'something else' is happening. One discussant (in Savage, 1962, pp. 75-86; quoted in Hill, 1988) argues: "Professor Savage says 'add at the bottom of the list H1, H2, ..., "something else"'. But what is the probability that a penny comes up heads given the hypothesis 'something else'? We do not know." Furthermore, I ask rhetorically, what is the prior probability of 'something else'? I am inclined to think that a sensitivity analysis could be helpful here. More on this below.
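The distinction between tail-area surprise and density comparison can be illustrated numerically. In this hypothetical sketch, an observation at y = 3 lies well in the tail of H0: N(0, 1), yet the posterior odds still favour H0 against an alternative Ha: N(10, 1) under which the data are even more remote; the numbers are invented for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(y, mean, sd=1.0):
    """Density of N(mean, sd^2) at y."""
    return exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))

def posterior_odds(y, mean_a, mean_0=0.0, prior_odds=1.0):
    """Posterior odds of Ha: N(mean_a, 1) against H0: N(mean_0, 1):
    prior odds times the ratio of densities."""
    return prior_odds * normal_pdf(y, mean_a) / normal_pdf(y, mean_0)

# y = 3 is in the tail of H0, but even further from Ha: N(10, 1),
# so the posterior odds heavily favour H0 over this alternative.
```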

Bayesian Encompassing Diagnostics
One of the popular statistics in the LSE-Hendry tradition tests for 'encompassing' by embedding a pair of non-nested models into a general composite model and testing to see if either outperforms the composite model. When a model does not perform well compared with the composite model it is said not to 'encompass' the other model. I will make two comments about these encompassing tests from the Bayesian perspective for the case of the linear model. First, the failure or ability of model 1 to 'encompass' model 2 in the sense of Hendry and Mizon is irrelevant for the choice between models 1 and 2. Second, the encompassing statistics can be used as criticisms of the pair of models, though there remains a substantial problem in choosing an appropriate significance level.
Consider the simple setting in which there are three competing regression hypotheses: y depends on X1, y depends on X2, and y depends on X3:

Hi: y ~ N(Xi βi, σi² I),   i = 1, 2, 3

where y is an n × 1 observable vector, Xi is an n × ki observable matrix, βi is a ki × 1 unobservable vector, and σi is an unobservable scalar.
The Bayesian problem of discriminating among these three hypotheses is a straightforward application of Bayes rule, beginning with the prior probabilities of the hypotheses and a prior distribution over the parameter space. The posterior probability of each of the hypotheses is

P(Hi | y) = fi(y | X)P(Hi) / Σj fj(y | X)P(Hj)

where fi is the marginal likelihood of hypothesis i,

fi(y | X) = ∫ f(y | Xi, βi, σi) gi(βi, σi) dβi dσi

and where gi(βi, σi) is the prior distribution for the parameters under hypothesis i.
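Bayes rule over hypotheses, and the irrelevance of the third hypothesis for the odds between the first two, can be sketched directly; the marginal likelihood values below are made-up numbers.

```python
def posterior_probs(marg_liks, priors):
    """Bayes rule over hypotheses: P(Hi | y) is proportional to
    f_i(y | X) * P(Hi)."""
    weighted = [f * p for f, p in zip(marg_liks, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Improving H3's marginal likelihood changes every posterior probability,
# but the odds between H1 and H2 stay at 0.3/0.1 = 3 either way.
weak_h3 = posterior_probs([0.3, 0.1, 0.2], [1/3, 1/3, 1/3])
strong_h3 = posterior_probs([0.3, 0.1, 0.9], [1/3, 1/3, 1/3])
```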
The posterior odds ratio of hypothesis 1 relative to hypothesis 2 is then the ratio of weighted likelihoods times the prior odds ratio:

P(H1 | y)/P(H2 | y) = [f1(y | X)/f2(y | X)] [P(H1)/P(H2)]

This odds ratio that compares model 1 with model 2 has a very important feature: it does not depend at all on the existence or quality of the third hypothesis. The performance of the third hypothesis can add to or subtract from the total posterior probability of hypotheses 1 and 2, but it cannot affect the division of the posterior probability between them. Thus the ability to 'encompass' is irrelevant to the choice between a pair of models.

Next, suppose that there are only two fully specified competing hypotheses. Though specific alternatives to these two hypotheses may be identified, it is unlikely that we could have enough confidence in a pair (or finite set) of hypotheses that we would not want to reserve at least a little probability for 'something else'. In order to select between H1, H2, and 'something else' we need to specify the distribution of the data if they are generated by 'something else'. One way to think about this, suggested in Leamer (1974), is to suppose that there is a third hypothesis

H3: y ~ N(X3 β3, σ3² I)

for which the relevant explanatory variables X3 are not observed. These unobservables must be marginalized from the likelihood function. If, for example, all the explanatory variables come from a multivariate normal distribution, then this marginalization produces the alternative hypothesis4

Ha: y ~ N(X1 θ1 + X2 θ2, σ² I)

This composite model that includes both X1 and X2 may be theoretically meaningless; the model is formed only as a surrogate for the unspecified alternative that y depends on X3. We now have two well-defined hypotheses and a vague alternative. The performance of the vague alternative can cast doubt on the pair of well-defined hypotheses in the sense of reducing the posterior probability assigned to them. A Bayesian diagnostic is therefore a measure of the performance of the composite hypothesis with all the explanatory variables compared with the maintained hypotheses. But, of course, in order to form this measure, one requires a prior distribution for θ1 and θ2. How one might do this is something of a mystery, which is the point of the quotation above. When I feel fatherly, I am inclined to insist that there be a full commitment to the choice of prior for these parameters, even though one cannot say what they represent. This commitment allows one later to correct for successful criticism, as discussed in Section 6.
One feature of the distribution for θ1 and θ2 that might be selected by convention is the mean. The implicit prior for the unstated model with unobserved variables X3 is that it does not explain the data. This must mean that β3 is implicitly revealed to be small, and consequently so are θ1 and θ2. Measuring the performance of the alternative model requires not only a prior mean but also a prior variance matrix. For reasons discussed below, this is what is required to adjust the inferences for successful and unsuccessful criticism. In practice, however, it is awfully difficult to submit to this kind of discipline and to commit to the choice of this prior covariance matrix. A possible compromise is a sensitivity analysis that allows the prior variance V to be free. A Bayesian diagnostic with known V is the Bayes factor in favour of the alternative hypothesis relative to hypothesis i:

B(Ha : Hi | V) = ∫ fa(y | X, θ) f(θ | V) dθ / fi(y | X),   i = 1, 2

A Bayesian diagnostic when V is difficult to select is the maximum Bayes factor in favour of the vague alternative:

max over V of B(Ha : Hi | V)

I do not pretend to be able to tell you what should be the critical value of these statistics, since the posterior odds ratio depends both on the Bayes factor and also on the prior odds ratio. The question that must be answered is: When is this Bayes factor so high that the search for a new model is likely to be successful? Is it 10:1 in favour of the alternative? Or 100:1? I don't know.
For that matter, I don't even know if this Bayes factor is useful information.It would take a lot of experience before that could be established.This is clearly not a matter of theory, since to get to this point we have made a number of assumptions that are questionable at best.
Furthermore, this form of criticism applies only if there are competing non-nested hypotheses with non-zero prior probability, a setting which in my opinion is rare in economics.
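A sensitivity analysis over the free prior variance V might be sketched as follows. Everything specific here is my assumption: the error variance is known to be one, the prior under the composite alternative is θ ~ N(0, V I), and the comparison hypothesis is the point null θ = 0 rather than a fully specified Hi.

```python
import numpy as np

def log_mvn_density(y, cov):
    """Log density of a mean-zero multivariate normal at y."""
    n = len(y)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def bayes_factor(y, X, V):
    """B(Ha : H0 | V): composite alternative y ~ N(X theta, I) with
    theta ~ N(0, V I), against the point null theta = 0.  Marginalizing
    theta gives y ~ N(0, I + V X X') under Ha."""
    n = len(y)
    cov_a = np.eye(n) + V * (X @ X.T)
    return np.exp(log_mvn_density(y, cov_a) - log_mvn_density(y, np.eye(n)))

def max_bayes_factor(y, X, grid):
    """The 'maximum Bayes factor' over a grid of prior variances V."""
    return max(bayes_factor(y, X, V) for V in grid)
```

At V = 0 the alternative collapses into the null and the Bayes factor is one; scanning a grid of V values shows how sensitive the diagnostic is to the unelicited prior variance.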

Classical Criticisms
Classical criticisms are usually 'goodness-of-fit' tests which also indicate whether the data come from the tail of the assumed distribution. These 'goodness-of-fit' tests have a shaky logical foundation. One problem, pointed out by Berkson (1938), is that unless the model is perfectly correct, the model will surely be rejected as sample size grows. Diagnostics that are tests of the model against unspecified alternatives thus amount only to elaborate schemes for measuring sample size.
It thus seems unlikely that the finding that the data come from the tail of the distribution is properly regarded to be a criticism of the assumed model. But here is a counter-example: suppose that you point out to your class of ten students that two of them have the same birthday. Many of these students would be surprised and might start wondering if there has been some nonrandom sorting. Then you point to Feller (1957, p. 32) to show that the probability of no matches is only 0.883, so that an event with rather high probability (more than one in ten) has occurred. This will probably dissuade the students from looking for another explanation. Note that this sequence of events refers repeatedly to the probability of the data under the assumed model of randomness and never to any alternative. First the probability of a match was thought to be very small, and the data seemed sufficiently anomalous to justify the search for an alternative. Then the miscalculation was pointed out and the higher probability did not seem to justify any further search. Perhaps in this setting one has an intuitive sense of the probability of this kind of data under the alternative that might be constructed, and also the prior probability of this alternative. It is possible, but it does seem doubtful.

Both successful and unsuccessful criticisms have implications for the inferences properly drawn from a data set. The attempt to criticize, even when it leads to no revision, reveals that there is a lack of complete commitment to the assumptions underlying the original plans. This lack of complete commitment requires some alteration of the inferences, for example, enlargement of the standard errors of the coefficients to reflect the fact that there surely are omitted variables that cause bias in the estimates. When the criticism succeeds there is a double-counting problem because the data are used once to alter the assumptions and then again to make inferences as if these were the assumptions selected at the beginning. Something needs to be done to limit the double-counting and to reduce the risk of overfitting.
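The Feller calculation for the class of ten is easy to reproduce: the probability that k people all have distinct birthdays is the product (365/365)(364/365)...((365 − k + 1)/365).

```python
def prob_no_shared_birthday(k, days=365):
    """Probability that k people all have distinct birthdays,
    assuming 365 equally likely birthdays."""
    p = 1.0
    for i in range(k):
        p *= (days - i) / days
    return p

# For a class of ten this is about 0.883, as Feller (1957, p. 32) reports.
```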
The corrections for both successful and unsuccessful criticism that I proposed in Leamer (1974) treat the phenomenon of hypothesis discovery as if it were a traditional problem of sequential observation with an initial decision not to observe some of the data. Suppose that the full model has two explanatory variables:

yi = α + β xi + γ zi + ui

where y, x and z are observables and ui is a normally distributed serially uncorrelated error term with mean zero and variance σ². Suppose further that z given x is generated by the regression

zi = r xi + ei

where ei is a normally distributed serially uncorrelated error term with mean zero and variance σe². Then, if interest focuses on β, it is possible to make the decision to observe only y and x. I assume that the prior distribution for the 'experimental bias' β* = rγ is located at the origin, meaning that the expected bias of the least-squares estimate of β is zero. The presence of the experimental bias β* reduces the effective sample information about β from x'x/σ² to

(x'x/σ²)/(v x'x/σ² + 1)

where v is the prior variance of β*. Thus the possibility of misspecification requires a discount of the data evidence that depends on the quality of the experiment as measured by v.
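The discount of effective sample information is a one-line formula; the numbers below are arbitrary illustrations.

```python
def effective_information(xx_over_s2, v):
    """Effective information about beta, (x'x/sigma^2)/(v*x'x/sigma^2 + 1),
    when the experimental bias beta* has prior mean zero and variance v."""
    return xx_over_s2 / (v * xx_over_s2 + 1.0)

# v = 0 (a perfect experiment) leaves the information x'x/sigma^2 intact;
# larger v discounts the data evidence more heavily.
```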
When a criticism is successful, and it is decided to observe z, the sample information is properly summarized by the regression of y on x and z; no adjustment is necessary for the fact that the data were observed in stages. However, there is a restriction that must be made on the processing of the data. The prior distribution that is used for γ must be consistent with the prior that was used for β*, since β* = rγ. With r and γ independent this implies the moments

E(r)E(γ) = 0,   E(r²)E(γ²) = v.

These equations place restrictions on the prior distributions for r and γ. Usually they will be interpreted to mean that the original decision to omit z reveals a prior for γ located at zero with a variance that is limited, depending on the size of the prior variance of r. The effect of this prior is to shrink the estimate of γ to zero and thus to discount the evidence provided by the regression of y on x and z. The amount of discounting that is required is a function of the prior variance v.
To summarize, there is a sequence of discount rates that apply at different stages of the analysis if there is criticism and potential revision. An initial discount applies if criticism is unsuccessful. If this discount is great, meaning that there is a substantial chance of successful criticism because the model is probably poorly specified, then the discounting of results from subsequent models will be less. If, on the other hand, the initial discount is small because the initial model is thought to be pretty good, then the results from data-instigated models will be heavily discounted.5

When the state of mind is variable, so too are the relevant sampling properties. Which sampling properties should dictate the choice of procedures? In addition, there is a substantial problem in determining what the response function is when it includes a random component chosen by a human. If the response were produced by an electronic computer with a random component, it would be possible to rerun the same data set to see how the computer would respond. For example, a stepwise regression program could have a random critical value for the t-statistic that selects the variables included in the equation. The sampling properties of this response function can be established by Monte Carlo methods.

I share with Hill (1988) the opinion that 'No theory that I know of attempts to answer [this question], which is a formal way to facilitate scientific creativity.' Thus, when you see claims of automated methods of criticism and hypothesis discovery: caveat emptor!

Bayesian Criticisms
There clearly cannot be a fully acceptable formal Bayesian solution to the choice of criticisms because Bayesian statistical theory is limited to comparing alternative explicit models. A Bayesian, by selecting a prior distribution, say f(θ), and a sampling distribution, say f(y | θ), claims to know the distribution from which a statistic t(y) is drawn:

f(t) = ∫ f(t | θ) f(θ) dθ

When the data come from the 'extreme tail' of this distribution, it seems doubtful that the assumptions (f(y | θ) and f(θ)) are correct. But which statistic t should be used, and compared with what is f(t) small? I am reminded of the old joke: when asked 'How's your wife?', the response is 'Compared to what?' The point is that a Bayesian can only say that one model is better than another. Formally, the odds in favour of an alternative hypothesis, say Ha, compared with an initial hypothesis, say H0, are

P(Ha | y)/P(H0 | y) = [f(y | Ha)/f(y | H0)] [P(Ha)/P(H0)]

The difference between this and the classical model with a stochastic response function is the existence of a complete set of explicit plans in the former case, and the absence of the same in the latter case. But the plan does exist implicitly, even though it is not formally programmed. And it can in principle be uncovered by an experiment in which the observer is confronted with a sequence of observations Y. Once uncovered, it can be subjected to the traditional kinds of scrutiny.
3 These words, 'exploratory' and 'confirmatory', are traditional. The word 'exploratory' evokes the explorer entering uncharted territory with little or no preconceived idea about what is to be found there; 'confirmatory' evokes no similarly strong image. What is one called who uses a map for navigation? What about 'navigatory' data analysis?