Deep Climate

Replication and due diligence, Wegman style


By Deep Climate

Today I continue my examination of the key analysis section of the Wegman report on the Mann et al “hockey stick” temperature reconstruction, a section that uncritically rehashed Steve McIntyre and Ross McKitrick’s purported demonstration of the extreme biasing effect of Mann et al’s “short-centered” principal component analysis.

First, I’ll fill in some much-needed context as an antidote to McIntyre and McKitrick’s misleading focus on Mann et al’s use of principal components analysis (PCA) in data preprocessing of tree-ring proxy networks. Their problematic analysis was compounded by Wegman et al’s refusal to even consider the subsequent peer-reviewed commentary – commentary that clearly demonstrated that correction of Mann et al’s “short-centered” PCA had minimal impact on the overall reconstruction.

Next, I’ll look at Wegman et al’s “reproduction” of McIntyre and McKitrick’s simulation of Mann et al’s PCA methodology, published in the pair’s 2005 Geophysical Research Letters article, “Hockey sticks, principal components, and spurious significance”. It turns out that the sample leading principal components (PC1s) shown in two key Wegman et al figures were in fact rendered directly from McIntyre and McKitrick’s original archive of simulated “hockey stick” PC1s. Even worse is the astonishing fact that this special collection of “hockey sticks” is not even a random sample of the 10,000 pseudo-proxy PC1s originally produced in the GRL study. Rather, it expressly contains the very top 100 – one percent – having the most pronounced upward blade. Thus, McIntyre and McKitrick’s original Fig 1, mechanically reproduced by Wegman et al, shows a carefully selected “sample” from the top 1% of simulated “hockey sticks”. And Wegman’s Fig 4-4, which falsely claimed to show “hockey sticks” mined from low-order, low-autocorrelation “red noise”, contains another 12 from that same 1%!

Finally, I’ll return to the central claim of Wegman et al – that McIntyre and McKitrick had shown that Michael Mann’s “short-centered” principal component analysis would mine “hockey sticks”, even from low-order, low-correlation “red noise” proxies. But both the source code and the hard-wired “hockey stick” figures clearly confirm what physicist David Ritson pointed out more than four years ago, namely that McIntyre and McKitrick’s “compelling” result was in fact based on a highly questionable procedure that generated null proxies with very high auto-correlation and persistence. All these facts are clear from even a cursory examination of McIntyre’s source code, demonstrating once and for all the incompetence and lack of due diligence exhibited by the Wegman report authors.

Before diving into the details of Wegman et al’s misbegotten analysis, it’s worth reminding readers of the context of principal component analysis (PCA) in the Mann et al methodology of multi-proxy temperature reconstruction.

First of all, it should be noted that PCA was used in data pre-processing to reduce large sub-networks of tree-ring proxies to a manageable number of representative principal components, which were then combined with other proxies in the final reconstruction. Together, the retained PCs reflect the climatological information in the proxy sub-network; the actual number required depends on the size of the sub-network (which itself changes depending on the time period or “step” under consideration). But it also depends on the details of the PCA procedure itself.

In their 2005 GRL article, McIntyre and McKitrick (hereafter M&M) purported to have discovered a major flaw in the Mann et al methodology. Instead of standardizing each proxy series on the mean of the whole series before transforming into a set of PCs, Mann et al standardized on the mean during the instrumental calibration period. McIntyre and McKitrick claimed that this “short-centered” method, when applied to so-called “persistent red noise”, nearly always produces a hockey stick shaped first principal component (PC1). Furthermore, M&M focused on the PC1 produced for the North American (NOAMER) tree ring proxy sub-network in the 1400-1450 “step” of the MBH reconstruction. According to M&M, this PC1 was “essential” to the overall reconstruction; thus its “correction” demonstrated that the original “hockey stick” reconstruction was “spurious”.
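The difference between the two centering conventions is easy to see in code. Below is a minimal Python sketch (my own illustration, not the MBH98 or M&M code; the series and calibration window are invented for the example):

```python
from statistics import mean

def center_full(series):
    """Conventional centering: subtract the mean of the whole series."""
    m = mean(series)
    return [x - m for x in series]

def center_short(series, calib_len):
    """'Short-centering': subtract the mean of only the final
    calibration-period portion, as in the MBH98 procedure."""
    m = mean(series[-calib_len:])
    return [x - m for x in series]

# A flat series with a modest late uptick: under short-centering the
# long pre-calibration portion sits well below zero, an offset that
# PCA can mistake for variance, promoting hockey-stick shapes.
series = [0.0] * 100 + [1.0] * 20
print(abs(mean(center_full(series))) < 1e-9)      # True: centered on zero
print(round(mean(center_short(series, 20)), 4))   # -0.8333: offset below zero
```

The point of the toy example is only that short-centering leaves any series whose calibration-period mean differs from its overall mean with a systematic offset, which is exactly the property the PCA step can then seize upon.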

While M&M clearly identified a mistake (or, at best, a poor methodological choice), critiques of M&M pointed out two major problems with their analysis. First, it turned out that M&M had left out a crucial step in their emulation of the MBH PCA methodology, namely the restandardization of proxy series prior to transformation (an issue first raised in Peter Huybers’s published comment). Second, and even more importantly, M&M had neglected to assess the overall impact of any bias engendered by Mann et al’s PCA. In particular, they failed to take into account any criterion for the number of PCs to be retained in the dimensionality reduction step, as well as the impact of combining all of the retained PCs with the other proxies.

These issues (and more) were treated comprehensively by Wahl and Ammann’s Robustness of the Mann, Bradley, Hughes reconstruction of Northern Hemisphere surface temperatures (Climatic Change, 2007), a paper that was first in press and available online in early 2006. They found that variants of PCA applied to the NOAMER tree-ring network had minimal impact on the final reconstruction, as long as the common climatological information in the proxy set was retained. In effect, “short-centered” PCA may have promoted “hockey stick” patterns in the proxy data to higher PCs, but these patterns were still present if all the PCs necessary to account for sufficient explained variance were retained. This paper, along with several others, was cited by the National Research Council’s comprehensive report on paleoclimatology as demonstrating that the “MBH methodology does not appear to unduly influence reconstructions of hemispheric mean temperature” (Surface Temperature Reconstructions for the Last 2,000 Years, p. 113).

But, as we have seen before, Wegman et al deliberately excluded substantive consideration of the peer-reviewed literature in their analysis of M&M, even misrepresenting the work of Wahl and Ammann as a flimsy excuse to set it aside:

MM05a was critiqued by Wahl and Ammann (2006) and the Wahl et al. (2006) based on the lack of statistical skill of their paleoclimate temperature reconstruction. Thus these critiques of the MM05a and MM05b work are not to the point.

Instead, Wegman et al appeared to take at face value all of M&M’s assertions, including the conflation of the NOAMER PC1 with the overall reconstruction, as is very clear from the reproduction of M&M Fig. 1 as Wegman et al’s Fig 4.1.

Figure 4.1: Reproduced version of Figure 1 in McIntyre and McKitrick (2005b). Top panel is PC1 simulated using MBH 98 methodology from stationary trendless red noise. Bottom panel is the MBH98 Northern Hemisphere temperature index reconstruction.

Discussion: The similarity in shapes is obvious. As mentioned earlier, red noise exhibits a correlation structure, which, although it is a stationary process, will depart from the zero mean for minor sojourns. However, the top panel clearly exhibits the hockey stick behavior induced by the MBH98 methodology.

Now let’s dig into the implications of that figure and the subsequent “hockey stick” festival of Fig 4.4, in light of a review of McIntyre’s R script, archived as part of the supplementary information for M&M 2005 at the AGU website. Recall that M&M had created 10,000 simulations of the NOAMER PC1 from null proxy sets created from “persistent red noise” (mistakenly assumed to be conventional red noise by Wegman et al, as seen above). This type of Monte Carlo test is often used to benchmark temperature reconstructions, although McIntyre’s method of creating “random” null proxies was unusual and highly questionable, as we shall see.

The first thing I noted in my original discussion was that, although the PC1 shown was supposedly a “sample”, it was identical in both the M&M and Wegman et al figures. How did that happen? A quick peek at the code gives the answer – this PC1 is read from an archived set of previously stored PC1s, and is #71 of the set. The relevant line reads:


Sure enough, the M&M ReadMe confirms this:

2004GL021750-hockeysticks.txt.  Collation of 100 out of 10000 simulated PC1s, dimension 581 x 100

The eagle-eyed reader will note the code doesn’t have the prepended archive name, one error among many that Wegman et al would have had to correct with McIntyre’s assistance (the same mistakes that Peter Huybers had fixed, apparently on his own, some months before in his version of the script). Indeed, the script saves a set of 100 simulated PC1s on every run, so this archived set is a particular group that was saved at some point.

However, the more interesting question is this: Exactly how was this sample of 100 hockey stick PC1s selected from the 10,000? That too is answered in the script code.

As the simulation progresses, statistics are gathered on each and every simulated PC1. One of those statistics is McIntyre’s so-called “hockey stick index” or HSI, which in effect measures the length of the blade of each PC1 “hockey stick”. An HSI of 1 means that the mean of the calibration-period portion of the series is 1 standard deviation above the mean of the whole series.
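That definition reduces to a one-liner. Here is a hypothetical Python sketch of it (the function name is mine, not McIntyre’s, and I take the calibration period to be the last 79 points of a 581-point series, i.e. 1902-1980 of 1400-1980):

```python
from statistics import mean, pstdev

def hockey_stick_index(series, calib_len=79):
    """HSI: number of whole-series standard deviations by which the
    calibration-period mean exceeds the whole-series mean."""
    return (mean(series[-calib_len:]) - mean(series)) / pstdev(series)

# A flat 581-point "shaft" ending in an upward 79-point "blade"
# (mimicking a 1400-1980 series with a 1902-1980 calibration period)
# scores a strongly positive HSI; a downward blade would score negative.
series = [0.0] * 502 + [2.0] * 79
print(round(hockey_stick_index(series), 2))  # 2.52
```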

The HSI for each PC1 (themselves numbered from 1 to 10,000) is stored in an array called stat2, as seen in this code snippet that renders the overall results.

#simulations are not hard-wired and results will differ slightly in each run
#the values shown here are for a run of 10,000 and results for run of 100 need to be multiplied

temp<-(stat2>1)|(stat2< -1); 	  sum(temp) #[1] 9934
temp<-(stat2>1.5)|(stat2< -1.5);  sum(temp) #[1] 7351
temp<-(stat2>1.75)|(stat2< -1.75);sum(temp) #[1] 2053
temp<-(stat2>2)|(stat2< -2); 	  sum(temp) #[1]  25

The comments show that more than 99% of the PC1s have a hockey stick index with an absolute value greater than 1, and a few even exceed the extreme absolute value of 2.

Now, was a random sample of these PC1s saved? Or perhaps just the first 100 (which would also be reasonably random)? Not quite.


for (nn in 1:NN) {
} #nn-iteration


The first line sorts the set of PC1s by descending HSI and then copies the first 100 to another array. That array is then sorted by PC1 index, so that each PC1 selected for the archive can be retrieved from the appropriate temporary file and saved in ASCII format.
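In outline, the selection logic amounts to the following (a hypothetical Python paraphrase of the R code, with invented names; `hsi` stands in for McIntyre’s stat2 array, here filled with random values for illustration):

```python
import random

random.seed(1)
# One HSI value per simulated PC1, keyed by PC1 number (1..10000).
hsi = {i: random.gauss(0, 1.6) for i in range(1, 10001)}

# Sort PC1 numbers by descending HSI and keep the top 100 --
# the 1% with the most pronounced upward blade...
top100 = sorted(hsi, key=hsi.get, reverse=True)[:100]

# ...then re-sort the winners by PC1 number, the order in which
# they are written to the archive file.
archived = sorted(top100)
print(len(archived))  # 100
```

Note that sorting by descending HSI (not absolute HSI) means the archive holds only the most extreme *upward* hockey sticks; the equally numerous downward ones never make it in.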

Recall M&M’s description of the “sample” PC1 in figure 1 (Wegman et al 4.1):

The simulations nearly always yielded PC1s with a hockey stick shape, some of which bore a quite remarkable similarity to the actual MBH98 temperature reconstruction – as shown by the example in Figure 1.

That’s “some” PC1, all right. It was carefully selected from the top 100 upward-bending PC1s – a mere 1% of all the PC1s.

To confirm these findings, I downloaded the archived “sample” set of PC1s. First I plotted good old #71, which, as can readily be seen, is identical to the “sample” PC1 in McIntyre’s original Fig 1 (and Wegman et al Fig 4.1).

And here are the corresponding identical panels from M&M Fig 1 and Wegman et al Fig 4.1:

I then calculated the “hockey stick index” for each of the 100 and confirmed that they all had an HSI greater than 1.9. Also, 12 of the 100 had an HSI above 2, which jibes with the totals given by McIntyre in the comments and the main article (presumably there were another 13 severely downward PC1s with HSI less than -2).

It turns out that #71 is not too shabby. Its HSI is 1.97, placing it at #23 on the HSI hit parade, out of 10,000.

I then turned to Fig 4.4, which presented 12 more simulation PC1 hockey sticks. Although this figure was not part of the original M&M article, there is a fourth figure generated in the script, featuring a 2×6 display, just like the Wegman figure. A quick perusal of the code shows that these too were read from McIntyre’s special 1% collection, although a different selection of 12 PC1s would be output each time.

     width = 8, height = 11,
     horizontal = FALSE, onefile = FALSE, paper = "special",
     family = "Helvetica",bg="white",pointsize=8)

nf <- layout(array(1:12,dim=c(6,2)),heights=c(1.1,rep(1,4),1.1))

To confirm this, I set up a dynamic chart in Excel, and scrolled through to find the first PC1 displayed (in the upper left hand corner). Here is the dynamic chart in action, with #35 selected, followed by a close up of the matching top left PC1 in Wegman et al Fig 4.4.

Two more scrolls through and I had identified all 12 (independently corroborated by another correspondent). So here is Fig 4-4, with each “hockey stick” identified by its position within McIntyre’s top 1%, as well as its original identifier (an index between 1 and 10,000).

And as verification, I reran the final part of the M&M script, but with a small change to coerce display of the same 12 PC1 “hockey sticks” as Wegman et al had shown.

### index<-sample(100,12)
index = c(35,14,46,100,91,81,49,4,72,54,33)

Here is Wegman et al Fig 4-4 side by side with the resulting reproduction of the 12 hockey stick figure from the M&M script. They are clearly identical (although the Wegman et al version is not as dark for some reason).

Wegman et al 4.4 (L) and  12 identified  PC1s from the top 1% archive (R).

Naturally, the true provenance of Fig 4-4 sheds a harsh light on Wegman et al’s wildly inaccurate caption:

Figure 4.4: One of the most compelling illustrations that McIntyre and McKitrick have produced is created by feeding red noise [AR(1) with parameter = 0.2] into the MBH algorithm. The AR(1) process is a stationary process meaning that it should not exhibit any long-term trend. The MBH98 algorithm found ‘hockey stick’ trend in each of the independent replications.

As was pointed out long ago by David Ritson (and discussed here recently), this greatly overstates what McIntyre and McKitrick actually demonstrated. To fully understand this point, it is necessary to quickly review the statistical models underpinning paleoclimatological reconstructions. Typically, proxies are considered to contain a climate signal of interest (in this case temperature), combined with noise representing the effects of non-climatic influences. This non-climatic noise is usually modeled with a low-order autoregressive (AR) model, meaning that the noise in a given year is correlated with that in immediately preceding years. Specifically, an AR model of order 1, commonly called “red noise”, specifies that the value at time t in the series is correlated with the immediately preceding value at time t-1. The strength of this auto-correlation is given by the lag-one coefficient parameter.
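For concreteness, here is a minimal Python sketch of an AR(1) generator, together with a check that the lag-one coefficient can be recovered from the output (illustrative only; the series length and parameter values follow the discussion here, the function names are mine):

```python
import random
from statistics import mean

def ar1_series(n, phi, seed=0):
    """AR(1) 'red noise': x[t] = phi * x[t-1] + white noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x

def lag1_corr(x):
    """Sample lag-one autocorrelation, an estimate of phi."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    return num / sum((a - m) ** 2 for a in x)

# Two 581-point series (the length of a 1400-1980 proxy): weakly
# persistent AR1(.2) versus strongly persistent AR1(.9) noise.
low, high = ar1_series(581, 0.2), ar1_series(581, 0.9)
print(lag1_corr(low) < lag1_corr(high))  # True
```

Both processes are stationary, but the AR1(.9) series wanders away from its mean for much longer stretches, which is precisely the property at issue below.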

One of the common uses of AR noise models is to benchmark paleoclimatological reconstructions. In this procedure, random “null” proxy sets are generated and their performance in “reconstructing” temperature for part of the instrumental period is used to establish a threshold for the statistical skill of reconstruction from the real proxies. As estimated through various methods, the lag-one correlation coefficient used to generate the random null proxy sets is usually set between 0.2 and 0.4. (Mann et al 2008 used 0.24 and 0.4).

As we have seen above, M&M employed a similar concept to generate simulated PC1s from random noise proxies. (Confusingly, M&M called their null proxies “pseudo-proxies”, a term previously employed for test proxies generated by adding noise to simulated past temperatures.) However, M&M’s null proxies were not generated as AR1 noise as claimed by Wegman et al, but rather by using the full autocorrelation structure of the real proxies (in this case, the 70 series of the 1400 NOAMER tree-ring proxy sub-network).

The description of the method was somewhat obscure (and completely misunderstood by Wegman et al). But we can piece it together by following the code.

if (method2=="arfima")
{Data<-array(rep(NA,N*ncol(tree)), dim=c(N,ncol(tree)));
	for (k in 1:ncol(tree)){
} #arfima
if (method2=="arfima")
	b<-array (rep(NA,N*n), dim=c(N,n) )
	for (k in 1:n) {

The ARFIMA notation refers to a more complicated three-part statistical model (the three parts being AutoRegressive, Fractionally Integrated and Moving Average). It is a generalization of the ARIMA (autoregressive integrated moving average) model, itself an extension of the familiar ARMA (autoregressive moving average) model. ARFIMA permits the modeling, with just a few parameters, of “long memory” time series exhibiting high “persistence”. The generalized ARFIMA model was presented by J. R. M. Hosking in his 1981 paper, Fractional Differencing (Biometrika, 1981). A subsequent paper, Modeling persistence in hydrological time series using fractional differencing (Water Resources Research, 1984), outlined a method to derive a particular ARFIMA model from the full autocorrelation function of a time series, and to generate a corresponding random synthetic series based on the ARFIMA parameters derived from that autocorrelation structure.
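The “fractional integration” at the heart of ARFIMA can be made concrete. Expanding the fractional-differencing operator (1 − B)^d for non-integer d gives weights on past values that decay hyperbolically rather than exponentially, which is what produces long memory. A short Python illustration of that expansion only (not Hosking’s fitting procedure):

```python
def frac_diff_weights(d, k_max):
    """Coefficients w_k in the expansion (1 - B)^d = sum_k w_k * B^k,
    via the recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k."""
    w = [1.0]
    for k in range(1, k_max + 1):
        w.append(w[-1] * (k - 1 - d) / k)
    return w

# For 0 < d < 0.5 the magnitudes shrink slowly (roughly like
# k^(-1-d)), so distant past values retain influence: "persistence".
w = frac_diff_weights(0.4, 5)
print([round(x, 4) for x in w])  # [1.0, -0.4, -0.12, -0.064, -0.0416, -0.03]
```

By contrast, an AR(1) process forgets its past at a fixed exponential rate, which is exactly why AR1 and ARFIMA null proxies behave so differently in the simulations discussed below.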

So first, each of the 70 tree-ring proxies used by Mann et al was analyzed for its complete individual auto-correlation structure, using the acf() function. Then within each of the 10,000 simulation runs, the Hosking algorithm (as implemented in the hosking.sim R function) was used to generate a set of 70 random “null” proxies, each one having the same auto-correlation structure, represented by its particular ARFIMA parameters, as its corresponding real proxy.

Naturally, this raises a number of issues that were not addressed by Wegman et al, since the authors completely failed to understand the actual methodology in the first place. (This abject failure can be explained in part, if not excused, by M&M’s misleading description of their random proxies as consisting of “trendless” or “persistent” red noise, a nomenclature found nowhere else.) Foremost among these issues is McIntyre and McKitrick’s implicit assumption that the “noise” component of the proxies, absent any climatic signal, can be assumed to have the same auto-correlation characteristics as the proxy series themselves. But as von Storch et al (2009) observed:

Note that the fact that such [long-term persistence] processes may have been identified in climatic records does not imply that they may also be able to represent non-climatic noise in proxies.

Even if Wegman et al completely missed the real issues, other critics did not fail to point out problems with the M&M null proxies, as I have discussed before. Ammann and Wahl (Climatic Change, 2007) observed that by using the “full autoregressive structure” of the real proxies, M&M “train their stochastic engine with significant (if not dominant) low frequency climate signal”. And, as we have already seen, David Ritson was aware of the true M&M methodology back in November 2005, and pointed out M&M’s “improper” methodology to Wegman et al within weeks of the Wegman report’s release.

Indeed, the real reasons Wegman et al never released “their” code nor associated information are now perfectly clear. Doing so would have amounted to an admission that the supposed “reproduction” of the M&M results was nothing more than a mechanical rerun of the original script, accompanied by a colossally mistaken interpretation of M&M’s methodology and findings.

In any event, under attack from both Ritson and Ammann and Wahl, Steve McIntyre claimed several times that both the NRC report and Wegman et al had demonstrated the extreme biasing effect using AR1 noise instead of the dubious ARFIMA null proxies. As late as September 2008, McIntyre proclaimed:

The hosking.sim algorithm uses the entire ACF and people have worried that this method may have incorporated a “signal” into the simulations. It’s not something that bothered Wegman or NAS, since the effect also holds for AR1, but Wahl and Ammann throw it up as a spitball.

However, this claim is also highly misleading. First, as is now clear, McIntyre’s reliance on Wegman et al is speciously circular; Wegman et al mistakenly claimed that it was McIntyre and McKitrick who had demonstrated the “hockey stick” effect with AR1 noise, and certainly provided no evidence of their own.

So that leaves the NRC. It’s true that the NRC did provide a demonstration of the bias effect using AR1 noise instead of ARFIMA. But it was necessary to choose a very high lag-one coefficient parameter (0.9) to show the extreme theoretical bias of “short-centered” PCA. Indeed, that high parameter was chosen by the NRC expressly because it represented noise “similar” to McIntyre’s more complex methodology.

To understand what a huge difference the choice of AR1 parameter can make, recall David Ritson’s formula for estimating persistence (or “decorrelation time”) in AR1 noise, which in turn is based on the exponential decay in correlation of successive terms in the time series. The “decorrelation time” is given by (1 + phi)/(1 - phi).

Using this formula, one can calculate a persistence of 19 years within AR1(.9) noise, as opposed to a mere 1.5 years with the AR1(.2) noise claimed by Wegman et al to produce extreme “hockey stick” PC1s. And, as one might expect, rerunning the NRC code with the lower AR1 parameter of 0.2 yields dramatically different results.
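The two values are a trivial check (a few lines of Python applying Ritson’s formula; nothing here beyond the arithmetic already stated):

```python
def decorrelation_time(phi):
    """AR(1) decorrelation time in years: (1 + phi) / (1 - phi)."""
    return (1 + phi) / (1 - phi)

print(round(decorrelation_time(0.9), 1))  # 19.0 -- highly persistent noise
print(round(decorrelation_time(0.2), 1))  # 1.5  -- weakly persistent noise
```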

The first chart (at left) is a rerun of the NRC code (available here); the second chart (at right) is the same code but with the AR1 parameter set to 0.2, instead of 0.9, by simply adjusting one line of code:

phi <- 0.2;

5 PC1s generated from AR1(.9) (left) and AR1 (.2) (right) red noise null proxies.

So Wegman et al’s “compelling” demonstration is shown to be completely false; the biasing effect of “short-centered” PCA is much less evident when applied to AR1(.2) noise, even when viewing the simulated PC1s in isolation. To show the extreme effect claimed by McIntyre, one must use an unrealistically high AR1 parameter. This is yet one more reason that the NRC’s ultimate finding on the matter, namely that “short-centered” PCA did not “unduly influence” the resulting Mann et al reconstruction, is entirely unsurprising.

In summary, then, I have shown in Wegman et al a misleading focus on one particular step of the Mann et al algorithm, accompanied by a refusal to substantively consider the peer-reviewed scientific critiques of M&M.

And to top it all, Wegman et al flatly stated that Mann et al’s biasing algorithm would produce “hockey stick” reconstructions from low-order, low-autocorrelation red noise, while displaying a set of curves from the top 1% of “hockey sticks” produced from high-persistence random proxies. These facts are clearly apparent from McIntyre and McKitrick’s source code, as modified and run by Wegman et al themselves.

Make no mistake – this strikes at the heart of Wegman et al’s findings, which held that the writing of Mann et al was “somewhat obscure”, while McIntyre and McKitrick’s “criticisms” were found to be “valid” and their arguments “compelling”. And yet the only “compelling illustration” offered by Wegman et al was the supposed consistent production of “hockey sticks” from low-correlation red noise by the Mann et al algorithm. But the M&M simulation turned out to be based on nothing of the kind and, worse, showed only the top 1% of simulated “hockey stick” PC1s.

So there you have it: Wegman et al’s endorsement of McIntyre and McKitrick’s “compelling” critique rests on abysmal scholarship, characterized by deliberate exclusion of relevant scientific literature, incompetent analysis and a complete lack of due diligence. It’s high time to admit the obvious: the Wegman report should be retracted.

[Update, November 22: According to USA Today’s Dan Vergano, a number of plagiarism experts have confirmed that the Wegman Report was “partly based on material copied from textbooks, Wikipedia and the writings of one of the scientists criticized in the report”. The allegations, first raised here in a series of posts starting in December 2009 (notably here and here), are now being formally addressed by statistician Edward Wegman’s employer, George Mason University. ]