The last post in this series showed that Michael Mann’s hockey stick depends on a tiny amount of data. Previous posts showed Michael Mann knew this. Today’s post is going to show how he managed to pick the data that gave the results he wanted. That’s right, I’m finally going to discuss principal component analysis.
But before we look at how Mann’s faulty implementation of PCA allowed him to cherry-pick his results, let’s remember he had two different proxies with a hockey stick shape. One (Gaspe) wasn’t created via PCA. As I said of it before:
That series was an arbitrarily extended (no other series was extended like it) version of a series already included in his data set. The extension of that series was originally undisclosed, acknowledged only after Mann was forced by his critics to publish a corrigendum saying:
For one of the 12 ‘Northern Treeline’ records of Jacoby et al. used in ref. 1 (the ‘St Anne River’ series), the values used for AD 1400–03 were equal to the value for the first available year (AD 1404).
Even after it was acknowledged, no explanation was provided for the extension, and there has never been an acknowledgment of the fact the series was used twice.
That is simple cherry-picking. He took a data series already in his data set, duplicated it, arbitrarily extended it back in time, then set it aside so PCA wouldn’t be applied to it. Unfortunately, PCA is not so simple.
Principal component analysis is a way of trying to extract signals from data. You take many series, compare them and see what is similar. The problem is Michael Mann did not look at entire series. What he did is related to something known as the “screening fallacy.” lucia at The Blackboard has a great post demonstrating it.
Basically, we know temperatures in the 1900s went up, so it makes sense to look for proxy data which shows rising temperatures in the 1900s. We take whatever data does show that, average it together, and we find modern temperatures are higher than they’ve ever been.
That makes sense until we realize the process guarantees the results. If you only use data which rises in the 1900s, your data is guaranteed to show rising temperatures in the 1900s. Imagine you looked at a bunch of data and found these three series within it:
All three agree temperatures have gone up in the last period. They don’t agree about anything else though. So what would happen if you averaged them together?
You got yourself a hockey stick. You could throw a bunch of noisy series in your data set, and you’d still get a hockey stick. The reason is everything before 1900 is basically random, but everything after 1900 is guaranteed to go up.
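The screening fallacy is easy to demonstrate for yourself. The sketch below is my own toy construction (the series counts, window length, and variable names are all illustrative, not anything from Mann's actual data): generate proxies that are pure random noise, keep only the ones that happen to trend upward in the final period, and average the survivors. The average is guaranteed to rise in the screening window even though there is no signal anywhere in the data.

```python
# Toy demonstration of the screening fallacy: screen pure-noise
# "proxies" on their modern-period trend, then average the survivors.
# All parameters here are illustrative choices, not Mann's.
import numpy as np

rng = np.random.default_rng(0)
n_series, n_years = 1000, 600   # e.g. "years" 1400-1999
screen = 100                    # final screening window

# Pure random walks: no temperature signal whatsoever.
proxies = rng.normal(size=(n_series, n_years)).cumsum(axis=1)

# Screening step: keep only series that trend upward in the last
# `screen` years, the way a "calibration against modern temperatures"
# screen would.
t = np.arange(screen)
slopes = np.array([np.polyfit(t, p[-screen:], 1)[0] for p in proxies])
survivors = proxies[slopes > 0]

# "Reconstruction" = average of the survivors.
reconstruction = survivors.mean(axis=0)

# The pre-window portion averages out to roughly nothing, while the
# screened portion rises by construction: a hockey stick from noise.
early_trend = np.polyfit(np.arange(n_years - screen),
                         reconstruction[:-screen], 1)[0]
late_trend = np.polyfit(t, reconstruction[-screen:], 1)[0]
print(f"trend before screening window: {early_trend:.4f}")
print(f"trend inside screening window: {late_trend:.4f}")
```

Roughly half the noise series pass the screen, and their independent pre-1900 wiggles cancel in the average while their post-1900 rise cannot, so the "reconstruction" has a blade no matter what you feed it.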
Now then, that’s not quite what Michael Mann did. PCA doesn’t look for specific patterns. Instead, the more a series varies from the norm, the more weight PCA gives that series. The trick is in how you define “the norm.” Conventional PCA says the norm is the average over the entire series. Michael Mann decided otherwise. What he did was define the “norm” as the average over the 1900s.
Of course, if you set your norm to a small period in which series are rising, you’ll find series tend to deviate a lot from that norm. You can see this by comparing the black line (averaged over the entire series) and red line (averaged over the last period) in this figure:
The black line stays fairly close to 0. As such, it’d receive little weight. The red line keeps far from 0. As such, it’d receive more weight. There’s more to it than I’m describing (including two rescaling steps), but it’s all rooted in the same basic idea:
Under Mann’s approach, if a series changes after 1900 instead of before 1900, it gets more weight. And because his approach favors changes post-1900, it favors changes in the form of a hockey stick.
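The effect of that choice of baseline can be shown with another toy sketch of my own (this is not Mann's code, and the proxy counts, ramp size, and noise levels are arbitrary assumptions): build a batch of noise series plus one hockey-stick-shaped series, then run PCA twice, once centering each series on its full-length mean and once centering on the mean of the final period only. Under the short centering, the hockey stick's long flat shaft sits far from its modern-period baseline, so it captures PC1; under conventional centering it's just another series.

```python
# Toy demonstration of how "de-centered" PCA promotes hockey sticks.
# Not Mann's actual algorithm (which includes rescaling steps); just
# the centering choice in isolation, on made-up data.
import numpy as np

rng = np.random.default_rng(1)
n_years, window = 600, 100

proxies = rng.normal(size=(50, n_years))      # 49 pure-noise series...
stick = np.zeros(n_years)
stick[-window:] = np.linspace(0, 4, window)   # ...plus one hockey stick
proxies[0] = stick + rng.normal(scale=0.3, size=n_years)

def pc1_loadings(data, center_window=None):
    """Return |loadings| of each series on the first principal component.
    center_window=None  -> conventional centering on the full-length mean;
    center_window=k     -> "de-centered": subtract the mean of the last
                           k years only, as in the short-centering trick."""
    if center_window is None:
        centered = data - data.mean(axis=1, keepdims=True)
    else:
        centered = data - data[:, -center_window:].mean(axis=1, keepdims=True)
    # First left singular vector = per-series weights on PC1.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return np.abs(u[:, 0])

full = pc1_loadings(proxies)                   # conventional centering
short = pc1_loadings(proxies, center_window=window)  # de-centered

print(f"hockey-stick loading on PC1, full centering:  {full[0]:.2f}")
print(f"hockey-stick loading on PC1, short centering: {short[0]:.2f}")
```

The point of the toy: short-centering measures each series' deviation from its 1900s average, so a series that is flat for centuries and then ramps up sits far from its baseline for its entire length, and PC1 locks onto it. Full centering gives that same series no special status.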
There is no statistical reason to use “de-centered” baselines like Mann did. Ian Jolliffe, one of the people Mann used as a reference on statistics, flat-out said, “I don’t know how to interpret the results when such a strange centring is used.” The reality is nobody knows how to interpret those results because the methodology is nonsensical.
And that’s not really a matter of dispute. Mann still claims his methodology is okay, but he doesn’t focus on that. What he focuses on is the fact you can get a hockey stick if you do PCA correctly. He claims:
Curiously undisclosed by MM in their criticism is the fact that precisely the same ‘hockey stick’ pattern that appears using the MBH98 convention (as PC series #1) also appears using the MM convention, albeit slightly lower down in rank (PC series #4) (Figure 1).
That fact was not undisclosed by Steve McIntyre and Ross McKitrick (MM), but otherwise, what Mann says is true. PCA of either style will find a hockey stick. The only difference is PC1 is the most important signal while PC4 is merely the fourth most important one. That’s a pretty significant difference.
But really, who cares? Michael Mann only got a hockey stick because of two series. One (Gaspe) was cherry-picked in a straightforward manner. The other (NOAMER PC1) can be cherry-picked via PCA or whatever method you prefer.
It doesn’t matter. He’s still just cherry-picking a tiny amount of data.