Cook et al.’s Strange Definitions

I made damning criticisms of the Skeptical Science consensus paper within days of it coming out. I’ve never received much response from them. I suspect it’s because every time they respond to criticisms, they make themselves look even worse.

John Cook and his associates made this important claim in their paper:

Each abstract was categorized by two independent, anonymized raters.

Using independent raters is a way of combating potential biases. As an obvious example, people who support a position may be more inclined to think papers support the same position. The best way to combat that is to get many people with many different worldviews. Cook and associates did it by using “independent” raters.

The problem is that their use of “independent” is nothing like how any sensible person would use the word. All of the raters of the paper were active participants at Skeptical Science. They routinely talked to one another on the Skeptical Science forum, run by the lead author of the paper. They even had a page in which many of them were labeled as being part of “The Skeptical Science Team” for years before the project began.

No sensible person would say they are on a team of ~20 people then claim to be independent of the other team members.

Beyond that, the raters actively discussed ratings and how to interpret rules with one another. The project’s forum had posts with titles like, “how to rate: Cool Dudes: The Denial Of Climate Change…” This led one of the authors, Sarah Green, to say:

But, this is clearly not an independent poll, nor really a statistical exercise. We are just assisting in the effort to apply defined criteria to the abstracts with the goal of classifying them as objectively as possible.
Disagreements arise because neither the criteria nor the abstracts can be 100% precise. We have already gone down the path of trying to reach a consensus through the discussions of particular cases. From the start we would never be able to claim that ratings were done by independent, unbiased, or random people anyhow.

The authors of the paper accept the fact they discussed specific ratings while doing them, but they claim the above quote is “out of context” and say:

Discussion of the methodology of categorising abstract text formed part of the training period in the initial stages of the rating period. When presented to raters, abstracts were selected at random from a sample size of 12,464. Hence for all practical purposes, each rating session was independent from other rating sessions. While a few example abstracts were discussed for the purposes of rater training and clarification of category parameters, the ratings and raters were otherwise independent.

They provide no explanation as to how context could change the meaning of Sarah Green’s comment. They provide no context that does change the meaning. All they do is acknowledge the fact they discussed specific ratings and defend it by saying those discussions were “for the purposes of rater training.” The evidence shows that is false.

One topic’s title was, “second opinion??” In no way does that imply training is involved. The same is true of the topic creator’s post, which merely cites a paper’s title and summary while asking:

A poor translation, but I think it’s saying we’re lucky to have AGW because it offsets dangerous global cooling; except where it says humans have caused cooling since 1950. So, does it support or reject AGW?

There is no desire for training there. It is simply one rater asking other raters for their opinion about what rating to pick. The same is true for another topic where a rater simply asks:

True in my experience, but how do I rate it??

These discussions are clearly not for training purposes.

As for the claim that discussions of rater guidelines during the rating process “formed part of the training period in the initial stages of the rating period,” that is contradicted by the fact that their “Official TCP Guidelines” topic had an active discussion of how to interpret the rules up to March 15th.

John Cook made a graph on March 15 showing how many ratings the top raters had done. It shows over 15,000 ratings had been performed during what the authors now call the “initial stages of the rating period.” There were fewer than 30,000 total ratings. That means the authors are defending their active discussion of how to interpret the rating guidelines by claiming the “initial stages of the rating period” covered more than half of the ratings they did.

It’s no wonder they didn’t respond more than a year ago when I called them out on their supposed independence. At least, not publicly. We can see they discussed the post in private (taken from the list of links they posted in their forum):



  1. So many, many parallels…

    Humpty Dumpty took the book and looked at it carefully. ‘That seems to be done right —’ he began.
    ‘You’re holding it upside down!’ Alice interrupted.
    ‘To be sure I was!’ Humpty Dumpty said gaily as she turned it round for him. ‘I thought it looked a little queer. As I was saying, that seems to be done right — though I haven’t time to look it over thoroughly just now — and that shows that there are three hundred and sixty-four days when you might get un-birthday presents —’
    ‘Certainly,’ said Alice.
    ‘And only one for birthday presents, you know. There’s glory for you!’
    ‘I don’t know what you mean by “glory”,’ Alice said.
    Humpty Dumpty smiled contemptuously. ‘Of course you don’t — till I tell you. I meant “there’s a nice knock-down argument for you!”’
    ‘But “glory” doesn’t mean “a nice knock-down argument”,’ Alice objected.
    ‘When I use a word,’ Humpty Dumpty said, in rather a scornful tone, ‘it means just what I choose it to mean — neither more nor less.’
    ‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’
    ‘The question is,’ said Humpty Dumpty, ‘which is to be master — that’s all.’
    Alice was too much puzzled to say anything; so after a minute Humpty Dumpty began again. ‘They’ve a temper, some of them — particularly verbs: they’re the proudest — adjectives you can do anything with, but not verbs — however, I can manage the whole lot of them! Impenetrability! That’s what I say!’
    ‘Would you tell me please,’ said Alice, ‘what that means?’
    ‘Now you talk like a reasonable child,’ said Humpty Dumpty, looking very much pleased. ‘I meant by “impenetrability” that we’ve had enough of that subject, and it would be just as well if you’d mention what you mean to do next, as I suppose you don’t mean to stop here all the rest of your life.’
    ‘That’s a great deal to make one word mean,’ Alice said in a thoughtful tone.
    ‘When I make a word do a lot of work like that,’ said Humpty Dumpty, ‘I always pay it extra.’
