Cook et al’s Hidden Data

You might think this post is about the previously undisclosed material I recently gained possession of. It’s not. Even with the additional material I now have, there is still data that is not available to anyone.

You see, while people have talked about rater ID information and timestamps not being available, everybody seems to ignore the fact Cook et al chose not to release data for 521 of the papers they examined.

I bring this up because Dana Nuccitelli, second author of the paper, recently said:

Morph – all the data are available, except confidential bits like author self-ratings. We even created a website where people can attempt to replicate our results. We could not be more transparent.

Actually, they could be much more transparent. Here is the data file they released showing all papers and their ratings. It has 11,944 entries. Here is a concordance showing the ID numbers of the papers they rated. It has 11,944 entries. Here is a data file showing all the ratings done by the group. It has entries for 11,944 papers.
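Checking that claim is straightforward. A few lines of Python can confirm all three released files describe the same 11,944 papers; the file names below are hypothetical stand-ins, since all that matters is counting non-blank rows in each file.

```python
# Sketch: count the entries in a CSV-like data file.
def count_entries(lines):
    """Count non-blank rows in an iterable of lines."""
    return sum(1 for line in lines if line.strip())

# Usage (hypothetical file names; each should report 11944):
# for name in ("ratings.csv", "concordance.csv", "group_ratings.txt"):
#     with open(name, encoding="utf-8") as f:
#         print(name, count_entries(f))
```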

The problem is there were 12,465 papers:

The ISI search generated 12 465 papers. Eliminating papers that were not peer-reviewed (186), not climate-related (288) or without an abstract (47) reduced the analysis to 11 944 papers

Cook et al eliminated 521 papers from their analysis. That’s fine. Filtering out inappropriate data is normal. What’s not normal is hiding that data. People should be allowed to look at what was excluded and why. Authors should not be able to remove ~4% of their data in a way which is completely unverifiable.
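The arithmetic behind those figures is easy to verify; a minimal sketch:

```python
# Verify the paper counts quoted from Cook et al's description.
total = 12465
excluded = {
    "not peer-reviewed": 186,
    "not climate-related": 288,
    "no abstract": 47,
}
removed = sum(excluded.values())    # 521 papers dropped
remaining = total - removed         # 11944 papers analyzed
share = removed / total             # ~4.2% of the search results
print(removed, remaining, round(share * 100, 1))  # 521 11944 4.2
```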

But it’s worse than that. The authors didn’t do what their description suggests they did. Their description suggests only 47 papers their search generated had missing abstracts. That’s not true. Over two hundred of the results did not have abstracts. We know this because John Cook said so in his own forum. In a topic titled Tracking down missing abstracts, he said:

Well, we’ve got the ‘no abstracts’ down to 70 which is not too shabby out of 12,272 papers (and starting with over 200 papers without abstracts). I’m guessing a number of those will be opinion/news/commentary rather than peer-reviewed papers.

The 12,272 doesn’t match the 12,465 number because more papers were added later. That doesn’t matter though. What matters is at least 200 of their search results did not have abstracts. The group went out and looked for missing abstracts, inserting ones they found. No documentation of these insertions has ever been released. The fact their search results were modified has never even been disclosed.

It’s impossible to know which abstracts were added. That means it is impossible to verify the correct abstracts were added without verifying all ~12,000 results. That also means it is impossible to ensure the abstracts added were not a biased sample.

There’s more. We’re told 186 papers were excluded for not being peer-reviewed. No explanation is given as to how they determined which papers were and were not peer-reviewed. Comments in the forums show there was no formal methodology; people simply investigated results which seemed suspicious to them. There is no way to know how good a job they did of removing non-peer-reviewed material.

And there’s still more. We’re told 288 papers were excluded for not being climate-related. Again, no explanation is given as to how this filter was applied. It does not seem to have been applied well. For example, while 288 papers were excluded for this reason, one of the most active raters said this in the forum:

I have started wondering if there’s some journals missing from our sample or something like that, because I have now rated 1300 papers and I think I have only encountered a few papers that are actually relevant to the issue of AGW. There are lots ond lots of impacts and mitigation papers but I haven’t seen much of papers actually studying global warming itself. This might be something to consider and check after rating phase.

If only “a few” out of 1300 papers were “actually relevant to the issue of AGW,” how is it 12,177 papers out of 12,465 were “climate related”? The only explanation I can find is most papers are “climate related” but not “actually relevant to the issue of AGW.” This is an example. It’s one of the 64 papers placed in the highest category (explicitly claiming humans cause 50%+ of recent warming), and it says:

This work shows that carbon dioxide, which is a main contributor to the global warming effect, could be utilized as a selective oxidant in the oxidative dehydrogenation of ethylbenzene over alumina-supported vanadium oxide catalysts. The modification of the catalytically active vanadium oxide component with appropriate amounts of antimony oxide led to more stable catalytic performance along with a higher styrene yield (76%) at high styrene selectivity (>95%). The improved catalytic behavior was attributable to the enhanced redox properties of the active V-sites.

If you don’t know what any of that means, don’t feel bad. The paper is about a narrow chemistry subject which has no bearing on global warming. Its only relation to climate is that one clause, “which is a main contributor to the global warming effect.” According to Cook et al, that is apparently enough to make it “climate related.” In fact, that’s enough to make this paper one of the 64 which most strongly support global warming concerns.

Given that, it’s difficult to imagine what papers might have been rated “not climate related.” Fortunately, we don’t have to use our imaginations. While it’s true Cook et al excluded all this data from their data files, it turns out that data is available via the search function they built.

Nobody could have guessed that. Nobody who downloaded data files would have thought to go to a website and use a function to find information excluded from those data files. Even if they had, the site only shows the final ratings of those papers. It doesn’t show any intermediary data like that in the data files.

Regardless, it does allow us to check some of the concerns raised in this post. For example, we can do a search to see what sort of papers were considered “not climate related.” I’ll provide the only title in category 1 and part of its abstract:

Now What Do People Know About Global Climate Change? Survey Studies Of Educated Laypeople

When asked how to address the problem of climate change, while respondents in 1992 were unable to differentiate between general “good environmental practices” and actions specific to addressing climate change, respondents in 2009 have begun to appreciate the differences. Despite this, many individuals in 2009 still had incorrect beliefs about climate change, and still did not appear to fully appreciate key facts such as that global warming is primarily due to increased concentrations of carbon dioxide in the atmosphere, and the single most important source of this carbon dioxide is the combustion of fossil fuels.

This abstract is more forceful in its endorsement of global warming concerns than many of the ones labeled “climate related.” Its topic, what people know about global warming, is certainly more relevant than topics like the molecular chemistry of materials production. I could post example after example showing the same pattern. Papers excluded for not being “climate related” are often far more relevant than papers Cook et al included.

You could never find this out by examining Cook et al’s data files though. Those data files exclude the information necessary to check things like this. It’s only because of an undisclosed difference in their data sets that we could ever hope to check their work on this.

By the way, I encourage everyone to use that search feature to find examples of what I refer to. It’s amazing how many of the papers making up the “consensus” are not “actually relevant to the issue of AGW.”



  1. Cook’s classification system was a rolling process. Abstracts were added midway in May 2012 after the process had started, as mentioned in the paper. Presumably, this was to cover the time period between initial WoK search & abstract download and the start of abstract rating. This is likely the 200 abstracts that did not get an initial rating but got rated as the exercise rolled on.

  2. Hey, can you confirm that the vanadium catalyst paper wasn’t rejected? Was that actually counted?

    This is so rich! I guarantee that the global warming talk in that paper doesn’t extend beyond the abstract and possibly the introductory sentences of the paper.

  3. ZombieSymmetry, sure. The title of that paper was:

    Utilization of carbon dioxide as soft oxidant in the dehydrogenation of ethylbenzene over supported vanadium-antimony oxide catalystst

    Its lead author was Jong-San Chang, and it was published in 2003. If we look in the official data file given by the journal’s website, we see:

    2003,Utilization Of Carbon Dioxide As Soft Oxidant In The Dehydrogenation Of Ethylbenzene Over Supported Vanadium-antimony Oxide Catalystst,Green Chemistry,Chang| Js; Vislovskiy| Vp; Park| Ms; Hong| Dy; Yoo| Js; Park| Se,3,1

    We can also see Cook et al gave it an ID number of 6271 by looking in this file. With that information, we can find it in their ratings file, where we see:

    6271 1 3 1 3
    6271 3 3 1 3

    All of these files have information on the 11,944 papers not rejected by Cook et al. None of the files have information on the 521 papers that were rejected. In other words, that paper was definitely counted as one of the 64 in the highest category.
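    To make that lookup concrete, here’s a minimal Python sketch of the check. The column layout beyond the leading paper ID isn’t documented, so only the ID is matched:

    ```python
    # Find every rating row for a given paper ID in the whitespace-
    # delimited ratings file (column meanings beyond the ID are assumptions).
    def ratings_for(paper_id, lines):
        pid = str(paper_id)
        return [line.split() for line in lines if line.split()[:1] == [pid]]

    # Sample rows mirroring the quoted file contents:
    sample = [
        "6271 1 3 1 3",
        "6271 3 3 1 3",
        "6272 2 4 1 4",
    ]
    print(ratings_for(6271, sample))  # two rows for paper 6271
    ```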

  4. I’ve edited this post to fix a few typos. Two were simple misspellings, but one altered the meaning of a sentence. In the first sentence of the third to last paragraph, I had originally said “not climate related” instead of “climate related.” The way it was written, I was contrasting a paper labeled “not climate related” to papers labeled “not climate related.” That made it sound silly.

  5. Subject: Peer reviews
    Did Cook et al really check if their publications cited were properly(!) peer reviewed or did they just believe what others said on this quite important point?
    It ought to be made public who performed the peer reviews on every single publication as well. Some of these papers seem so specific, that there are -supposedly- not too many ‘peers’ around that are able to do a qualified peer review. My suspicion is that there is quite a good number of mutual peer reviewing. It seems a matter of honour amongst thieves. If there is something wrong with the Gaussian distribution curve, then…

  6. Non Nomen, they had no formal method for deciding what was and was not peer-reviewed. When they thought a paper hadn’t been peer-reviewed, they investigated. That’s one of the many aspects of the study which introduce the potential for bias.

    That said, the Web of Science resource they used is supposed to only return peer-reviewed material. It’s understandable to trust it on that.
