I was just in bed failing to fall asleep when I got an e-mail alerting me to a public statement by the University of Queensland acting Pro-Vice-Chancellor (Research and International) Professor Alastair McEwan. The statement concerns the publicity my recent posts about the “97% consensus” paper by John Cook and colleagues have received. It’s also highly misleading, if not completely wrong.
The idea behind the statement is that the University of Queensland is not trying to hide data. According to McEwan, all data “that are of any scientific value were published.” He says only “information that might be used to identify the individual research participants was withheld.” It’s difficult to reconcile these claims with reality.
I’ll begin with an indisputable issue. We’ve established that data for 521 papers was not included in any of the released data files. There’s no way data for these 521 papers could be problematic if the exact same data could be made available for the other 11,944 papers. And surely, if the data for the other 11,944 papers is of “scientific value,” the data for these 521 papers is as well.
Now we’ll move onto a slightly more complicated issue, timestamps. There were over 20,000 ratings done for this project. How could knowing when those ratings were done possibly “be used to identify the individual research participants”? Does anyone really believe it’d be possible to look at a time and date then conclude one particular person, out of the entire world, must have done the rating at that point? Of course not.
Or are we to believe the time of ratings is of “no scientific value”? That can’t be true either. I’ve recently shown there are patterns in some of the ratings determined (partially) by when the ratings were done. Professor Richard Tol has shown the same thing over the entire data set. If data is dependent in part on when it was collected, it is certainly of “scientific value” to take note of the time that data was collected.
This brings us to a third type of data which hasn’t been released – rater IDs. The only reason rater IDs could identify anyone is that the ID numbers of participants are the same as their ID numbers at Skeptical Science. That problem could be resolved by simply anonymizing the ID numbers (by assigning new, random values to them). That would make it impossible to identify the participants.
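This sort of anonymization is trivial to do. Here is a minimal Python sketch; the rater IDs and rating values are hypothetical stand-ins for the withheld data, not the actual study records:

```python
import random

# Hypothetical withheld data: (rater_id, rating) pairs. The IDs here
# stand in for the Skeptical Science account numbers said to identify
# participants; the ratings are made-up values on the paper's 1-7 scale.
ratings = [(1071, 4), (1683, 3), (1071, 4), (2205, 7)]

# Build a one-to-one mapping from each real ID to a fresh random ID.
real_ids = sorted({rid for rid, _ in ratings})
new_ids = random.sample(range(10000, 100000), len(real_ids))
mapping = dict(zip(real_ids, new_ids))

# Replace every real ID with its anonymized counterpart. The ratings
# themselves (and which ratings share a rater) are untouched, so any
# scientific value in the data is fully preserved.
anonymized = [(mapping[rid], rating) for rid, rating in ratings]
print(anonymized)
```

Because the same real ID always maps to the same new ID, anyone analyzing the anonymized file can still check for rater-specific patterns without being able to name a single participant.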
A secondary issue is that rater ID data could allow people to associate specific ratings with specific raters. This is inconsequential. If we already know Mr. X was a participant in this study, what harm is caused by knowing which ratings he contributed? The only way someone could be harmed by being associated with specific ratings is if there were issues with those ratings. That would be of “scientific value.” If individual participants were biased or flawed in some other way, that’d be good to know. It’d be so good to know that several participants in the study suggested they should look at it:
I supect that we have all felt that our rating criteria have drifted with time and it would be useful to review those rating where there is disagreement. Some ratings discrepancies may also be fat-finger entry mistakes and those, of course, need to be fixed. It may be helpful to have some stats., for example, the percentage of certain ratings that we have done compared to the group overall. This may reveal some systematic bias of certain individuals, well, maybe not bias but failure to grasp the criteria correctly.
Following Andy, I would like to see the percent I rated in each category relative to the whole sample, i.e. 50% of mine are neutral vs 55% of everyone’s
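The statistic these participants asked for is easy to compute once ratings can be tied to (anonymized) raters. A minimal sketch, using hypothetical rater labels and rating categories rather than the actual study data:

```python
from collections import Counter

# Hypothetical (rater, category) pairs standing in for the withheld data.
ratings = [("A", 4), ("A", 4), ("A", 2),
           ("B", 4), ("B", 3), ("B", 2), ("B", 4)]

def category_share(pairs, category):
    """Fraction of the given ratings that fall in `category`."""
    counts = Counter(cat for _, cat in pairs)
    return counts[category] / len(pairs)

# Compare one rater's share of a category against the group overall --
# the "50% of mine are neutral vs 55% of everyone's" comparison.
mine = [(r, c) for r, c in ratings if r == "A"]
print(category_share(mine, 4))     # rater A's share of category 4
print(category_share(ratings, 4))  # everyone's share of category 4
```

A large gap between an individual’s share and the group’s share is exactly the kind of systematic bias, or failure to grasp the criteria, the participants were worried about.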
If the authors of the paper thought this data was of “scientific value” when they were doing the study, how can anyone now claim it isn’t of “scientific value”? That’s just silly.
But let’s move beyond all that. Suppose it truly is important to keep the identities of raters private. Why then did I just load this image at Skeptical Science:
This one also identifies nearly a dozen individual participants. It’s true we only found out about these images because of a hack, but that hack happened nearly two years ago. Surely the authors of the paper shouldn’t have left confidential information in a publicly accessible location for two years, even if people had already seen it.
Beyond that, we have to ask, why was this data ever available in the first place? None of the identities of the participants were kept secret from one another. Heck, people not involved in the project could post in the same forum this data was posted in! How can anyone claim it was confidential? Did everyone involved in the project, and everyone with access to that forum, all sign a confidentiality agreement? If not, the data was never kept confidential.
I’d love to know if there were such confidentiality agreements. That’s why I specifically asked the University of Queensland for them. I wanted to know what data was confidential so I could keep that in mind when considering what data I should or should not release.
John Cook refused to tell me. Later, when the University of Queensland sent me a threatening letter, they invited me to respond. I did, asking for information about these confidentiality issues. They ignored me. They were apparently willing to threaten me with a lawsuit to try to get me to shut up, but they weren’t willing to answer a simple question.
I get that Alastair McEwan is in a difficult spot. I don’t sympathize with him though. It’s his own fault. This all began when I asked to be told what data should be kept confidential and why. I’ve asked that question multiple times, and everyone has refused to answer it.
McEwan could have resolved this entire situation by answering that one simple question. He chose not to. He chose not to address this simple issue in a direct and reasonable manner. Instead, he chose to make false and misleading statements to the media.