It’s Time

As you may know, John Cook and others from the Skeptical Science website published a paper finding a 97% consensus on global warming. They then refused to release data related to their work. I eventually came into possession of that data, as well as other material. When I contacted John Cook to discuss what should be done with it, he refused any sort of dialogue. Instead, he had the University of Queensland make absurd legal threats (including threatening to sue me if I told people they had threatened to sue me), then broke off all communication.

For that and other reasons, I decided to release the material I came across. I explained my reasons here, so I won’t repeat myself. Instead, I am going to just discuss the material I’m releasing. You can find that material here.

When you click on the link, you’ll be directed to a “mirror” of what I stumbled upon. Not only will you be able to see the data I found, you will be able to see it in the same form I found it (save some technical limitations). You’ll even find the same display settings I saw.

Upon accessing the mirror, you will find approximately 40 links. Some are uninteresting. Some don’t even have any material in them. That was true when I accessed them. The links I find most interesting are these two.

The first link shows the datestamps of ratings performed for this paper. This link is interesting as John Cook and associates explicitly denied having recorded timestamps, saying:

Timestamps for the ratings were not collected, and the information would be irrelevant.

Meanwhile, Cook responded to a criticism from Professor Richard Tol by telling a representative of the University of Queensland that the journal which published his paper (Environmental Research Letters):

said I didn’t have to include time stamp info but I’m probably going to anyway, just to show Tol’s fatigue theory is all rubbish.

Timestamps and datestamps are different things. It’s impossible to tell which of John Cook’s statements were true. Still, it’s interesting to see datestamps were recorded.

The second link makes these datestamps more interesting. It shows the rater ID# for various ratings. This allows us to tell who performed each rating, and how. Combining the data of the two links allows us to see some individuals rated over 200 abstracts on individual days. Cook et al may feel that is “irrelevant,” but others are certainly free to disagree. Additionally, that second link is what allowed me to create this image:

[Figure: 5-10-pre-reconciliation]

It shows how the various raters rated the abstracts. Each column shows a value an abstract could be given. Each color represents an individual rater. The vertical alignment shows how often (proportionally) each individual rater selected a particular value. The size of the circles indicates how many times (total) each individual rater chose a given value.

It’s a bit complicated, but the basic point is the more the raters agreed about how things should be rated, the more the circles would align. Where the circles do not align, there was a systematic difference in opinion. Such a systematic difference would necessarily introduce bias into the results. The effect of that bias would be directly proportional to the number of ratings done by the biased individual – represented by the size of the circle.
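For anyone who wants to reproduce that kind of summary from the released files, here is a minimal sketch. The file and column names (ratings.csv, UserId, Endorse) are assumptions for illustration, not necessarily the names used in the actual data.

```python
# Sketch: per-rater endorsement-value shares and totals, the quantities the
# chart above is built from. File/column names are assumed for illustration.
import pandas as pd

ratings = pd.read_csv("ratings.csv")  # assumed columns: UserId, Endorse

counts = (ratings.groupby(["UserId", "Endorse"])
                 .size().rename("n").reset_index())
# Share of each endorsement value within a rater's own ratings
# (vertical alignment in the chart) alongside the total count (circle size).
counts["share"] = counts["n"] / counts.groupby("UserId")["n"].transform("sum")

print(counts.sort_values(["Endorse", "share"]))
```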

These biases are especially interesting as all raters were Skeptical Science representatives. They are all advocates for the global warming movement. If we know there were systematic disagreements amongst people who share the same cause, we can only imagine what disagreements there would be if neutral raters had been used.

Another link I especially recommend examining is this one. It will take you to an index of 21 data files, one for each year from 1991-2011. These files include much of the same data linked to above, but they also include comments left by raters during the rating process. Unfortunately, raters could modify their comments at any time, and only the latest version is available. It still gives us insight into why raters selected the values they selected.

In addition to the material I’m now releasing, you may find it worth examining a copy of the Skeptical Science forum released by a hacker while this project was underway. I’ve made an easily accessible version of it available online here. It even includes a subforum specifically created for this paper (direct link here).

You may also find the ethics approval for the paper interesting. It’s available here. Of particular note, this ethics approval was submitted only after the Skeptical Science group had finished their ratings. It does not cover the bulk of their paper. Instead, it only covers John Cook’s requests to individual scientists to rate their own papers. The Skeptical Science group ratings were done without any oversight or ethical approval.


As a final note, I believe the legal threats made regarding my possession and release of this material are baseless. I do not expect any legal action to be taken. I certainly do not expect to be arrested, as John Cook and the University of Queensland threatened.

I could be wrong though. Frivolous lawsuits do happen. Law enforcement agencies do sometimes persecute people without cause. At a number of people’s request, I’ve created a legal fund to address such concerns. You can donate to it, or simply offer support against bullying designed to prevent people from examining the data for Cook et al’s work, here:

Any money not needed for legal purposes will be spent according to the wishes of those who donate.


81 comments

  1. Much kudos for actually having a set of balls, Brandon. Lack of testicular fortitude is what got us in the mess in the first, good to see some people have the ability along with the technical knowledge to fight back.

  2. 12332,2011,A Life Cycle Assessment Of Injectable Drug Primary Packaging: Comparing The Traditional Process In Glass Vials With The Closed Vial Technology (polymer Vials),International Journal Of Life Cycle Assessment,3,3,1683,3,3,,1,3,3,Study is about which method has lower emissions so is about mitigating emissions. Abstract talks about ‘impact is reduced…for global warming’ so implicit endorsement.,,,,,,,,,

    WTF? And the POTUS endorses this crap.

    Apologies for the previous post where I missed the word “place” after the word “first”; it has been a long weekend.

  3. bit chilly, thanks for the compliment, but I’m not sure I deserve it. I went through a lot to ensure there was no chance I could suffer any legal consequences. At this point, there’s basically no risk to me other than the possibility people will say bad things about me. It doesn’t take much courage to stand up when that’s the worst you can suffer.

  4. Brandon sez: ” It doesn’t take much courage to stand up when that’s the worst you can suffer.”

    That’s still more courage than many exhibit.

  5. The rating for the paper shown below is (IMO) incorrect. Also it appears that they have a database problem or a revision issue.

    ********************************************
    1991,Global Climate Change,Journal Of Engineering For Gas Turbines And Power-transactions Of The Asme,Hammerle| Rh; Shiller| Jw; Schwarz| Mj

    Global climate change
    This paper reviews the validity of the greenhouse warming theory, its possible impact on the automotive industry, and what could be done. Currently, there is very limited evidence that man’s activity has caused global warming. Mathematical models of the earth’s heat balance predict warming and associated climate changes, but their predictions have not been validated. Concern over possible warming has led to several evaluations of feasible CO{sub 2} control measures. Although cars and trucks contribute only a small fraction of the CO{sub 2} buildup, the automotive industry may be expected to reduce it share of the atmospheric CO{sub 2} loading if controls become necessary. Methods to reduce automotive CO{sub 2} emissions, including alternative fuels such as methanol, natural gas, and electricity, are discussed. Also, control of the other greenhouse gases, which may currently contribute about 45 percent of the greenhouse warming is considered.
    *********************************************

    The abstract states that :
    – Currently there is very limited evidence that man’s activity has caused global warming.
    – the heat balance models haven’t been validated
    – the auto industry might be expected to reduce its share of GHGs “if controls become necessary.”
    – 45% of the greenhouse warming might be from GHGs other than CO2.

    From SKS’s TCP page.
    Authors: Hammerle, Rh; Shiller, Jw; Schwarz, Mj (1991)
    Journal: Journal Of Engineering For Gas Turbines And Power-transactions Of The Asme
    Category: Mitigation
    Endorsement Level: 6. Explicitly minimizes/rejects AGW but does not quantify

    Also, it’s listed as #2 on the rejection link.
    2. Global Climate Change
    Hammerle, Rh; Shiller, Jw; Schwarz, Mj (1991)**
    ** Explicitly reject AGW

    How the heck does it rate an explicit rejection?? The first sentence questions the evidence but the other points make it clear that the authors don’t reject the theory and they even quantify the possible CO2 impact.

    In regards to the database problem, from the All Authors link in the data you released:

    Id, Name, TCP Bias,Papers,Attribution,Contribution
    5977, R. H. HAMMERLE, 2, 2, 0, 0
    Where
    TCP Bias: 1 = authored a rejection paper, 3 = authored an endorsement paper (but no rejection papers), 2 = authored only neutral papers
    Papers: Number of papers in TCP database

    How can Hammerle have a TCP bias of 2 if he has a paper that explicitly rejects the consensus?
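    A quick way to check whether that record is a one-off or a broader database problem would be a cross-check along these lines. This is only a sketch; the file and column names are assumptions, not the names actually used in the released files.

    ```python
    # Sketch: flag authors listed as "neutral only" (TCP Bias = 2) who nonetheless
    # have a paper rated as a rejection (endorsement levels 5-7).
    # File/column names are assumptions for illustration.
    import pandas as pd

    authors = pd.read_csv("all_authors.csv")    # assumed: Id, Name, TCPBias
    papers = pd.read_csv("author_papers.csv")   # assumed: AuthorId, Endorse

    rejection_authors = papers.loc[papers["Endorse"].between(5, 7), "AuthorId"]
    suspect = authors[(authors["TCPBias"] == 2) &
                      (authors["Id"].isin(rejection_authors))]
    print(suspect[["Id", "Name", "TCPBias"]])
    ```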

  6. The whois for the Sekret domain indicates it was first created March 28, 2012. That was three days after John announced the SKS hack.

    Where was this data stored before then?

  7. Interesting, the “no risk” bit in the ethics approval… when multiple authors (i.e. Spencer, etc.) disputed how Cook had rated their papers, Cook et al (Dana, etc.) publicly attacked them!

    No risk to the scientific paper authors who were being ‘rated’, the request for ethics approval says.

    Big risk: attacked by Cook and SkS publicly in the international media!!

    An example from the Guardian (SkS’ Dana) – extracts:
    http://www.theguardian.com/environment/climate-consensus-97-per-cent/2014/feb/14/global-warming-consensus-stringer-cnn

    Similarly, contrarian climate scientist Roy Spencer claimed in Congressional testimony last year that he’s included in the 97 percent. (dana)

    “There’s a recent paper by John Cook and co-authors who looked at thousands of research papers which have been published in the scientific literature to see what fraction support the scientific consensus on global warming. Well, it turns out that the 97% consensus that they found, I am indeed part of and Senator Sessions mentioned he would agree with it too. And my associate John Christy, he agrees with it. In fact, all skeptics that I know of that work in that business. All are part of that 97% because that 97% includes those who think humans have some influence on climate. Well, that’s a fairly innocuous statement.” – Spencer

    These statements are all incorrect, ignoring a significant part of our research. They are based on one of the categories used in our study regarding “implicit endorsements” of human-caused global warming. A paper that was included in this category: (Dana)

    ####
    Note how Dana in the Guardian misrepresents Cook et al. Spencer would be included in Cat 2 or 3, which add up to the 97%!
    ############

    “Those like Spencer, and possibly Stringer and Kreutzer, who believe the human influence on the climate is minimal, hold fringe views that are consistent with just 2 to 3 percent of the peer-reviewed climate science literature.” – Dana

    Spencer had dared to say he was part of the scientific consensus …. disputing Cook et al.

    big ethical risk!!

  8. Richard Tol, no prob.

    Pouncer, that is true. It just seems like saying, “Thanks for not sucking too much” to me.

    DGH, I agree about that paper being mis-rated. As for the TCP Bias value, that column confused me. Some of the entries in it don’t seem to make sense. For example, why are some authors listed as having a negative bias? As for the data, I’d assume it was stored at Skeptical Science prior to the hack.

    Barry Woods, remember the claim there was no danger to the authors only applies to the self-ratings. The Skeptical Science group’s ratings were performed without any oversight. That means it’s okay for them to be used to harm individuals, I guess.

  9. Regarding Barry’s point about the media treatment of ‘not subject’ authors, i.e. authors who declined to self-rate for them: it may be the case these ‘not subject’ authors have no ethics rights, and their castigation by the TCP authors in the media therefore may not constitute any actual ethics breach, but I think there is probably a rather subtle point to be made that casts the authors’ subsequent media treatment of these authors in a dubious light.

    It almost seems to me that if you take part you are fine and protected, but if not, the authors say ‘we are free to castigate you no matter what you say’.

    Look at Nir Shaviv: they unquestioningly misrated his paper and Shaviv said they misrated his paper, yet because Shaviv made some IMO rather vague remarks about wording used in the paper, the TCP authors dismiss him almost as a liar. Yet they misrated his paper! They almost physically cannot admit mistakes, and instead make assertions about the ‘non subject’ author as if it is his fault and he therefore deserves not to be trusted or understood.

    In my opinion, the irony of their use of the TCP paper as a tool to delegitimise the likes of Spencer is that if Spencer *had* volunteered to respond (which it seems certain he didn’t) and then had contradicted their category assessment, surely they would be honour bound not to contradict him in any way afterwards?

    i.e. hypothetically, if Spencer took part and self-rated he would be assumed to have answered honestly using his full knowledge of the paper whereas the TCP raters and authors can only rely on inferences from the abstracts.

    AFAIK there is no discussion in the paper’s methodology or analysis that considers the possibility of self-rating authors lying to them.

    So whenever the TCP authors indirectly use their paper as a means to reflect back on ‘non subject’ authors, implying literal outcast status, using their media platforms like they have, this seems incredibly damning and indicative of their real PR motives.

    There is so much about this paper at this basic level that seems so risible and obviously the very antithesis of science that I keep shaking my head in wonder that it actually gets taken seriously at all.

  10. I saw those -1s and some other fishy numbers. I assume they were flags of some sort for Cook’s analysis.

    “As for the data, I’d assume it was stored at Skeptical Science prior to the hack.”

    Agreed. Although it’s a bit of a surprise that the hacker didn’t find and expose that data, too.

  11. tlitb1, considering bias amongst the authors who responded is way beyond anything they did. They didn’t even address the possibility of bias in who they contacted. For all anyone can tell, they might not have tried very hard to find e-mail addresses for people whose views they disliked yet tried hard to find e-mail addresses for the others. We can’t tell because their method for collecting e-mail addresses was completely unverifiable, and they haven’t released a list of the e-mail addresses they used so their sample could be tested for representativeness.

    Heck, we already know the raters could “cheat” when doing ratings by looking up abstracts to see who wrote the corresponding papers. It was considered okay. Andy Skuce openly admitted to doing it in the forum, and everybody acted like it was perfectly normal. If it was okay with them to cheat when rating the abstracts, how can we trust they tried equally hard to collect e-mail addresses from everybody? When collecting those addresses, they could see the authors’ abstracts. That meant they could have easily been selective about which they tried to collect and how hard (even if only subconsciously).

    DGH, I’m not that surprised. That hacker was not good at what he did.

  12. Brandon responds: “Pouncer, that is true. It’s just seems like saying, ‘Thanks for not sucking too much’ to me.”

    I apologize for my ambiguity. I intended to praise you as exceptional.

    Many are cold, but few are frozen. I think you’re doing good careful work, setting a benchmark for others to aspire toward, and I think it makes you pretty cool.

    To change the topic entirely — you have commented on the Mann/Steyn libel lawsuit. Do you have any opinions or comparisons to offer on the libel action between Jesse Ventura and (the widow of) Chris Kyle?

    Ventura is, of course, a public figure both as an entertainer (for those who find pro wrestling an “entertainment”) and a politician (for those who find Minnesotans’ methods of choosing governors like Ventura and senators like Franken to be a variety of “politics”). Kyle’s book and related interviews came out in 2012, IIRC at roughly the same time Simberg and Steyn commented about Michael Mann. The Ventura/Kyle lawsuit is in front of a jury now. Steyn’s recent comments seem to indicate it may be years before his own suit is similarly situated. Are there, in your opinion, enough similarities between Ventura and Mann to base any predictions upon?

  13. Pouncer, I understood your compliment. I just don’t think I deserve much credit for any of this. I don’t think I’m setting some high standard for people to aspire to. I think I’m barely doing more than the minimum anyone should be expected to do. Maybe I just have high standards.

    As for the Ventura case, I can’t comment at the moment. I hadn’t even heard of it before you brought it up. I’ll have to read up on it. I’m not going to do that right now though. I’ve barely commented today because I’m angry about some things I found when looking into the BEST results. I don’t like to write when angry, and I can’t write about a different topic while upset about this one.

    Steve McIntyre, I should have written about that issue in this post. The differences arise from the reconciliation process (where raters could examine their disagreements and make changes). The file at Skeptical Science shows pre-reconciliation ratings. The file you linked to from my mirror shows post-reconciliation ratings. You can verify this by examining a number of the other files, such as this one:

    http://www.hi-izuru.org/mirror/files/Ratings.htm

    That file shows the pre-reconciliation and post-reconciliation ratings for each abstract on a single line, along with any tie-break ratings. That means we have ratings at each stage (for 11,944 of the 12,465 papers):

    1) Pre-reconciliation
    2) Post-reconciliation
    3) Tie-break
    4) Final

    The problem is the data isn’t presented in a coherent manner, and it uses the same labels for different sets of ratings. Personally, I wouldn’t use either of the files you linked to unless absolutely necessary. The one I linked to is much clearer. All you have to do is add in the date values from the other files. That’s pretty easy to do. Just add date columns and use the User and Article ID values to match up the data.
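    For anyone doing that matching step in a scripting language, a minimal sketch might look like the following. The file and column names are assumptions; adjust them to whatever you export the mirror files as.

    ```python
    # Sketch: add date columns to the combined ratings table by matching on the
    # Article ID and User ID values, as described above. Names are assumptions.
    import pandas as pd

    ratings = pd.read_csv("Ratings.csv")      # assumed: ArticleId, UserId1, UserId2, ...
    dates = pd.read_csv("rating_dates.csv")   # assumed: ArticleId, UserId, Date

    for i in (1, 2):
        ratings = ratings.merge(
            dates.rename(columns={"UserId": f"UserId{i}", "Date": f"Date{i}"}),
            on=["ArticleId", f"UserId{i}"],
            how="left",
        )

    print(ratings.head())
    ```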

  14. Got it.

    By the way, given that the original datafiles used userIDs rather than names, it’s amazing to consider Cook’s long withholding of data from Tol claiming that he needed to “meticulously anonymize” it, then refusing to provide even user IDs after prevaricating for weeks.

  15. @Brandon Shollenberger:

    dghblogging, I’m afraid I don’t know what point you’re trying to make. Could you explain?

    I wondered too. I think the brevity of dghblogging’s comment helped me conjure up all sorts of inferences. E.g. does he imply there was some sort of ethics coverage of the raters because the word “anonymized” is used in conjunction with the word “raters” in a sentence in the actual paper?

    Knowing the importance of context I followed dghblogging’s suggestion to read that sentence in the paper and I found this (my emphasis):

    Each abstract was categorized by two independent, anonymized raters. A team of 12 individuals completed 97.4% (23 061) of the ratings; an additional 12 contributed the remaining 2.6% (607). Initially, 27% of category ratings and 33% of endorsement ratings disagreed. Raters were then allowed to compare and justify or update their rating through the web system, while maintaining anonymity.

    It seems from this IMHO rather woolly passage that the importance of anonymity is only used to imply some sort of care was taken to ensure rater independence from each other. IMO, it implies that raters possibly influencing each other was a problem to guard against, and anonymity was just an aid for this.

    Maybe there was something else dghblogging was implying?

    Raters were then allowed to compare and justify or update their rating through the web system, while maintaining anonymity.

    This is interesting. Is this fully true? I don’t doubt there was one database “web system” that allowed raters to see paper ratings and comments anonymously, but don’t we see another “web system”, i.e. the forum, allowing just the opposite? Raters addressing each other by name and discussing individual papers and their own ratings?

  16. Mmmm, the “my emphasis” in the paragraph above didn’t turn out for some reason. I meant to bold both the sentences which used a word with ‘anonym..’ prefix.

  17. What dgh’s saying is the raters were anonymized to each other, i.e., independent.

    Yes, this is not how it happened in practice, and they discussed abstracts in the forum, but the system devised to capture volunteer ratings presented abstracts without disclosing who the second person rating a given abstract was, or what rating he or she had given.

    It must be remembered that though it appears they discussed abstracts, there were over ten thousand abstracts so the quantitative impact of volunteers biasing each other would be minimal.

    Only with this release is it possible to bring together an abstract, its raters, the first and second rating, reconciliation rating (if any), the order of ratings, and the final rating.

    This data could have been made available on day one. Instead Cook and co-authors managed to convince the journal editor-in-chief and university officials that the paper’s data need not be made available and users can do their own ratings if they wanted to analyse the data. They created a little front-end to give this appearance. In other words, the authors tried to dictate the type of replication and analysis that could be allowed to be performed on their data.

  18. What dgh’s saying is the raters were anonymized to each other, i.e., independent.

    Yes, this is not how it happened in practice, and they discussed abstracts in the forum, but the system devised to capture volunteer ratings presented abstracts without disclosing who the second person rating a given abstract was, or what rating he or she had given.

    It must be remembered that though it appears they discussed abstracts, there were over ten thousand abstracts so the quantitative impact of volunteers biasing each other would be minimal.

    Sure, I can see that dghblogging’s bare information there was that a system existed allowing raters to get IDs and work through the other “web system” anonymously, but dghblogging’s sparse mention of that, together with the link, prompted me to remind myself that the paper implies a claim that rater interdependence was guarded against during the study.

    Sure, it is true the evidence of individual Rater A biasing individual Rater B is very small in the forum. However, the fact that the majority of the bulk raters, even if they did not enter comments, had access to the forum “web system”, that this “web system” broadcast regular discussion pieces between raters, and, more importantly, carried regular recommendations from the lead author on approaches, rather undermines the value of this projection of care IMO.

    http://www.hi-izuru.org/forum/The%20Consensus%20Project/

    It’s a bit like the producers of Big Brother when advertising the spontaneity of participant interaction telling us they took scrupulous care to maintain privacy when each subject went in the Big Brother booth to discuss their thoughts, but then omit to mention all the subjects later gathered each teatime to see the broadcasted edited highlights of the discussions! 😉

    For me this façade of protecting anonymity is just a further example of projecting pseudo-rigour after the fact.

    If you look at the forum discussions, which very often included John Cook’s participation, you see the speed of rating turnover is emphasised far more than accuracy. The emphasis is on getting the numbers up (remember, no fatigue, folks 😉), and there is literally a recommendation from Cook (saying it was his own strategy) to

    adopt the “if in doubt, rate neutral” rule

    http://www.hi-izuru.org/forum/The%20Consensus%20Project/2012-03-07-A%20question%20of%20bias.html

    Then in the final paper you see this (my emphasis, hopefully):

    Two sources of rating bias can be cited: first, given that the raters themselves endorsed the scientific consensus on AGW, they may have been more likely to classify papers as sharing that endorsement. Second, scientific reticence (Hansen 2007) or ‘erring on the side of least drama’ (ESLD; Brysse et al 2012) may have exerted an opposite effect by biasing raters towards a ‘no position’ classification.

    So Cook literally says in the forum to just get on with rating shed loads of papers, and if in doubt bung it in the neutral bin – just get the numbers up – then later on, when he gets the scientist self-ratings back, lo and behold! He observes the forum ratings “underestimates the percentage of papers taking a position on AGW”, just like Naomi Oreskes warns in the ESLD paper!

    However he clearly ignores the inevitable possibility of the self rating scientists pulling their papers out of the “if in doubt” neutral pool into consensus space because of his private (up to now) recommendations in the forum to be lax.

    And this is ignoring the hubris of the implication that the humble reticence of a bunch of activist bloggers to correctly rate papers is down to ESLD, when the ESLD discussion was regarding working climate scientists and not “citizen scientists” on a mission 😉

    Hey ho, as I keep saying, I just see myself as a layman here and have a little laugh every time a great politician or fellow scientist of Cook’s calibre quotes this dreckola as a piece of science 🙂

  19. “…this façade of protecting anonymity is just further example of projecting pseudo-rigour after the fact.”

    Absolutely. Their claims of rigour from anonymity are bogus. Just that the quantitative impact of the lack of any actual rigour is likely less than can be expected.

  20. Brandon,

    My comment was made from an iPad in the wee hours. Apologies for the brevity.

    You wrote, “It’s important to remember John Cook never said he planned to anonymize the rater ID information. His talk of anonymizing data was in reference to the self-ratings.”

    Per my link and your earlier post, My Super Secret Confidential Data, John was never very clear about who he intended and was required to keep anonymous. In the paper he wrote that the raters were “anonymized.”

    DGH

  21. Latest finding:
    Cook and co rated abstracts for three months, had a month’s pause, and rated for another month. The distribution of ratings in the final month is significantly different (p<0.01) from the rest.

    A cynical observer may deduce that they looked at the results, did not like them, and went back to get more data.
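    For anyone who wants to repeat this sort of comparison on the released data, here is a sketch of one way to test it. The column names and the 30-day cutoff are assumptions for illustration, not necessarily what was actually done.

    ```python
    # Sketch: chi-square test of whether the distribution of endorsement values
    # in the final rating month differs from the earlier months.
    import pandas as pd
    from scipy.stats import chi2_contingency

    ratings = pd.read_csv("ratings_with_dates.csv")  # assumed: Endorse, Date
    ratings["Date"] = pd.to_datetime(ratings["Date"])

    is_final_month = ratings["Date"] >= ratings["Date"].max() - pd.Timedelta(days=30)
    table = pd.crosstab(is_final_month, ratings["Endorse"])

    chi2, p, dof, _ = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.4g}")
    ```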

  22. DGH, I agree. I was referring to what Steve McIntyre and Richard Tol were talking about though. They were talking about John Cook’s excuse for delaying the release of data. That was all about the self-rating data.

    Richard Tol, a more sensible observer would likely deduce tie-break ratings were done in a different stage and recognize tie-break ratings would not be expected to have the same distribution as other ratings.

  23. Richard,

    The paper explains,

    “Abstracts were randomly distributed via a web-based system to raters with only the title and abstract visible. All other information such as author names and affiliations, journal and publishing date were hidden. Each abstract was categorized by two independent, anonymized raters. A team of 12 individuals completed 97.4% (23 061) of the ratings; an additional 12 contributed the remaining 2.6% (607). Initially, 27% of category ratings and 33% of endorsement ratings disagreed. Raters were then allowed to compare and justify or update their rating through the web system, while maintaining anonymity. Following this, 11% of category ratings and 16% of endorsement ratings disagreed; these were then resolved by a third party.”

    Isn’t it fair to allow them to resolve rating discrepancies given that two people looked at each paper initially? Is the difference in the distribution reasonable with that in mind and without being cynical?

    DGH

  24. Brandon, I’m relatively fresh to this dispute, but I’ve now collated and closely examined all of the FOI correspondence and do not agree with your interpretation (if I am correctly attributing) that Cook was talking about the self-rating data in most of the relevant correspondence. The self-ratings data was placed online on July 8, but Cook continued to refuse Tol’s request for SKS rater information claiming that he had to anonymize. Cook’s excuses from July 8 on cannot be saved.

    In perusing some of the titles, I’m amazed at how little independent support they provide for the “consensus” – as opposed to merely presuming the problem – a point that others have already made quite forcefully. Lindzen, among others, has long observed that a large academic industry has been built up on the presumption of a major problem. It’s hard to see anything in the Cook data that contradicts Lindzen.

    When I first got interested in this field, I was struck by the many genuflections to “global warming” in many articles that had only a superficial connection to the topic. It struck me that it would be rational for academics seeking research funding to include such genuflections, given the more ample funding for such research, rather like municipalities obtaining funding for ordinary municipal works by genuflections to anti-terrorism.

  25. It is on record that Cook refused to part with volunteer ratings quoting reasons of anonymity, several times.

    A small question: Is the meaning of ‘OrigEndorse’ in the data set clear?

  26. @Shub Niggurath

    Absolutely. Their claims of rigour from anonymity are bogus. Just that the quantitative impact of the lack of any actual rigour is likely less than can be expected.

    Yes, just to be clear I am not one to say there is some ‘real’ result being hidden by their shenanigans here. I personally don’t doubt it changed a whit their ability to gather enough “Sports Surface Technology” papers and such like to make up that required final 97% percentage break of the “expressed a reference” proportion 😉

    In fact I am almost impressed by how Cook et al seem to have correctly assessed the current moribund nature of the field of climate social science. They have to some extent succeeded in strategically spotting a way to get to *own* the paramount gold-standard 97% study. They knew they could mobilize their group’s dedication to rack up the thousands of numbers that were the only requirement needed to extend humanity’s knowledge of the field!

    The trouble is I think the cargo-cult science raffia work dressing and filigree added on top afterwards can only begin to fray and look worse with time 🙂

  27. Steve McIntyre, I’m not aware of any case where John Cook refused to release “SKS rater information claiming that he had to anonymize.” If there were any, I’d agree he was wrong in them. Cook never intended to release the SKS rater information.

    To my knowledge, Cook has been consistent in saying that data cannot be released because it would be impossible to anonymize. That’s true. Because of information published in the Skeptical Science forum, including several visual aids created by Cook himself, it is impossible to anonymize Rater ID#s.

    A number of raters would be identifiable just by counting the total number of ratings performed by each Rater ID#. With datestamps, even more raters would be identifiable. If datestamps and (anonymized) Rater ID#s were released, people could have easily identified half of the raters despite the “anonymization.”
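    To see why, consider how little it would take to do that matching. A minimal sketch (the file and column names are assumptions for illustration):

    ```python
    # Sketch: total ratings per (anonymized) rater ID. Comparing these totals
    # against the per-person tallies visible in the forum would identify raters.
    import pandas as pd

    ratings = pd.read_csv("ratings_long.csv")  # assumed: ArticleId, UserId, Endorse
    print(ratings["UserId"].value_counts().sort_values(ascending=False))
    ```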

    That said, the fault lies with Cook himself. He never should have let people see how many ratings other people had done. He shouldn’t have allowed them to discuss those numbers in a forum. He certainly shouldn’t have created visual aids to show those numbers (and then posted those visual aids in a publicly accessible location). The fact Cook failed to protect “confidential” data at one stage does not justify him later refusing to release data that ought to have been publicly accessible.

    This is especially true since the information necessary to identify raters despite any attempt at anonymization was exposed to the people who had access to the Skeptical Science forum. From what I can tell, that’s something like 200 people. Cook did not elicit promises of confidentiality from those ~200 people, meaning the data was never kept confidential. The fact the Skeptical Science forum was hacked has nothing to do with it. Confidentiality was broken from the very beginning.

  28. Richard Tol, you say:

    Brandon: I should have noted that I refer to the initial ratings.

    But it is clear you included tie-break ratings. We know when they started doing the tie-break ratings.

  29. Steve,

    In the wake of TCP someone noted that Cook’s is a consensus without an object. Here, here.

    Cook recognized the issue that you raise and addressed it in the paper as follows:

    “This explanation is also consistent with a description of consensus as a ‘spiral trajectory’ in which ‘initially intense contestation generates rapid settlement and induces a spiral of new questions’ (Shwed and Bearman 2010); the fundamental science of AGW is no longer controversial among the publishing science community and the remaining debate in the field has moved to other topics. This is supported by the fact that more than half of the self-rated endorsement papers did not express a position on AGW in their abstracts.”

    But then it doesn’t make sense to do a survey over a 21 year period, does it? If consensus is a moving target then the definition of consensus has to be different in 1991 than it is in 2011.

    The example of explicit endorsement that Cook provides illustrates the point,

    “‘The global warming during the 20th century is caused mainly by increasing greenhouse gas concentration especially since the late 1980s’”

    The prevailing opinion on that point within the scientific community must have evolved since 1991 only 2 years after the “late 1980s”.

    Another issue with his explanation is the self-ratings. In 2011 there were 228 scientists who self endorsed the consensus while only 151 scientists put themselves in the neutral category. By those numbers the majority of scientists are currently engaged in confirming the consensus as opposed to working on the debatable topics.

    Cook might argue that all of the science is settled and that there are fewer issues to debate. Then I’d suggest that funding be slowed accordingly.

  30. @Brandon
    No. I did not look at the tie-breaks.

    Cook’s release has the ratings in chronological order. There as well the final ratings deviate from the rest. We now know that there was a pause in rating.

  31. Richard, did you examine the Rating1 and OrigRating1 entities? It is not clear what the ‘Orig’ ratings are. For example, OrigRating1 differs from Rating1 for ~8% of abstracts, a substantial number.

  32. Richard Tol, you did use the tie-break ratings. Here is a graph of the number of ratings done by day. We can see there are two distinct rating periods. However, if we look at the data file used to make it, we can see there are 26,848 entries. There is only data for 11,944 papers. Two ratings for 11,944 papers only adds up to 23,888. That means the file has ~3,000 more ratings than we’d expect if there were no tie-breaks.

    We can also confirm this by comparing the ratings in the second rating period to data files which clearly label the tie-break ratings. If we do, we can see the Article #s and Rater ID#s of the second period match up with the tie-break ratings.

    It is indisputable the second rating period you refer to was for tie-breaks.
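    The arithmetic is easy to check against the data file itself. A sketch (the file and column names are assumptions for illustration):

    ```python
    # Sketch: the count check described above. 26,848 entries for 11,944 papers
    # leaves ~3,000 rows beyond two first-round ratings per paper.
    import pandas as pd

    ratings = pd.read_csv("ratings_by_day.csv")   # assumed: one row per rating, ArticleId
    n_entries = len(ratings)
    n_papers = ratings["ArticleId"].nunique()
    print(n_entries, n_papers, n_entries - 2 * n_papers)  # extra rows ~ tie-breaks
    ```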

  33. Brandon: Cook et al. write ” In March 2012, we searched the ISI Web of Science for papers published from 1991–2011 using topic searches for ‘global warming’ or ‘global climate change’. Article type was restricted to ‘article’, excluding books, discussions, proceedings papers and other document types. The search was updated in May 2012 with papers added to the Web of Science up to that date.”

  34. Richard,

    The Sekret Forum provides a closer look into the rating process. Here are a couple of relevant links:

    http://www.hi-izuru.org/forum/The%20Consensus%20Project/2012-03-12-Plan%20for%20updating%20the%20database%20of%20papers.html

    http://www.hi-izuru.org/forum/The%20Consensus%20Project/2012-03-18-Looking%20forward%20to%20the%20Quality%20Control%20stage.html

    By late March 2012 the Kidz were nearly through the first round of ratings. JC indicates that he will append new papers (about 150 at that point if my memory serves) from the ISI database as late in the process as possible. Yes, it seems there were some first time ratings in the latter portion of the process but not many.

  35. Richard Tol, that there are a small number of new ratings in the second set of ratings does not mean all the ratings are new. As I explained, if you look at the data files, it is indisputable the second set contains tie-break ratings.

  36. I thought about the sequence, with the available information. Each abstract got ratings from two people. When this was complete, raters were allowed to go back to those abstracts where there was a mismatch and give a second rating. Where there continued to be a mismatch, a so-called tiebreak rating was given by a third rater. That’s how you have orig endorse, endorse, tiebreak, final.

    The second peak may be a mix of raters going back and tiebreak ones. Let me confirm.

  37. Shub Niggurath, when raters went back and changed their ratings, it was not stored as a new entry. It was stored as a new column. The second set of ratings was almost entirely tie-break ratings.

  38. Brandon, you remember the data file you and dgh brought together? It has two sets – ratings 1st and 2nd, corresponding to the first two ratings given to every abstract. The disagreement rate between the two is ~33% – i.e., just like Cook mentions in his paper. The data above has the two first ratings given – OrigEndorse1 and OrigEndorse2. The disagreement rate between the two is again ~33%.

    But rating 1 and OrigEndorse1 are not the same. Their ratings are different for 800+ abstracts. Why? Any help would be great.

  39. Brandon, you said ” I’m not aware of any case where John Cook refused to release “SKS rater information claiming that he had to anonymize.” Have you looked at the FOI correspondence? This shows Tol being told the opposite.

    For example, on July 7, Hoegh-Guldberg, to whom Tol had appealed for assistance, told Tol (who had asked for SKS ratings):

    “I have spoken to John who tells me he is preparing requested data for release. He is currently in the middle of the international conference season (seven meetings over two months) and hence is not at his desk and able to do this immediately. The problem is that the data is requested requires anonymousing [sic] in order to preserve the privacy of the survey participants and the volunteer raters.”

    Tol then had considerable further correspondence with Hoegh-Guldberg, DVC Max Lu and the journal, in which he observed that anonymizing rater IDs would be a trivial undertaking and even offered to do so himself under a confidentiality agreement if Cook didn’t have the time. There are multiple emails from University officials reassuring Tol that Cook would deliver up the data when he had time on return from Europe.

    Most of these assurances came from the University rather than Cook himself, but the University’s assurances were based on briefings from Cook and Cook was copied on many of the emails without objecting.

    I agree that Cook himself might never have had any intention of ever delivering the ratings data, but, based on my review of the documents – and I’ve done so carefully – Tol was told the opposite.

    When Cook didn’t provide the ratings data, the University changed its story, now claiming that Cook’s refusal to deliver SKS ratings data (author self-ratings were not disputed at this time) was due to confidentiality restrictions arising from the ethics approval for the SKS ratings. But as we’ve learned, there was no ethics application for the SKS ratings – this was another fabrication.

    I really don’t see any way that the University’s statements can be interpreted as technically correct. In my opinion, they were untrue, deceptive and/or dishonest.

  40. Steve McIntyre, I have read the FOI correspondence. Several times. As far as I can tell, it does nothing to contradict what I said. I cannot find a single instance of John Cook claiming he needed time in order to anonymize the SKS rater data. The one comment you’ve offered does not do so. This:

    The problem is that the data is requested requires anonymousing [sic] in order to preserve the privacy of the survey participants and the volunteer raters.

    Is obviously wrong. However, there is no indication this was “deceptive and/or dishonest.” It would be easy for someone to slip up and refer to keeping Rater IDs anonymous while discussing the need to anonymize author IDs. It’s not reasonable to take this, which could easily have been a careless mistake by a third party, as indicating John Cook misled Richard Tol.

    Even if we move away from Cook, I don’t see how it’s reasonable to criticize the university as “deceptive and/or dishonest.” You only quoted one comment, so it’s difficult to follow the logic of your argument. From what I see, you’re simply wrong in your depiction of what Tol was told. For example, on July 26th, Max Lu told Tol:

    the other data would not be required to produce the methods. The author intends to release the data as stipulated by Environmental Research Letters. However, as he has advised you earlier, there will be a short delay as properly anonymizing the data takes time and your request came at an intensely busy conference period for him.

    There is nothing wrong with that statement. The data ERL said should be published was clearly outlined for Tol (it’s discussed previous to what I quoted). That data stipulated by ERL did not cover Rater IDs. There was nothing in this statement which would indicate to Tol Rater IDs would be released. Any reasonable reading of the e-mail makes it clear Rater IDs would not be released.

    Another e-mail from later in that day said:

    As I have been advised, the conditions of the ehtics approval for the study meant that some data cannot be simply released as you suggested, without breaching the law in Australia.

    It does nothing to say Rater IDs were covered by an ethics approval. It does nothing to say Rater IDs would ever be released. It is truthful and honest. I can pull out everything Tol was told, but the same pattern holds.

    Rather than dwell on that, we can instead look at what John Cook said about Rater IDs. On July 30th, he told people at his university:

    I was planning on putting the individual ratings online – but not with rater IDs. I asked ERL what we should give Tol and they said I didn’t need to give them that. So he won’t be able to connect any single rating with a single person. if he didn’t do it, plenty of other deniers on the web would.

    Here, Cook makes it perfectly clear he has no intention of releasing Rater IDs. He made this point clear in direct communication with Tol on August 16th, in an e-mail to Tol which said:

    4. This data would potentially reveal the identity of individual raters (given that private correspondence of the raters had been stolen and published online)

    This is a clear denial of Tol’s request for Rater IDs, pointing out the impossibility of anonymizing them. The worst you can say is Cook implied there was a need to keep Rater IDs confidential when there was no ethical obligation. I’m not sure that’s true, but regardless, it’s a far cry from what you’re claiming.

    Put simply, I’ve read through this material a number of times, and I can’t find a single thing to support what you say. The argument you’ve posted rests upon highlighting a single misstatement and a lengthy discussion in the form of, “Have you looked at X?” There is nothing compelling about that.

  41. On a side note, the fact the anonymized self-ratings were published on July 8th does not resolve as much as people might think. On July 21st, Richard Tol specifically requested “author ratings (N=2142).” Presumably, he was unaware Cook had already published 2,136 of those ratings. If so, it is hardly unreasonable that University of Queensland representatives could have also been unaware. Cook would have obviously been aware, but as far as I’ve seen, he never claimed to need time for it after that point. I’d assume the reason for a delay between those 2,136 ratings being published and the other data being published was just Cook taking time getting around to addressing Tol’s request. That may have been rude and unhelpful, but not deceptive or dishonest.

    Shub Niggurath, one file’s “original” ratings are the pre-reconciliation ratings. The other file’s “original” ratings are the post-reconciliation ratings. If I understand your confusion correctly, that should resolve the differences you’re seeing. If not, I can post a more full mapping of the data later today. Right now I have to see about running some errands before ten when a housemate leaves for the weekend.

  42. Brandon, firstly, thanks for looking into the data. It appears you understand the unreleased data better so it would be great to see it lined up.

    Second, your point is that “no dishonesty” is involved in the University’s response. This is barely sustainable. This is particularly not supportable using Cook’s words. Cook has provided *all* possible rationalizations for not releasing data to Tol and the outside world with the effect that some of these reasons contradict one another. Keeping things straight is not Mr. Cook’s forte.

  43. Shub Niggurath, I may combine those few data files into a single file and upload that. If nothing else, at least then it could have informative column names.

    As for whether or not the University of Queensland was deceptive in its responses, I don’t know of any case where John Cook or anyone else there intentionally said Rater IDs would ever be released. As far as I can tell, they’ve been consistent in saying that information would not be released because the journal (ERL) said it didn’t need to be. In that regard, I cannot see any dishonesty.

    That doesn’t mean there was no dishonesty. There certainly could have been dishonesty on other issues. I’ve flat-out called John Cook a liar for things he’s said about his data. I think the University of Queensland was dishonest in the threatening letter they sent me. I think there was other dishonesty as well.

    I just don’t see anything to indicate Richard Tol was misled into believing Rater IDs would ever be released. It seems to me people were pretty consistent in saying it wouldn’t be.

  44. Brandon,
    I don’t have time right now to parse everything in the FOI dossier. I’ve looked closely at your points and, on some issues, I think that there are factual points (not in order below) that ought to be reconciled before attempting interpretation.

    You said:

    On July 30th, he told people at his university:

    I was planning on putting the individual ratings online – but not with rater IDs. I asked ERL what we should give Tol and they said I didn’t need to give them that. So he won’t be able to connect any single rating with a single person. if he didn’t do it, plenty of other deniers on the web would.

    Here, Cook makes it perfectly clear he has no intention of releasing Rater IDs. He made this point clear in direct communciation with Tol on August 16th, in an e-mail to Tol which said:

    4. This data would potentially reveal the identity of individual raters (given that private correspondence of the raters had been stolen and published online)

    This is a clear denial of Tol’s request for Rater IDs, pointing out the impossibility of anonymizing them.

    That Cook denied Tol’s request for rater IDs on August 16 is not at issue. His August 16 e-mail made it very clear that he had no intention of releasing rater IDs. After that date, the issue is whether his reason (a supposed requirement under ethics approval) held up. Let’s leave that discussion for another occasion and focus for now on the record up to August 16.

    You say that on July 30 Cook told people “at his university” of this intent not to provide rater IDs. As I read the dossier, we don’t KNOW that the recipients of his July 30 email were at his university and indeed there are reasons to think otherwise. The University expurgated the identity of the recipients of the July 30 email. Since they didn’t expurgate the identities of University officials Hoegh-Guldberg, Lu, Lawson etc, I take this as evidence that the recipients of the July 30 email were not at the University (presumably they were SKS associates).

    Second, it seems evident to me that, in their July and early August correspondence, Hoegh-Guldberg and Lu both assumed that Cook was going to provide SKS rater IDs and that he needed time to anonymise names into rater IDs. In my opinion, this is evidenced in multiple emails to Tol. I didn’t cite all the comments, only for time and energy reasons. You agreed that Hoegh-Guldberg’s email of July 7 was wrong on this point, but there are others. I believe that the erroneous statements/impressions in the emails from Hoegh-Guldberg and Lu arose from their misunderstanding of Cook’s intentions. However, Cook was copied on all or most of this correspondence and knew that his superiors had misspoken. In a business situation, if I or someone else wrote an email to a client or customer with an inadvertent error and copied the email to an employee who knew that it contained an error, I would expect the employee to immediately notify me of the mistake so that I could correct it. Most business people that I know would be apoplectic if the employee simply lay in the weeds.

    So yes, I agree that Hoegh-Guldberg and Lu’s false statements to Tol were unintentional on their part, but they were nonetheless deceptive. In using the term “deceptive”, I do not imply that Hoegh-Guldberg and Lu were being intentionally deceptive. On the contrary; but the statements were still deceptive. Cook ought to have notified HG and Lu that they had misspoken, but he seems not to have done so.

    There’s also an important issue in respect to the ERL “agreement”. All correspondence relating to this agreement has been expurgated, so we don’t KNOW what it says. Cook’s language in his August 16 letter is careful. While he talks about ERL directions in the case of the author self-rating data, he uses a passive voice in connection with the SKS ratings: reading quickly, one assumes that this was also directed by ERL, but he doesn’t actually say so.

    In addition, prior to August 16, there is no direct statement from Cook that he plans to withhold rater IDs. Previously Cook told his supervisors that he plans to release data in accordance with his ERL agreement, but no one besides him knew what that agreement said. If Cook wanted people to understand his intent, he would have clearly spelled out in the earlier correspondence that the ERL agreement entitled him (so Cook claims) to withhold rater ID. But he didn’t do so until August 16. In the context, it seems to me that HG, Lu and Tol all presumed that the ERL agreement included rater IDs, which took time to anonymize.

    You said in connection with Lu’s email:

    The data ERL said should be published was clearly outlined for Tol (it’s discussed previous to what I quoted). That data stipulated by ERL did not cover Rater IDs. There was nothing in this statement which would indicate to Tol Rater IDs would be released. Any reasonable reading of the e-mail makes it clear Rater IDs would not be released.

    I disagree. Lu’s letter did not “clearly outline for Tol” the data that ERL should be published. You say that a “reasonable reading of the e-mail makes it clear Rater IDs would not be released”. How so? The letter doesn’t say that. Tol had been told that Cook needed to anonymize rater information and was doing so.

    You quoted from another e-mail from later in that day said:

    As I have been advised, the conditions of the ehtics approval for the study meant that some data cannot be simply released as you suggested, without breaching the law in Australia.

    But that quote continued “Some work and time to anonymise before release are necessary”.

    Parsing statements about ethics approval is an interesting but separate task. Because so much correspondence has been expurgated, it takes a while to understand these documents.

    BTW, in all these years, Cook is one of the very few people that I have called a baldfaced liar (in connection with Lewandowsky and SKS). Because of his previous record of lying, I do not rely on his word.

  45. Steve McIntyre, I think I’ll keep this brief since your new post is bound to attract more discussion. To answer your question:

    I disagree. Lu’s letter did not “clearly outline for Tol” the data that ERL should be published. You say that a “reasonable reading of the e-mail makes it clear Rater IDs would not be released”. How so? The letter doesn’t say that. Tol had been told that Cook needed to anonymize rater information and was doing so.

    This is a simple issue to resolve as reading the e-mail shows it does exactly what I said. We can see this by looking prior to the portion I quoted, as I stated one should to verify it:

    The author has consulted Environmental Research Letters (ERL) and was advised that data on the first and second ratings, and on the randomly selected 1000 “no position” abstracts, would be sufficient to reproduce the methods, but the other data would not be required to reproduce the methods. The author intends to release the data as stipulated by Environmental Research Letters.

    The data outlined here is the exact same data John Cook says ERL told him should be published. The e-mail clearly outlines what data ERL said should be published, and it does so in the exact spot I pointed to.

    That said, it is fair to criticize Max Lu for being unclear here. Lu fails to note ERL did not rule one way or the other in regard to the self-rating data. Its decision was self-rating data should not be released unless it could be anonymized. Lu failed to explain that, and that made his line:

    However, as he has advised you earlier, there will be a short delay as properly anonymising the data takes time

    Confusing. That could have been fixed by simply adding, “anonymized self-rating data” to his list of what would be released. I assume that was an inadvertent lack of clarity. I’ve seen plenty of things as bad, and worse, in e-mails explaining situations. As for your remark:

    But that quote continued “Some work and time to anonymise before release are necessary”.

    Of course it did! Richard Tol had requested several things, including the (full) self-ratings data set. He was told part of his request would be met. He expressed dissatisfaction. He was then told some of what he had requested could not “be simply released as [he] suggested.” That was completely correct.

    I don’t know why you’re adding part of a quote which correctly pointed out Tol was requesting data which couldn’t be released without being anonymized. The part you added was correct, and it does nothing to contradict what I said.

    I get that looking up and typing out quotes can be a chore, but it is necessary if you want people to follow the arguments you’re making. I don’t see how the e-mails support what you say, and you seem to be describing at least some of them incorrectly.

  46. Any updates on the data?

    I can post pair-wise comparisons between columns in the combined data set, i.e., the data from Cook’s earlier release and the presently available version. Because the two sets should contain the same ratings, each column should find at least two 100% matches: one with itself and one with the corresponding column from the other release. Instead there is only one 100% match per column, i.e., each column matches only itself. It is not clear to me what is wrong.

  47. Here are the matches between the data set Cook released earlier and the data which has rater IDs. OrigEndorse1, OrigEndorse2, Endorse1 and Endorse2 come from the latter set; ‘rat1’, ‘rat2’, ‘rat1.x’ and ‘rat2.x’ are from the previously released data (they are not rats, they are ratings).

    The first set released by Cook is here: http://www.skepticalscience.com/docs/tcp_allratings.txt
    The second set mirrored here: http://www.hi-izuru.org/mirror/files/Ratings.htm

    As can be seen, each column has only one 100% match, i.e., with itself. There should be two.

    I would like to know what I’m getting wrong.

    Percent agreement between each pair of columns (100 = identical):

                    OrigEndorse1  Endorse1   rat1   rat2  OrigEndorse2  Endorse2  rat1.x  rat2.x
    OrigEndorse1        100          92.5    93.3   89.6      67.5         77.6     74.2    80.5
    Endorse1             92.5       100      85.8   97.1      70.9         84.4     77.5    87.3
    rat1                 93.3        85.8   100     88.5      74.2         76.5     67.5    73.9
    rat2                 89.6        97.1    88.5  100        73.5         87.3     74.6    84.4
    OrigEndorse2         67.5        70.9    74.2   73.5     100           85.3     93.2    82.6
    Endorse2             77.6        84.4    76.5   87.3      85.3        100       86.4    97.1
    rat1.x               74.2        77.5    67.5   74.6      93.2         86.4    100      89.3
    rat2.x               80.5        87.3    73.9   84.4      82.6         97.1     89.3   100
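    In case anyone wants to reproduce a table like this, here is a rough sketch in R. The data frame name ‘ratings’ is just a placeholder for however you have merged the eight columns into one table; the column names are the ones shown above.

    #'ratings' is a placeholder data frame holding the eight columns above, one row per abstract
    cols <- c("OrigEndorse1", "Endorse1", "rat1", "rat2",
              "OrigEndorse2", "Endorse2", "rat1.x", "rat2.x")
    #percentage of abstracts on which two columns give the same value
    match_pct <- function(a, b) round(100 * mean(a == b, na.rm = TRUE), 1)
    #build the full pairwise table
    pairwise <- outer(cols, cols,
                      Vectorize(function(i, j) match_pct(ratings[[i]], ratings[[j]])))
    dimnames(pairwise) <- list(cols, cols)
    pairwise

    The diagonal is trivially 100; the point is that no off-diagonal entry reaches 100, when each column should also match its counterpart in the other release at 100.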

  48. Shub Niggurath, this week is keeping me really busy, at least for a couple more days. I’m falling behind on a number of things, and I’ve barely been commenting. Thursday or Friday I can start filling requests again.

    In the meantime, you can see the file posted at Skeptical Science included in the mirror here. We can see a similar version of it here. The number of entries is the same, as is the order of ratings. However, the values in the columns don’t match up. At least, they don’t if you compare Orig_Endorse to Endorsement_Orig. They do match up if you compare Orig_Endorse to Endorsement_Final.

    That’s because the first file, identical to the one posted at Skeptical Science, shows pre-reconciliation ratings in its Endorsement_Orig column and post-reconciliation ratings in its Endorsement_Final column. The second file shows post-reconciliation ratings in its Orig_Endorse column and final ratings (after tie-breaks) in its Final_Endorse column. To make things more confusing, tiebreak ratings are included in both files; they have no pre-reconciliation ratings (represented as 0), and their ratings are given as post-reconciliation ratings.

    The file with the ratings listed by paper displays all of these values. It lists pre-reconciliation ratings as OrigEndorse1, OrigEndorse2 and OrigEndorse3. It lists post-reconciliation ratings as Endorse1, Endorse2, Endorse3 and TiebreakEndorse. It lists final results as FinalEndorse.

    If you do compare the columns as described above, you should get a 100% similarity.
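    As a rough sketch of that check in R (the file names below are placeholders for local CSV copies of the two files; the column names are the ones described above):

    #placeholder file names for local copies of the two files
    first  <- read.csv("all_ratings.csv")     #the file matching the Skeptical Science release
    second <- read.csv("ratings_mirror.csv")  #the similar version from the mirror
    #the files list the same entries in the same order, so compare the columns row by row
    mean(second$Orig_Endorse == first$Endorsement_Orig)   #post- vs pre-reconciliation: below 1
    mean(second$Orig_Endorse == first$Endorsement_Final)  #both post-reconciliation: should be 1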

  49. Hey Brandon, thanks for clarifying. I was posting material mainly to serve as a thread ‘bump’.

    Your description makes it clear why the full data should have been released right at the beginning. I’ll keep looking.

  50. No problem. I’m usually better at responding promptly, but this bathroom remodel is taking up so much time. The two people running the show did a terrible job of spreading the work out over the time we have. It’s left me 36 hours to fit in 10 hours’ worth of cleaning.

    By the way, “bumping” a topic is not a bad idea. I have a great work ethic, but I get distracted easily. If I don’t think about something regularly, I tend to forget about doing it.

  51. Things are a bit clearer now (only a bit though).

    Here’s the problem: Going by your description, the first file (http://www.hi-izuru.org/mirror/files/All%20Ratings.htm) has pre-reconciliation ratings. This file (http://www.hi-izuru.org/mirror/files/Ratings.htm) has OrigEndorse1, OrigEndorse2 and OrigEndorse3 as pre-reconciliation ratings.

    These are the two that don’t match. When you get the time, you’ll see what I’m saying. Or, hopefully, you’ll be able to figure out the problem.

    Thanks again.

  52. Here is the code I use. Except for one line (commented #1), you can copy-paste it from below into R and it should run.


    ##get ratings officially released by Cook
    rat<-read.table("http://www.skepticalscience.com/docs/tcp_allratings.txt", header=TRUE, sep=" ")
    #rename columns. 'artID' is abstract ID.
    names(rat)<-c("artID", "rat1", "cat1", "ratf", "cat2")
    #add an index
    no<-1:26848; rat<-cbind(no, rat)
    #'!duplicated' finds first instance of every abstract, i.e., original ratings
    firstratings<-rat[!duplicated(rat$artID),]; fdata<-firstratings
    #save the mirrored rater-ID data as raterid.csv in the working folder, then:
    mirrordata<-read.csv("raterid.csv", header=TRUE) #[1]
    #get abstract ID and rating columns. 'Id' is abstract ID.
    mdata<-mirrordata[,c(1,4,7,9,12,14,17,19,22)]

    fmdata<-merge(fdata, mdata, by.x="artID", by.y="Id")

    The above takes the ratings Cook officially made available through his website and the mirrored ratings (which were never officially released) and combines them into one data frame (‘fmdata’).

    Compare the original rating ‘rat1’ against any other rating in ‘fmdata’. There is no match.

    ‘rat1’ excludes the second, third, … nth instances of each abstract in the long format in which Cook released the data. It includes only the first instance of each abstract’s rating, so the question of tiebreak ratings should not arise.

    This is what you get:


    attach(fmdata)
    table(rat1)

    0 1 2 3 4 5 6 7
    226 141 1068 3079 7331 59 29 11

    table(OrigEndorse1)

    0 1 2 3 4 5 6 7
    214 129 1088 2921 7485 68 28 11

    This is a comparison between what was made publicly available by the authors and what appears to be a data master table. There should be no discrepancy. So how does one explain what is seen?
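    The marginal counts above don’t show which rows differ, so a direct row-by-row comparison within ‘fmdata’ is also worth doing. A minimal sketch using the data frame built above:

    #fraction of abstracts where the officially released first rating equals the mirrored one
    #(1.0 would mean the two columns are identical)
    with(fmdata, mean(rat1 == OrigEndorse1))
    #cross-tabulate the two columns to see where the disagreements fall
    with(fmdata, table(rat1, OrigEndorse1))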

  53. The code above got messed up. The correct form is below:


    ##get ratings officially released by Cook
    rat<-read.table("http://www.skepticalscience.com/docs/tcp_allratings.txt", header=TRUE, sep=" ")
    #rename columns. 'artID' is abstract ID.
    names(rat)<-c("artID", "rat1", "cat1", "ratf", "cat2")
    #add an index
    no<-1:26848; rat<-cbind(no, rat)
    # find first instance of every abstract, i.e., original rating
    firstratings<-rat[!duplicated(rat$artID),]; fdata<-firstratings
    #save the mirrored rater-ID data as raterid.csv in the working folder, then:
    mirrordata<-read.csv("raterid.csv", header=TRUE) #1
    #get abstract ID and rating columns. 'Id' is abstract ID.
    mdata<-mirrordata[,c(1,4,7,9,12,14,17,19,22)]

    #join the two
    fmdata<-merge(fdata, mdata, by.x="artID", by.y="Id")

  54. WordPress messes with the code. Sorry for triple post. Please delete as appropriate:

    ##get ratings officially released by Cook
    rat<-read.table("http://www.skepticalscience.com/docs/tcp_allratings.txt", header=TRUE, sep=" ")
    #rename columns. 'artID' is abstract ID.
    names(rat)<-c("artID", "rat1", "cat1", "ratf", "cat2")
    #add an index
    no<-1:26848; rat<-cbind(no, rat)
    # find first instance of every abstract, i.e., original rating
    firstratings<-rat[!duplicated(rat$artID),]; fdata<-firstratings
    #save the mirrored rater-ID data as raterid.csv in the working folder, then:
    mirrordata<-read.csv("raterid.csv", header=TRUE) #1
    #get abstract ID and rating columns. 'Id' is abstract ID.
    mdata<-mirrordata[,c(1,4,7,9,12,14,17,19,22)]

    #join the two
    fmdata<-merge(fdata, mdata, by.x="artID", by.y="Id")

  55. Because of the difficulty WordPress causes with uploading code, Shub Niggurath uploaded his to a different site. You can find a link to it here:

    http://pastebin.com/8sPv1EmA

    We’ve talked a little already elsewhere, but for people seeing this here, it seems he is right about the data files not matching up like they should. I missed the differences before because the final results are incredibly similar. We’ll be examining this issue more.

  56. For us non R junkies – any explanation?

    e.g.

    what does

    table(OrigEndorse1)

    0 1 2 3 4 5 6 7
    214 129 1088 2921 7485 68 28 11

    mean?

  57. Oh!?

    Pick a pepper, or pick a paper…

    My pick is 2247

    ———————————————————————————————————-
    ID 2247, Year: 1999, Title “Arctic Soil Respiration: Effects Of Climate And Vegetation Depend On Season”, Journal “Ecosystems”,
    Final Cat 2, Final Endors 3
    Ratings:
    (1): raterID 1439, Category 2, Endorsement 3, Matched Indiv rating. Comment – .
    (2): raterID 3364, Category 2, Endorsement 3, Matched Indiv rating. Comment – .
    (3): raterID -1, Category -1, Endorsement -1, -Not matched indiv rating. Comment – .
    (Tie): raterID -1, Category -1, Endorsement -1, -Not matched indiv rating. Comment – .

    Unreleased:
    [0] UR: paperID 2247, Date 1999, FinalCat 2, FinalEndorse 3.
    Ratings:
    (1): raterID 1439, category 2, endorsement 3, origend 2, origcat 3
    (2): raterID 3364, category 2, endorsement 3, origend 2, origcat 4
    (3): raterID 0, category 0, endorsement 0, origend 0, origcat 0
    (Tie): raterID 0, category 0, endorsement 0, origend 0, origcat 0

    Released:
    [0] Order: 2729, paperID 2247, Orig Endor 3, Orig Cat 2, Fin Cat 3, Fin Endor 2
    [1] Order:16553, paperID 2247, Orig Endor 4, Orig Cat 2, Fin Cat 3, Fin Endor 2
    ———————————————————————————————————-

  58. And still, at the end of the day, any expectation that Cook et al’s data will reveal anything that directly undermines itself has to be… clearly delusional.

    Cook and Dana clearly get excited when seriously intelligent people like Brandon and Shub pay attention…

    Meanwhile…

    The proper debate has moved on, here:

    http://www.joseduarte.com/blog/ignore-climate-consensus-studies-based-on-random-people-rating-journal-article-abstracts?#comment458302096668133254

  59. tlitb1, I’ve told you before, don’t post a string of comments with no responses in between to express thoughts which could fit in far fewer comments. There was no reason for you to post six comments.

  60. Sorry, I’ll try to be more organised. But you’ve reset the count now 😉

    For a moment I thought those numbers were positions in the original list (e.g. 214), but it seems not.

    ———————————————————————————————————-
    ID 5745, Year: 2002, Title “Methane-limited Methanotrophy In Tidal Freshwater Swamps”, Journal “Global Biogeochemical Cycles”,
    Final Cat 2, Final Endors 4
    Ratings:
    (1): raterID 4194, Category 4, Endorsement 4, Matched Indiv rating. Comment – .
    (2): raterID 873, Category 2, Endorsement 4, Matched Indiv rating. Comment – .
    (3): raterID -1, Category -1, Endorsement -1, -Not matched indiv rating. Comment – .
    (Tie): raterID 2103, Category 2, Endorsement 4, Matched Indiv rating. Comment – .

    Unreleased:
    [0] UR: paperID 5745, Date 2002, FinalCat 2, FinalEndorse 4.
    Ratings:
    (1): raterID 4194, category 4, endorsement 4, origend 4, origcat 4
    (2): raterID 873, category 2, endorsement 4, origend 2, origcat 4
    (3): raterID 0, category 0, endorsement 0, origend 0, origcat 0
    (Tie): raterID 2103, category 2, endorsement 4, origend 0, origcat 0

    Released:
    [0] Order: 214, paperID 5745, Orig Endor 4, Orig Cat 2, Fin Cat 4, Fin Endor 2
    [1] Order:16074, paperID 5745, Orig Endor 4, Orig Cat 4, Fin Cat 4, Fin Endor 4
    [2] Order:25899, paperID 5745, Orig Endor 0, Orig Cat 0, Fin Cat 4, Fin Endor 2
    ———————————————————————————————————-

    though my Python mash-up could be wrong. There seem to be a lot of mismatches. There are 111 ‘Individual ratings’ that appear orphaned, not associated with any paper. I need to work on understanding R some more.

  61. I found some R tutorials, so I now know that ‘table’ gives the counts of each endorsement level (0-7), and I guess you would expect those counts to match, as Shub Niggurath says. After fixing some issues with my Python mash-up, I got it to list papers whose rating dates are not in ascending order, as I think one would expect them to be. It came up with 1342 papers showing ascending-date-order anomalies. A few have only a day’s difference, which you might accept, but several, like the one below, have quite wide differences.

    [282] :
    ———————————————————————————————————-
    ID 3077, Year: 2000, Title “Carbon Sinks In The Kyoto Protocol – Potential Relevance For Us Forests”, Journal “Journal Of Forestry”,
    Final Cat 3, Final Endors 3
    Ratings:
    (1): raterID 4194, Category 3, Endorsement 3, Matched Indiv rating. Date: 11 Mar Comment – .
    (2): raterID 873, Category 3, Endorsement 4, Matched Indiv rating. Date: 21 Feb Comment – .
    (3): raterID -1, Category -1, Endorsement -1, -Not matched indiv rating. Date: — — Comment – .
    (Tie): raterID 2103, Category 3, Endorsement 3, Matched Indiv rating. Date: 27 May Comment – .

    Unreleased:
    [0] UR: paperID 3077, Date 2000, FinalCat 3, FinalEndorse 3.
    Ratings:
    (1): raterID 4194, category 3, endorsement 3, origcat 3, origend 3
    (2): raterID 873, category 3, endorsement 4, origcat 3, origend 4
    (3): raterID 0, category 0, endorsement 0, origcat 0, origend 0
    (Tie): raterID 2103, category 3, endorsement 3, origcat 0, origend 0

    Released:
    [0] Order: 895, paperID 3077, Orig Endor 4, Orig Cat 3, Fin Endor 4, Fin Cat 3
    [1] Order:14095, paperID 3077, Orig Endor 3, Orig Cat 3, Fin Endor 3, Fin Cat 3
    [2] Order:25116, paperID 3077, Orig Endor 0, Orig Cat 0, Fin Endor 3, Fin Cat 3
    ———————————————————————————————————-

    So whatever their concept of “original” endorsement is based on, it doesn’t seem to mean the date a rating was first given to a paper.
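    For anyone who wants to repeat the ascending-date check in R rather than Python, here is a minimal sketch. The data frame ‘ratings’ and the column names paperID, slot and date are placeholders for however the per-rating dates are actually arranged:

    #flag papers whose rating dates do not ascend with rating order
    #('ratings' is a hypothetical long-format data frame, one row per individual rating)
    ratings$date <- as.Date(ratings$date)                 #assumes dates in a standard format
    out_of_order <- sapply(split(ratings, ratings$paperID), function(p) {
      d <- p$date[order(p$slot)]                           #dates in first/second/third rating order
      any(diff(d) < 0, na.rm = TRUE)                       #TRUE if a later slot has an earlier date
    })
    sum(out_of_order, na.rm = TRUE)                        #papers with date-order anomalies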

  62. Just so you know, I’m not ignoring what you’ve said. I’ve just been sick for the last few days, and I haven’t had much energy to devote to these things. Hopefully I’ll be able to look at them tomorrow.

  63. No problem, I don’t expect a tutorial on R 🙂 I understand R is a familiar package to you stats guys; it’s just that I suspect discussions relying solely on its terminology will seem rather gnomic to people not experienced with it 😉

    Just to clarify what I said above, it seems clear the ordinal in OrigEndorse1 does not necessarily mean that rating is dated before OrigEndorse2, so the process in Shub’s algorithm will certainly find these issues.
