TCP Reanalysis, Open Testing

Hey guys. You may remember I discussed the idea of creating a web system to attempt a re-analysis of the Skeptical Science Consensus Project. People’s feedback made me decide it’d be a project worth attempting, so I did. I’m happy to say I’ve now made enough progress to allow open testing. You can try it out here.

The system is still very preliminary, and I’m not worried about collecting data right now. I just want to make sure everything works. Once I know that, I can see about creating a system to allow everyone to view the results. I can also see about creating a page with guidelines for people to use. Also, I might be able to make it not look so bland.

Anyway, feel free to try this out and tell me what you think. Feedback is always appreciated. And, of course, you should feel free to support this effort.

Edit: It looks like the performance issues have been fixed. Everything should be working fine now.



  1. I find the rating system ambiguous. When you ask whether a paper ‘explicitly endorses AGW’, do you mean that it provides definite arguments in support of the enhanced greenhouse effect? In that case, should papers that explicitly state a belief in AGW without arguing for it be taken to provide implicit support? Similarly for the converse. I am going to proceed on the basis of the definitions above.

  2. Meklar, the intention is to mirror the questions asked by Cook et al, so explicitly endorsing AGW just means explicitly accepting the greenhouse effect is real (or anything similar). It’s incredibly broad.

    But it’s alright if people don’t use the same definition for now. Posting actual guidelines, and maybe even examples, is on my list of things to do. The current setup is just so people can test this out, get an idea of what it’d be like, and tell me what they think. In other words, I’ll be addressing your concern later on.

    By the way, you’re welcome to leave comments when rating an abstract if you want to express thoughts beyond what the ratings allow. I intend to make comments visible later on, so people can “compare notes.” That means even if you don’t like the ratings, you can still participate.

    (Though as a technical limitation, your comment will only be stored if you also select a value for each box. I’ll see about changing that.)

  3. Just rated five abstracts and the process went smoothly. I like the comment box as a way to justify the rating. Whether or not it provides useful information for this study (such as evaluating rater competence or seriousness), it makes me think more carefully about the drop down box choices I make.

    I like the fact that rating is very simple and you don’t try to squeeze too much into it or make the choices overly precise. Complication would harm your primary objective.

    At the entry point there should be brief instructions of what the rater will see and is expected to do as both a courtesy and a way to control the process.

    Although I didn’t see it, I assume you will provide an informed consent notice to raters and keep identities confidential when this goes live. Your protocol should follow the general practice of research with human subjects not only because it’s ethical, but because it holds you to standards of good practice that have been lacking with the earlier shoddy work.

  4. Gary, thanks for participating! I actually noticed your entries in the database before you posted your comment.

    I’m glad to hear my choice with the rating options was good for you. One of the main reasons for wanting to do this is my dislike of the options in Cook et al. I also think the comments box will be helpful if people leave comments like you did. If they do, we’ll be able to get an idea as to why some disagreements happen.

    On the idea of instructions, I definitely agree. Writing up some simple guidelines is one of the next things I want to do.

    On the issue of privacy, I didn’t mention it on the page, but I have been intending to keep everything confidential. One of the things I’ve been struggling with right now is figuring out how to keep people’s identities private while allowing people to see each other’s ratings. I could use people’s ID numbers, but those are auto-incremented, so you could tell when a person joined.

    I think the simplest solution would be to warn people during registration that their username will be publicly visible. If someone wants to stay private, they could just pick an anonymous username. I like that because it lets people control their confidentiality, but I’m not sure if I’ll go that route yet.
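    Another option, sketched below with a hypothetical helper (nothing like this exists in the site's code yet): assign each account a random public handle at registration and display that next to ratings, so neither the username nor the auto-incremented row ID ever has to be exposed.

    ```python
    import secrets

    def make_public_handle() -> str:
        # Eight random hex characters; unlike an auto-incremented row ID,
        # this reveals nothing about when the account was created.
        return secrets.token_hex(4)

    handle = make_public_handle()
    print(handle)  # e.g. "9f3a01bc"
    ```

    The handle is generated once and stored alongside the account, so a user's ratings stay linkable to each other without being linkable to a registration date.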

  5. >Meklar, the intention is to mirror the questions asked by Cook et al, so explicitly endorsing AGW just means explicitly accepting the greenhouse effect is real (or anything similar). It’s incredibly broad.

    In the interests of accurate replication, I will change my ratings accordingly, and use the comment box to note if a paper actually supports AGW, rather than just assuming it.

    If this was the actual methodology of Cook et al, it weakens their case considerably, making it purely an argument from credentials. You may as well poll scientists for their favorite superstitions. 97.3% of scientists avoid treading on cracks in the pavement – so better watch your step or you will be a science-denier!

    I believe that the paper-counting exercises produced by Oreskes, Connolley, and Petersen are pretty spurious arguments anyway, since they are essentially counting political buzzwords, not arguments, and they don’t remotely control for the explosion of low-grade material set in print since ‘publish or perish’ became imperative for academic success in the late ’60s (and the fact that trivial or junk papers are the most prone to such buzzwords anyway).

  6. Meklar, yup. Once you understand their methodology, their results are pretty much meaningless. Basically, their consensus is made up of a ton of papers that do nothing more than say the greenhouse effect is real, a small number of papers which say humans are mostly responsible for it, and a handful of unspecified papers which actually provide some evidence.

    It’s remarkable really. Their consensus includes papers that do nothing to endorse global warming other than say things like, “carbon dioxide, which is a greenhouse gas.”

  7. A user informed me they were being presented papers they had already rated, which isn’t supposed to happen. While looking into that, I noticed users were also being presented abstracts whose ID numbers were greater than 50. That wasn’t supposed to happen either (I’m keeping the test sample small).

    The latter seemed easier to fix, so I looked into it first. When I did, I realized my SQL query had three conditions being evaluated with no parentheses. That screwed up how they were evaluated, allowing high ID numbers to be picked. I added parentheses to the query, and I believe that problem should be fixed.
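    A minimal sketch of that precedence bug, using SQLite from Python with hypothetical table and column names (the real query isn't shown): in SQL, AND binds more tightly than OR, so three unparenthesized conditions can group the wrong way and let high ID numbers through.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE abstracts (id INTEGER, flagged INTEGER)")
    conn.executemany("INSERT INTO abstracts VALUES (?, ?)",
                     [(10, 0), (60, 0), (60, 1)])

    # Buggy: AND binds tighter than OR, so this parses as
    # (id <= 50 AND flagged = 0) OR flagged = 1 -- high IDs slip through.
    buggy = conn.execute(
        "SELECT id FROM abstracts"
        " WHERE id <= 50 AND flagged = 0 OR flagged = 1 ORDER BY id"
    ).fetchall()

    # Fixed: parentheses force the intended grouping.
    fixed = conn.execute(
        "SELECT id FROM abstracts"
        " WHERE id <= 50 AND (flagged = 0 OR flagged = 1) ORDER BY id"
    ).fetchall()

    print(buggy)  # [(10,), (60,)] -- abstract 60 leaked past the ID cap
    print(fixed)  # [(10,)]
    ```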

    The same problem might have been causing the repeats; I’m not sure. If you wind up rating the same entry twice, let me know, and I’ll look closer to see what’s up.

  8. Loading is slow, I guess because you construct the page only after I click. You may want to pre-load.

    I see you dropped the paper classification. I got 10 irrelevant papers. You may want to re-introduce the classification, either following Cook or an improved one:
    1a: Detection and attribution of climate change: Physical
    1b: Detection and attribution of climate change: Statistical
    1c: Detection and attribution of climate change: Paleo
    2. Other aspects of climate science
    3. Impacts of climate change
    4. Impacts of climate policy
    5. Not about climate change at all

  9. It turns out the fix to the bug causing the wrong abstracts to show up didn’t fix the bug of getting the same abstract more than once. The latter was caused by a design mistake on my part. To check for papers which didn’t have ratings by a user, I combined the ratings and abstract tables, then filtered out entries made by the current rater.

    The problem was every rating got its own entry. If raters 3, 4, 5 and 7 all rated abstract 27, abstract 27 would show up four times. When rater 7 went to get a new set of abstracts, his entry would be filtered out, but the entries for raters 3, 4 and 5 would not be. I didn’t notice the issue when testing the system myself because the problem gets worse as more users perform ratings. With only three test accounts (at ~15 ratings apiece), the problem wasn’t very visible. Having more people participating made it a lot easier to spot and diagnose.
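    A minimal sketch of that design mistake and one possible fix, again using SQLite from Python with hypothetical table names (the real schema isn't shown): joining and then filtering out the current rater leaves the other raters' rows behind, while a NOT EXISTS subquery asks the right question directly.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE abstracts (id INTEGER PRIMARY KEY);
        CREATE TABLE ratings (abstract_id INTEGER, rater_id INTEGER);
        INSERT INTO abstracts VALUES (27), (28);
        INSERT INTO ratings VALUES (27, 3), (27, 4), (27, 5), (27, 7);
    """)

    # Buggy: join, then filter out the current rater (7). The rows left
    # behind by raters 3, 4 and 5 make abstract 27 show up three times.
    buggy = conn.execute("""
        SELECT a.id FROM abstracts a
        JOIN ratings r ON r.abstract_id = a.id
        WHERE r.rater_id <> 7
    """).fetchall()

    # Fixed: select abstracts with no rating by the current rater at all.
    fixed = conn.execute("""
        SELECT a.id FROM abstracts a
        WHERE NOT EXISTS (SELECT 1 FROM ratings r
                          WHERE r.abstract_id = a.id AND r.rater_id = 7)
        ORDER BY a.id
    """).fetchall()

    print(buggy)  # [(27,), (27,), (27,)] -- already-rated paper served again
    print(fixed)  # [(28,)]
    ```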

    Anyway, I think the problem should be fixed now. Let me know if not.

  10. Richard Tol, I don’t think the slowness has anything to do with the pages. They weren’t loading slowly earlier, and right now, everything on my server is loading slowly. One thing you may notice is how quickly/slowly pages load isn’t constant. Sometimes they’ll load nearly instantaneously; other times they may even time out. That’s often seen when someone hosted on the same server is using too many resources. I’m going to e-mail my web host about it shortly.

    On the issue of classification, I don’t intend to add a category for that at the moment. I don’t want people to have to pick too many things out at once. That’s especially true since reading for endorsement/quantification is rather different than reading for paper type. For now I’m going to just encourage people to leave any thoughts they have about that issue in the comments. I’ll revisit the issue once more people have tried this out.

    One thing to note is I’m hoping to host discussions of papers after sets of ratings are done. During those, I’d expect to talk about the types of papers that were looked at. For example, I might make a post saying something like:

    Okay guys, of these 50 papers, people rated 20 as endorsing global warming, five of which quantify it. What do you think of those five papers?

    So even if categories are selected during the rating, they won’t be ignored.

  11. Observations

    1. Still loading slowly.

    2. Summary page has errors. I think I rated 5 abstracts, not 2. Also the math is wrong.

    “You’ve rated 2 abstracts.

    You rated 2 of them as endorsement level: 3

    You rated 2 of them as quantification level: 2”

  12. Never mind about the math. However, the count is incorrect.

    Also, the system is repeating abstracts. In one case it showed my original response including a comment. In the other cases my responses did not appear.

  13. Brandon:
    It is your party.

    There is a difference, though, between a neutral rating because the paper is irrelevant, and a neutral rating because the paper is neutral.

    The vast majority of papers in Cook’s sample are irrelevant.

  14. Brandon,

    I think Richard Tol has a valid point about relevance. The 97% meme persists in part because thousands of papers were rated and that sounds impressive.

    Any plans to collect data about the raters at registration such as education, experience, age, gender, etc.?

  15. Two more comments –

    1. The registration page can be forced to produce an SQL error. The error message gives up some minor information about your server. Your logs will show the details.
    2. You’ll need to add “recover my password” functionality.

  16. DGH, I do apologize for the slowness. I’m currently working with my host to see if I can figure out what’s wrong.

    For the count, each paper can only be counted once. Since you’ve mentioned you’ve had repetitions, could that explain why your count is off? If not, I can check into the code and see if I can see what the trouble is.

    On repetitions, that doesn’t sound like a problem with the code. The code cannot possibly output any selections when generating the page. I’m not sure what would have caused that. Maybe it was tied to caching (I don’t use any, but your browser/ISP might)? Either way, I haven’t been able to force repetition to happen since I refactored the code last.

    For the registration page, it’d help if you could tell me how you caused the error (you can e-mail it to me if you prefer). I haven’t really set up logging yet, so all I have are my Apache logs. I know that’s probably foolish, but I’ve had other priorities. Incidentally, what you say doesn’t surprise me. I haven’t even come close to writing all the code I need to catch errors that might get thrown.

    On passwords, I’m afraid it is impossible to recover them. I store passwords in a way that makes it impossible for anyone, including me, to see them. I’ll look into figuring out how to change/reset passwords though. For the moment, if someone e-mails me from the address they registered with, I can manually change the password for them.
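    For the curious, non-recoverable password storage typically looks something like the salted, iterated hash below. This is a sketch of the general technique, not the site's actual code, and the helper names are made up for illustration.

    ```python
    import hashlib
    import hmac
    import secrets

    def hash_password(password: str) -> str:
        # Store a random salt plus an iterated hash; the plaintext is
        # never recoverable from what's stored, only verifiable against it.
        salt = secrets.token_hex(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                     salt.encode(), 100_000)
        return f"{salt}${digest.hex()}"

    def verify_password(password: str, stored: str) -> bool:
        salt, digest_hex = stored.split("$")
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                        salt.encode(), 100_000)
        # Constant-time comparison avoids leaking information via timing.
        return hmac.compare_digest(candidate.hex(), digest_hex)

    stored = hash_password("correct horse")
    print(verify_password("correct horse", stored))  # True
    print(verify_password("wrong guess", stored))    # False
    ```

    A password reset then works by generating a fresh password (or a one-time reset token) rather than ever revealing the old one.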

  17. Richard Tol, I intend to focus this reanalysis on papers which Cook et al rated as endorsing or rejecting global warming. That means neutral ratings shouldn’t be too common. It’ll be easy enough to extract those rated as neutral for further examination. And of course, I can always run additional rounds of ratings with new questions.

    Gary, if my expectations are correct, this approach will show thousands of papers were rated as endorsing the consensus, but most of those don’t do more than say the greenhouse effect is real. That’s a far more pressing concern than whether or not the papers are “relevant.”

    As for relevance, I don’t think people will agree about what is “relevant.” I know I’ve disagreed with Richard Tol on relevance before. Some people claim a paper has to seek to provide evidence to be relevant, but I disagree. I say a consensus is measured by popularity, not evidence. The fact many papers aren’t “relevant” is an inherent aspect of a consensus. I’d rather not have disagreements like that cloud results.

    Collecting personal data could be interesting, but I’m not going to make it part of registration. I don’t think people should need to tell me who they are in order to participate. I would like to make profile pages though, where people can put in however much information they’d like. That’d be fairly far down on my list of priorities though.

  18. Richard Tol, both of those are relevant to a consensus. A consensus is merely a general agreement. The existence of a consensus says nothing about its merits. You could find a consensus consisting of schizophrenics who believe global warming is real because Zod tells them so. The absurdity of their arguments wouldn’t affect whether or not there is a consensus.

    The entire reason argument by consensus is so meaningless is that the idea of a “consensus” is meaningless. If we redefined “consensus” to cover only those entries which actually examine evidence, arguments by consensus would actually be useful. We wouldn’t even be able to say, “Science doesn’t operate by consensus,” anymore.

  19. Hey guys, I wanted to let you know you may notice your rating count has dropped. I decided to delete all the entries for abstracts which weren’t supposed to have been presented for rating. There weren’t a lot of them (excluding a bunch of test ratings I did early on), but they were making some statistics harder to calculate. I figure since they weren’t supposed to exist, there was no reason to write code to work around them.

    Your current count should now reflect how many papers out of the first 50 papers of the data set you have rated. Once the count hits 50, you should receive a message telling you there are no more papers to rate.

  20. Richard Tol, it may be worth examining the makeup of the data set, but right now, I’m doing a reanalysis of their project. Their project was about the existence of a consensus, not the significance of one. As such, I’m happy to discuss the significance of any consensus the results might show, but I’m not inclined to introduce an entirely new topic into the rating system.

  21. Brandon,

    No need to apologize. You’re in beta for a reason. I just wanted to confirm that the slow response persisted.

    I should apologize to you over the password issue. (Note to self – invent passwords after morning coffee.)

    On the page which shows a user’s progress I was seeing the total number of papers tallied incorrectly. I read and rated 15 abstracts. After each group of 5 I went back to check my progress. The summary was never correct – it gave me credit for fewer abstracts than I actually rated.

    “On repetitions, that doesn’t sound like a problem with the code. The code cannot possibly output any selections when generating the page.”

    That’s strange but I assure you that it happened. I did a triple take. Maybe some sort of autofill thing in my browser? Or perhaps another browser issue as you noted. For what it’s worth I was using Safari on an iPad.

  22. DGH, I don’t mind changing passwords. I ought to have a way to allow users to do it themselves. It’s just tricky to set up securely.

    On the other issues, let me know if you keep experiencing them. I can check the database to see what it has for you. I do know it shows a couple duplicates on your account. I have code to prevent people from being given the same paper twice, but I haven’t done anything to ensure they don’t rate the same paper twice. A bug (or malicious intent) could allow a user to rate papers multiple times.
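    One common guard for that, sketched below with hypothetical column names: a composite UNIQUE constraint on (abstract, rater) lets the database itself reject a second rating of the same paper by the same user, so even a bug or malicious client can't slip one in.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    # The composite UNIQUE constraint makes the database reject
    # a second rating of the same abstract by the same user.
    conn.execute("""
        CREATE TABLE ratings (
            abstract_id INTEGER,
            rater_id INTEGER,
            endorsement INTEGER,
            UNIQUE (abstract_id, rater_id)
        )
    """)
    conn.execute("INSERT INTO ratings VALUES (27, 7, 3)")
    try:
        conn.execute("INSERT INTO ratings VALUES (27, 7, 2)")
        duplicate_allowed = True
    except sqlite3.IntegrityError:
        duplicate_allowed = False

    print(duplicate_allowed)  # False
    ```

    The application code can catch the constraint violation and either discard the duplicate or treat it as an update to the existing rating.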

  23. Are the 500+ papers not used by Cook thrown out here as well? I suppose they can be used as examples when you get to the real ratings.

  24. We may have tracked down the problem causing the site to run so slowly. I’m restructuring a couple tables to address the issue. I should know if the slowdown is fixed within the hour.

  25. I’m glad to hear speed is no longer an issue, Richard Tol.

    Now that I’ve tested this on a meaningless sample, I’m thinking of switching out the test set with only papers rated as endorsing the consensus by Cook et al. What do you guys think?

  26. Richard Tol, I would like to do as you have described. Most of the papers I have seen are not relevant to the debate, except for a throwaway genuflection to “global climate change” somewhere in the abstract, often as jarring and out of context as an outburst from a Tourette’s syndrome sufferer. There should be a prize à la Douglas Adams for the most gratuitous reference to ‘global climate change’. The question is whether to go by the paper’s actual contents, or the affectations of belief shoehorned into the abstract. On Brandon Shollenberger’s recommendation, and in the interest of replicating Cook et al, I have chosen the second option.
