A Small Challenge

I happened to come across something which annoyed me on the Berkeley Earth (better known as BEST) website today. I’ll discuss it later, but it reminded me of something I’ve been interested in about that group’s efforts. For those who don’t know, their project involves creating a record of the planet’s temperatures.

To do so, they combine data from many different temperature records across the globe. There are lots of different ways to do this, and there are lots of debates about how good or bad any of them are. I won’t get into that, but I want to talk about one newish thing BEST does. Instead of adjusting individual records when it appears there’s a shift in data unrelated to climate (such as you’d get if a temperature station moved), BEST simply splits the record into separate series.

It’s a good approach. If a temperature station moves three times, we’d have four different segments with little relation to one another. Treating them as four different series makes sense. The problem is figuring out where to split those series. How do you tell when a change in data is and is not related to climatic effects?

Sometimes you can tell because there are records of things like stations moving. Most of the time though, you can’t. You can only try to guess by looking at the data itself. To help, you can compare a record to nearby records and see if you can spot differences. Breaks in the data found this way are considered “empirical breakpoints.” BEST looks for such breakpoints (in a somewhat more complicated way than I’ve described).
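
The neighbor-comparison idea is simple enough to sketch in code. Below is a toy version of it (my own construction, not BEST’s actual algorithm): subtract a well-correlated neighbor so the shared climate signal cancels, then look for the split point that maximizes the mean shift in the difference series.

```python
def difference_series(target, neighbor):
    """Subtract a neighboring record so shared climate variation cancels."""
    return [t - n for t, n in zip(target, neighbor)]

def best_single_breakpoint(diff):
    """Return the split point maximizing the mean shift between the two
    resulting segments (a crude change-point statistic)."""
    best_k, best_shift = None, 0.0
    for k in range(2, len(diff) - 2):
        left = sum(diff[:k]) / k
        right = sum(diff[k:]) / (len(diff) - k)
        if abs(left - right) > best_shift:
            best_k, best_shift = k, abs(left - right)
    return best_k, best_shift

# Toy data: a shared "climate" signal, plus a 1.0-degree step in the
# target starting at index 10 (think: a station move).
climate = [0.1 * i for i in range(20)]
neighbor = list(climate)
target = [c + (1.0 if i >= 10 else 0.0) for i, c in enumerate(climate)]

k, shift = best_single_breakpoint(difference_series(target, neighbor))
print(k, round(shift, 2))  # → 10 1.0
```

With noisy real data the statistic is much less clear-cut, which is exactly the problem the challenge below is about.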

The question is, does BEST find what it hopes to find? I’m skeptical. To see how you feel, here are three temperature records I picked out while browsing. I’m showing only the data from 1965-1991 (where there was overlap). For each one, see if you can guess how many breakpoints the series should have. If you’d like, feel free to try to guess where those breakpoints should be:

[Figure: three temperature records, 1965-1991]

A couple notes. The first two records should be similar because I intentionally picked stations near one another. Feel free to use that to help you try to find breakpoints, or to ignore it altogether. Also, don’t think a small gap in data means there should be a breakpoint. None of the series shown have a breakpoint due to missing data.

Good luck. If you can guess the right number for any of these, you’re better than me. To make it more fair, here are larger versions of the three: first, second, third.



  1. Brandon, the thing to consider here is how well you need to do in accurately finding and correcting breakpoints. We would all agree (I think) that not homogenizing the data leads to errors in the global mean temperature trend estimation.

    The issue is how well you need to do on individual stations, and the answer to that depends on how accurately you want to measure global mean temperature. At some point, other measurement errors become more significant in any case. One of these is the nonuniform and often sparsely sampled surface air temperature (and associated sea surface temperature).

    Once that becomes the dominant error source, there is no longer much use for further improvements in homogenization, unless you are interested in different questions, e.g., regional scale temperature trends.

    It’s worth bearing in mind the issues with the imperfect nature of TOBS and homogenization corrections as you reduce your integration area, especially in sparsely sampled regions (e.g., Northern Canada). This is similar to trying to use a poverty criterion developed for the full United States to look at regional scale poverty. (Since the cost of living varies wildly between regions, this is a perfectly useless thing to do, but it is frequently done nonetheless.)

    All methods have warts, but luckily we don’t have to do things perfectly, only well enough to meet the needs for the analysis at hand.

    I would bet for your example, that the effect of the breakpoints does not lead to a significant change in temperature trend, even for that station. I’d be interested in seeing the numbers though.

  2. Carrick, I agree “not homogenizing” causes problems if by that we mean “not doing anything” to address potential errors in the data. I don’t agree homogenization is the only solution though. I don’t even agree it’s necessarily the best solution.

    The issue is how well you need to do on individual stations, and the answer to that depends on how accurately you want to measure global mean temperature.

    I don’t think how well a methodology measures the global mean temperature is the standard I’d use. I get a lot of people feel that’s the most important index, but it’s a bad one for any sort of testing. It’d be like testing a GCM by only comparing how well it tracked global temperatures.

    Imagine if there were two significant biases in a methodology that, by their nature, would always exactly cancel out in regard to global mean temperatures. We’d never say it was fine to just ignore them. If the Southern Hemisphere is biased one way and the Northern Hemisphere is biased the other, that’s a problem regardless of whether or not the biases cancel out.

    I would bet for your example, that the effect of the breakpoints does not lead to a significant change in temperature trend, even for that station. I’d be interested in seeing the numbers though.

    You’d be wrong. One of the things I’ve long found fascinating about the BEST methodology is how much it smears information around. Its spatial resolution is terrible. It can have entire areas showing trends far different from what the data for those areas show. I actually wrote a post highlighting the issue a while back.

    I had planned on trying to find out just what resolution BEST is supposed to be accurate at, but I haven’t gotten around to it. This post actually came about from me working on that project a bit more.

    Anyway, I should be posting the follow-up to this post later today.

  3. Oh, I forgot to mention one of the main reasons I’ve been looking at the breakpoint issue. I’ve seen a number of stations with ten or more “empirical” breaks. If you break tons of series into tons of pieces then align all the pieces in an “optimal” way, you will get high precision. There’s no guarantee you’ll get high accuracy though. It’s the classic problem of over-fitting.

    Far too often people get a “right” answer from a model and decide the model must be right.

  4. Brandon:

    I don’t think how well a methodology measures the global mean temperature is the standard I’d use.

    How well you need to do depends on the application. That also means if you used data that were corrected for a particular application, you have to be careful when applying those data to another application, for which the corrections may not be valid. (My example of regional versus national poverty is an obvious example of this.)

    You’d be wrong. One of the things I’ve long found fascinating about the BEST methodology is how much it smears information around. Its spatial resolution is terrible. It can have entire areas showing trends far different from what the data for those areas show. I actually wrote a post highlighting the issue a while back.

    Can you give me the trends and the errors for this example? I’d do this myself but you haven’t provided links to the data.

    “Smearing around” isn’t necessarily a problem. I’d expect it to get it wrong occasionally with statistically based breakpoint detection methods.

    Whether it is a problem depends on the magnitude of the effect of the smearing. What you’d ideally like to see is that the accuracy is improved, and that the increase in variance associated with the “smearing” is smaller than the decrease in variance associated with the homogenization working on average.

    I believe this has been addressed by the BEST people using Monte Carlo and jackknife resampling. I can’t vouch for how well it was tested, but they do seem to understand the issue of “overfitting” and do test for it using techniques that would be appropriate (in my opinion) for testing it.

    Regarding resolution, the effective resolution of this method (and any other that doesn’t have good metadata from which to impose correct breakpoints) must depend on the density of stations and, to an extent, regional geography.

    This seems to be a workable problem for the US, which has so many stations, but I don’t know how you can use it in the regions that show the most warming but have the lowest coverage, such as Siberia and the North Arctic.

  5. BEST didn’t provide uncertainty calculations for the station trends, but I can provide the trends. The first area I examined for this was Illinois. I live in the state, and I was curious what BEST showed for it.

    I didn’t have anything in mind when I first looked at the stations in it, but I quickly found stations which had significantly higher “adjusted” trends than “raw” trends. For example, the raw data for one of the stations I highlighted above showed a cooling trend of -0.43. After the breakpoint adjustments, it’s given as 0.42. That made me curious so I kept looking at Illinois stations. I quickly realized the pattern held for the state in general:

    https://hiizuru.wordpress.com/2013/12/19/illinois-sucks-at-measuring-temperature/

    A very crude estimation of the effect was BEST increased a trend in Illinois data of about half a degree to a trend of one and a half degrees. Given the stated purpose of the adjustments is:

    During the Berkeley Earth averaging process we compare each station to other stations in its local neighborhood, which allows us to identify discontinuities and other heterogeneities in the time series from individual weather stations. The averaging process is then designed to automatically compensate for various biases that appear to be present. After the average field is constructed, it is possible to create a set of estimated bias corrections that suggest what the weather station might have reported had apparent biasing events not occurred.

    Unless we believe “apparent biasing events” in Illinois caused a dramatic cooling bias in the state’s data, it seems the explanation is BEST isn’t reliable at that scale.
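
    For reference, the kind of trend number being quoted here (degrees per century from a simple OLS fit to monthly anomalies) can be computed as below. The data are made up for illustration; the -0.43/0.42 figures above come from BEST’s station pages, not this sketch.

```python
import math

def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return num / sum((a - mx) ** 2 for a in x)

# 30 years of fake monthly anomalies: a 0.5 degree/century trend plus a
# seasonal-looking annual cycle.
years = [i / 12.0 for i in range(12 * 30)]
anom = [0.005 * t + 0.3 * math.sin(2 * math.pi * t) for t in years]

trend_per_century = 100 * ols_slope(years, anom)
print(round(trend_per_century, 2))  # roughly the 0.5 we put in
```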

  6. Brandon:

    For example, the raw data for one of the stations I highlighted above showed a cooling trend of -0.43. After the breakpoint adjustments, it’s given as 0.42.

    Over what period was this trend estimated?

    Unless we believe “apparent biasing events” in Illinois caused a dramatic cooling bias in the state’s data, it seems the explanation is BEST isn’t reliable at that scale.

    That is believable. Ideally I’d like to see the impact on the uncertainty as you make the area of integration smaller.

    If you wanted to look at Illinois scale temperature trends, you’d probably want to go through the station information and select out those that have good metadata.

    By the way, even over a 100-year period, there are pockets of global cooling. Most of the Southeast US is viewed as having cooled in the last 100 years, for example, and gotten moister.

  7. By the way, simply finding neighboring sites with the same error in their homogenization corrections isn’t really a demonstration of anything, other than the errors introduced by the homogenization scheme correlate over short-enough distance scales.

  8. I admit I read WUWT’s front page, usually to see what articles get linked to. This one is apropos to the discussion here:

    http://wattsupwiththat.com/2014/05/05/surprise-global-warming-is-spatially-and-temporally-non-uniform/

    I wonder whether the variations in trends that the authors are seeing are meaningful. This is a paywalled article (my university doesn’t pay the separate subscription fee to Nature Climate Change, and I don’t feel like wasting my overhead dollars on this).

    I would guess that zonal (latitudinal) averages would be reliable:

    But the fine-grained trend estimations shown here:

    http://www.nature.com/nclimate/journal/vaop/ncurrent/fig_tab/nclimate2223_F2.html

    probably are not.

  9. Carrick, if you haven’t seen it yet, I posted the answer to this challenge. Amongst other things, it includes a link to station pages for each station I looked at.

    Over what period was this trend estimated?

    That one was for the 1875-1990 period. It’s not the best of examples though. The station has a swath of missing data not long after it starts. The second station has its raw trend given as 0.96 while its adjusted trend is given as 2.34. That’s for 1962-2013.

    If you wanted to look at Illinois scale temperature trends, you’d probably want to go through the station information and select out those that have good metadata.

    I wouldn’t look only at those, but there are definitely things I’d want to look at with them that I couldn’t examine for other stations. It’s just a matter of figuring out which stations do have good metadata, collecting it and collecting their data. That’s a fair amount of work, and I keep getting distracted by other things.

    By the way, even over a 100-year period, there are pockets of global cooling. Most of the Southeast US is viewed as having cooled in the last 100 years, for example, and gotten moister.

    I believe BEST disagrees. If I remember correctly, it says the Southeast US has warmed by about one degree in the last 100 years. Of course, I have a thing or five to say about that.

    By the way, simply finding neighboring sites with the same error in their homogenization corrections isn’t really a demonstration of anything, other than the errors introduced by the homogenization scheme correlate over short-enough distance scales.

    If what I’m seeing is error (it seems to be), it’s no surprise the errors would correlate like you mention. BEST compares stations to one another in order to look for breakpoints. That necessarily introduces correlation. The only question is how strong a correlation it’d be.

    I would guess that zonal (latitudinal) averages would be reliable:

    My concern with latitudinal averages is BEST uses a latitudinal parameter. That guarantees certain zonal patterns will be found. Given that, it seems circular to examine zonal patterns. Or at least, it does since I’m highly skeptical of how they handle their latitudinal parameter.

    That said, I’m much more confident in zonal averages for other sets. BEST is the only set where I’ve identified an issue where latitude is a primary problem.

  10. Oh, I should point something out since it may not be clear from my posts: I look at a lot more than I discuss. I’m not trying to document findings as I go. I’m just trying to put some simple topics out there for people to look at and possibly discuss. If you want to look at something I discuss in more depth or get additional information, odds are I can oblige.

    Right now I’m just working on how to collate data. Trying to collect data for hundreds of stations (if not more) which cover different periods is a pain for me. I’ve always hated I/O.

  11. Brandon:

    Carrick, if you haven’t seen it yet, I posted the answer to this challenge. Amongst other things, it includes a link to station pages for each station I looked at.

    Thanks… I’ll give it a look.

    That one was for the 1875-1990 period.

    There are so many problems prior to 1950, I really don’t expect any of the various methods to work well. But if I were interested and had more time than I do, I’d start by comparing the homogenizations for BEST to the adjusted NCDC data (although I believe it is unlikely that NCDC does any better on a station-by-station basis than BEST).

    I believe BEST disagrees. If I remember correctly, it says the Southeast US has warmed by about one degree in the last 100 years. Of course, I have a thing or five to say about that.

    I’ll have to look at that. It is a problem with the method if it’s getting an answer that is inconsistent with other temperature indexes over a region as large as the size of the SE US, in my opinion.

    BEST provides the gridded data in netcdf format, something that I can handle, but I’d have to write my own routines to extract the information from it. That’s where I’d start from, rather than individual stations.

    My concern with latitudinal averages is BEST uses a latitudinal parameter. That guarantees certain zonal patterns will be found. Given that, it seems circular to examine zonal patterns.

    If you feel like describing this in a bit more detail, that’d be useful. I wasn’t under the impression that this is a real issue with the BEST data.

  12. Carrick:

    There are so many problems prior to 1950, I really don’t expect any of the various methods to work well. But if I were interested and had more time than I do, I’d start by comparing the homogenizations for BEST to the adjusted NCDC data (although I believe it is unlikely that NCDC does any better on a station-by-station basis than BEST).

    Comparing BEST’s calculated breakpoints to other groups’ attempts at fixing data is something I want to do. I think it’d be interesting. I think it’s fun to look at “corrections” (or breakpoints) and try to figure out what they supposedly say about the real world.

    I’ll have to look at that. It is a problem with the method if it’s getting an answer that is inconsistent with other temperature indexes over a region as large as the size of the SE US, in my opinion.

    I did a quick check of the BEST website, and I saw a similar warming pattern over that period for every state in the area. If you’re right about what the other temperature indexes show (I haven’t checked yet), BEST is definitely inconsistent with them.

    BEST provides the gridded data in netcdf format, something that I can handle, but I’d have to write my own routines to extract the information from it. That’s where I’d start from, rather than individual stations.

    That’s definitely a better place to start if you just want to examine BEST’s results. However, I like to try to figure out how results were made. I think the best way to do that is to look at the data used and the final results, then work out how things got from one to the other.

    If you feel like describing this in a bit more detail, that’d be useful. I wasn’t under the impression that this is a real issue with the BEST data.

    One of the first things that caught my eye about BEST is its methodology assumes the correlation structure of temperatures across the planet is constant throughout time. That caught my eye because I’ve repeatedly heard people say global warming will cause some areas to warm faster than others. If that’s true, the correlation structure must necessarily change.

    There are a lot of details, but to keep it simple, BEST effectively detrends its data by regressing it against latitude (as well as a couple other things like altitude). Steven Mosher likes to talk about how this lets BEST explain ~80% of the variance in the data. However, he says this about it:

    That surface is defined to minimize the residuals.. think of it as least squares on steriods. What it says is that Position x,y,z,t has this deterministic temperature.
    That structure doesnt change.

    Whats left over: the residual which is the weather. It changes over time. And if that change persists we call it “climate change”

    Its the weather structure that gets krigged.. and going back in time we assume
    that the correlation structure is the same. Of course its not. I think that might be a bit thats lost on people: So we know that its different the issue is how does this bias the prediction….

    Hence the jacknife..

    While I’m happy he acknowledges the correlation structure of the “weather” changes (he didn’t for a couple years), he’s wrong to claim that the structure of BEST’s regression doesn’t change. Global warming will cause different amounts of warming at the poles than at the equator. This means the relationship between latitude and temperature cannot remain constant.

    Given that, calculating a latitudinal parameter over one period then applying it for all time will affect BEST’s results in ways tied to latitude. I don’t know how serious a problem it may be, but it could introduce errors/bias just like doing an OLS fit over one period then applying it to an entire data set can.

    And as far as I know, jackknifing can’t possibly address this. It removes random subsets of data. That’s nothing like how you test for systematic bias present across all data.
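
    The concern in this comment can be illustrated with a toy calculation (invented numbers, not BEST’s actual fit): regress temperature against latitude over one calibration period, then apply that fixed fit to a later period in which the poles warmed faster. The residuals the method would treat as “weather” then carry a systematic, latitude-dependent bias rather than averaging out.

```python
def ols_fit(x, y):
    """Ordinary least squares; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

lats = [0, 15, 30, 45, 60, 75, 90]

# Calibration period: temperature falls 0.5 degrees per degree latitude.
early = [20.0 - 0.5 * lat for lat in lats]
a, b = ols_fit(lats, early)

# Later period: uniform +1.0 warming plus extra polar amplification
# (another 0.02 degrees per degree of latitude).
late = [t + 1.0 + 0.02 * lat for t, lat in zip(early, lats)]

# Residuals against the *fixed* early-period fit: the "weather" term now
# grows systematically with latitude.
residuals = [obs - (a + b * lat) for obs, lat in zip(late, lats)]
print([round(r, 2) for r in residuals])
# → [1.0, 1.3, 1.6, 1.9, 2.2, 2.5, 2.8]
```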

  13. Brandon:

    One of the first things that caught my eye about BEST is its methodology assumes the correlation structure of temperatures across the planet is constant throughout time. That caught my eye because I’ve repeatedly heard people say global warming will cause some areas to warm faster than others. If that’s true, the correlation structure must necessarily change.

    I think the assumption that the correlational structure is constant in space is certainly wrong. Their own studies disproved this. They also assume axial symmetry, which is an unfounded assumption. At the minimum, it needs to be demonstrated, not just assumed.

    It’s possible that the correlational structure is constant in time, but this needs to be demonstrated, not just assumed, as well.

    Jackknife resampling is a method for estimating uncertainty. Perhaps it is being used to test whether the correlational structure is constant in time or not, but I didn’t think that was how it is being used.

    Were I doing this problem from the ground up, I’d use an EOF-based formulation. I have thoughts on that, which may work, or may not. It’s a tough nut to fully crack.

  14. Carrick:

    I think the assumption that the correlational structure is constant in space is certainly wrong. Their own studies disproved this. They also assume axial symmetry, which is an unfounded assumption. At the minimum, it needs to be demonstrated, not just assumed.

    I’m especially curious about how they handle the correlation issue now that they’ve started examining ocean data. I’m pretty sure you can spot problems arising from it just by looking at the demo video of their results.

    It’s possible that the correlational structure is constant in time, but this needs to be demonstrated, not just assumed, as well.

    I don’t think it is possible in relation to the warming caused by global warming, but even if it were theoretically possible, BEST’s own data pretty much disproves it. The first thing I did upon seeing they made that assumption was grab some of their data and look at correlations within it. There are massive changes in BEST’s correlation structure.

    Arguing for an unchanging correlation structure (in time) would require saying BEST gets the correlation structure of the planet completely wrong for most of the time it covers. That argument doesn’t make things any better. At the very least, it’d prove their uncertainty levels are significantly understated.

    Jackknife resampling is a method for estimating uncertainty. Perhaps it is being used to test whether the correlational structure is constant in time or not, but I didn’t think that was how it is being used.

    It’s not. It was just used to calculate the uncertainty in their final results. I don’t know why Steven Mosher said what he said, but I’m pretty sure he was just saying something stupid.

    Were I doing this problem from the ground up, I’d use an EOF-based formulation. I have thoughts on that, which may work, or may not. It’s a tough nut to fully crack.

    The thing which bugs me about BEST is they didn’t bother to do simple tests. If they wanted to assume a constant correlation structure (in time), why not test out different calibration periods? That’d at least give them an idea of the effect their assumption being wrong would have. It wouldn’t have been very hard for them to test out five or ten different periods.

    But they didn’t even do testing like that for their OLS fits. You may remember BEST’s attempt at attribution. They’ve repeatedly published this image. That image could be quite different if they had just picked a different period to do their OLS fit over. Testing for effects like that is an obvious step, yet for some reason they just didn’t do it, not even when publishing the results in peer-reviewed literature.

    Mind you, I am only assuming they didn’t test for these issues. It is, of course, possible they tested the issues and never disclosed the results. I don’t think that’d be any better though.

  15. Brandon, that image is one of my least-favorite aspects of BEST. I’ve pointed out to Steven Mosher that they might as well have correlated against the number of fire station Dalmatians in San Francisco, but to deaf ears. After all, it’s total forcing that you should correlate against, and anyway, this model assumes both that only CO2 forcing is present and, equally bad, that there is no thermal inertia.

    The advantage of EOF, by the way, is that you can use it to compute spatio-temporal eigenfunctions. No need to assume stationarity, and it is a well-studied optimal method for infilling missing data like this.

    As you are likely aware, you expand the original series in terms of descending magnitude of eigenvalues (sorted by percentage of variance explained). Then you can use a robust method for truncation like AIC (see e.g. this).

    There are issues that would have to be resolved relating to bias introduced by irregular sampling, but it seems to me that an iterative bootstrap method would fix this.
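
    The EOF expansion described here can be sketched minimally: form the spatial covariance of a few station series, extract the leading eigenvector (the first EOF) by power iteration, and report the fraction of variance it explains. Real analyses use SVD and many more stations; the data below are invented.

```python
def leading_eof(series, iters=200):
    """First EOF of a set of time series via power iteration on the
    spatial covariance matrix; returns (pattern, variance fraction)."""
    n_t = len(series[0])
    means = [sum(s) / n_t for s in series]
    anom = [[v - m for v in s] for s, m in zip(series, means)]
    n_s = len(series)
    cov = [[sum(anom[i][t] * anom[j][t] for t in range(n_t)) / n_t
            for j in range(n_s)] for i in range(n_s)]
    v = [1.0] * n_s
    for _ in range(iters):  # power iteration -> dominant eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(n_s)) for i in range(n_s)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n_s))
              for i in range(n_s))
    return v, lam / sum(cov[i][i] for i in range(n_s))

# Three stations dominated by one shared oscillating mode, with small
# station-specific offsets and scalings.
base = [(-1.0) ** t for t in range(40)]
stations = [
    [b + 0.1 for b in base],
    [0.9 * b - 0.1 for b in base],
    [1.1 * b for b in base],
]
eof1, frac = leading_eof(stations)
print(round(frac, 2))  # the shared mode explains essentially all the variance
```

Truncating the expansion (by AIC or similar) then amounts to keeping only the leading modes by variance fraction.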

    Kriging using a correlation function for this problem just seems like a poor substitute. It may be “good enough” for global temperature trend, but I would be surprised if it worked for distances shorter than roughly 1000-km (typical correlation length).

  16. By the way, here’s GISTEMP’s map.

    While it’s called “trend”, it’s really trend * measurement_period.

    Note the blue area in the US SE.

    I’m pretty sure this is similar to what the other series find.

  17. Carrick:

    After all, it’s total forcing that you should correlate against, and anyway, this model assumes both that only CO2 forcing is present and equally bad that there is no thermal inertia.

    I wouldn’t be too bothered by them using CO2 as a proxy for all anthropogenic forcings, except they seem to try to confuse people about it. For example, they calculate a sensitivity to their proxy and compare it to the IPCC estimates for CO2 sensitivity.

    As you are likely aware, you expand the original series in terms of descending magnitude of eigenvalues (sorted by percentage of variance explained). Then you can use a robust method for truncation like AIC (see e.g. this).

    I became rather familiar with EOF due to Michael Mann’s original hockey stick. I like the methodology, but I don’t know enough about it and kriging to say one is better. On that note:

    Kriging using a correlation function for this problem just seems like a poor substitute. It may be “good enough” for global temperature trend, but I would be surprised if it worked for distances shorter than roughly 1000-km (typical correlation length).

    It’d be interesting to see a comparison of the two methodologies and how they handle different information density (any methodology should work if there’s enough data).

    Here’s the annual map.

    I can’t see cooling in the annual map, but at most the region appears trend-neutral. That’s a far cry from what BEST seems to show. I think a direct comparison is called for.

    I know where BEST and HadCRUT have gridded data in netcdf files. I could probably get the numbers to compare for those regions sometime today (R works pretty well with that filetype). I could compare GISS too if similar data from it is easy to find.

    I don’t know about making figures though. I’m terrible with visuals.

  18. Here is a “marble” plot, showing the sign of the trends for individual adjusted GHCN stations, with some quality control imposed, for the period 1900-2010.

    https://dl.dropboxusercontent.com/u/4520911/Climate/Trends/CONUS-1900-2010-marble.pdf

    (Red is positive, blue is negative.)

    It’s a bit easier to see there is a general cooling trend in the SE US.

    The issue with kriging here isn’t kriging per se; it’s that the interpolation function they chose for BEST is overly simplistic. I think the method is better applied to problems like geoprospecting, where you also don’t have heteroskedasticity.

  19. That plot definitely makes it easier to see. I think it’d be more informative if you could make the size of the circles depend on the size of the stations’ trends though. There are a lot of cooling stations in that region, but there could still be a warming trend in it if the amount they cooled by was small.

  20. Carrick, thanks. It’s interesting that rather than the warm stations there having larger (absolute) trends than the cooling stations, they have smaller trends. That means the new version of that image paints a stronger picture of cooling in the region. That makes BEST’s results seem even more out of place.

    I’m definitely going to look at the BEST gridded data now.

  21. Well, I tried looking at the BEST gridded data. It didn’t work out. For some reason I couldn’t open their netCDF file in R. I had no problem doing so with GISS, but R couldn’t even read the header section of the file. The weird part is a standalone netCDF viewer program I have can read the file.

    I’m going to try seeing if I can find another package for R to try to read the file. If that doesn’t work, I’ll try using the program that can read the file to open it then re-save it. I know the latter will work (if nothing else, I can export the data as a .csv file), but it’s tedious, especially with a ~200MB file. It also shouldn’t be necessary. I may have to try contacting the guy who wrote the ncdf package for R and ask him to see if he can tell why his package can’t read the file. It’d be good to know whether there’s a bug in his package or if there’s something wonky about the file (or both).

    For some happier news, the GISS gridded data has been pretty easy to work with. The only thing I’ve had any trouble with is getting the right lat/lon/time indexes. I’m sure there’s a convenient way to pick a point like April 1966, but I haven’t worked it out yet. I’m still extracting data by typing things like: x[51:55,61:70,244:1611]. That’s a small issue though. It hasn’t gotten in the way of me calculating linear fits for various grid points and times.
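
    The index bookkeeping can be wrapped in small helpers. This sketch assumes a GISS-style layout: a monthly time axis starting January 1880 and a regular 2-degree grid with cell centers at -89, -87, and so on. The exact start date and cell size vary by product, so treat both as assumptions to check against the file’s own coordinate variables.

```python
def month_index(year, month, start_year=1880, start_month=1):
    """1-based index (as R uses) of a month on a monthly time axis."""
    return (year - start_year) * 12 + (month - start_month) + 1

def grid_index(coord, first_center, cell_size):
    """1-based index of the grid cell whose center is nearest coord."""
    return int(round((coord - first_center) / cell_size)) + 1

print(month_index(1966, 4))          # → 1036
print(grid_index(41.0, -89.0, 2.0))  # → 66 (cell centered at 41)
```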

    Which brings me to a question. The box size is different depending on which data set you use. That means you can’t directly compare individual points. Should I interpolate for data sets with larger box sizes, smooth for data sets with smaller box sizes, or just not attempt direct point-by-point comparisons? I wouldn’t need to mess with any of this to create graphs which we could visually compare.

  22. I am pretty sure that GISTEMP is using GHCN-v3 without further adjustment (note that before the shift to GHCN-v3, there was a point where adjustments were being made). So there isn’t a huge incentive to compare too fine a grid here.

    But to address your point directly, I’d choose some representative lat/longs, then use a cubic 2-d spline to interpolate at those points.

    You should see that the main effect of the interpolation is in the high frequency portion of the spectrum.

    But maybe it’s worth doing the graphs to start with and see if that is enough detail?
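
    In sketch form, the suggestion is: pick representative lat/lon points and evaluate each gridded data set there by interpolation. A bilinear interpolator (simpler than the 2-d cubic spline mentioned above, but the same idea) on invented data:

```python
def bilinear(grid, lat0, lon0, dlat, dlon, lat, lon):
    """Bilinear interpolation on grid[i][j], whose node (i, j) sits at
    (lat0 + i*dlat, lon0 + j*dlon)."""
    fi = (lat - lat0) / dlat
    fj = (lon - lon0) / dlon
    i, j = int(fi), int(fj)
    u, v = fi - i, fj - j
    return ((1 - u) * (1 - v) * grid[i][j]
            + u * (1 - v) * grid[i + 1][j]
            + (1 - u) * v * grid[i][j + 1]
            + u * v * grid[i + 1][j + 1])

# A plane f(lat, lon) = 0.1*lat + 0.05*lon, which bilinear interpolation
# reproduces exactly at off-grid points.
grid = [[0.1 * (30 + 2 * i) + 0.05 * (-90 + 2 * j) for j in range(5)]
        for i in range(5)]
val = bilinear(grid, 30.0, -90.0, 2.0, 2.0, 33.0, -87.0)
print(round(val, 2))  # → -1.05 (= 0.1*33 + 0.05*(-87))
```

A cubic spline would differ from this mainly in the high-frequency part of the field, which matches the point about interpolation mostly affecting the high-frequency portion of the spectrum.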

  23. I just came across a page at Skeptical Science which is relevant. It shows a map of the United States with temperature trends calculated from NCDC data for 1895-2012. It shows cooling for the southeast portion of the US. Of special note, Mississippi, Alabama and Georgia clearly show cooling for their entire states. BEST finds otherwise. I’ll list the rate of warming it finds (per century) since 1860 and 1910 for each state:

    Mississippi: 0.63 0.55
    Alabama: 0.61 0.52
    Georgia: 0.58 0.53

    The values I got when I examined the data they provided were different (I did simple OLS fits), but the BEST data set clearly shows warming in those states for the last ~200 years. About the only time there was cooling in that time is right after 1950.
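    For reference, the sort of simple OLS fit described here is just a least-squares slope scaled up to a century. A minimal Python sketch with synthetic numbers (not the BEST data):

```python
import numpy as np

def trend_per_century(years, temps):
    """Least-squares slope of temperature on year, scaled to deg/century."""
    slope, _intercept = np.polyfit(years, temps, 1)
    return slope * 100.0

# Synthetic annual anomalies warming at exactly 0.6 deg/century:
years = np.arange(1910, 2011)
temps = 0.006 * (years - 1910)
print(round(trend_per_century(years, temps), 3))  # 0.6
```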

    I did a quick estimate, and I figure that means BEST is taking an area of (at least) 500,000 square kilometers the data shows cooling for and somehow finding the area has warmed significantly. I find that troubling.

  24. Is it really worth interpolating prior to comparing data sets? It seems simpler to smooth the finer scale data sets to match the coarser scale. Once you’ve done that, you can directly compare grid cells, and you wouldn’t lose much information. You can always interpolate after.
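    Smoothing the finer grid down to the coarser one can be as simple as block-averaging cells. A minimal Python sketch, assuming the coarse cell size is an integer multiple of the fine one and ignoring the latitude area-weighting a real comparison would need:

```python
import numpy as np

def block_average(field, factor):
    """Average a 2-D grid down by an integer factor in each direction,
    e.g. 1x1-degree cells -> 5x5-degree cells with factor=5."""
    ny, nx = field.shape
    assert ny % factor == 0 and nx % factor == 0
    return field.reshape(ny // factor, factor,
                         nx // factor, factor).mean(axis=(1, 3))

fine = np.arange(16.0).reshape(4, 4)   # toy 4x4 "fine" grid
coarse = block_average(fine, 2)        # 2x2 "coarse" grid
print(coarse)                          # [[ 2.5  4.5] [10.5 12.5]]
```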

    Plus, with how much smearing it seems BEST does, I’m not sure what value there’d be in looking at 1×1 grid cells. The BEST data is (effectively) interpolated already. Why interpolate another data set rather than just back out the process for BEST?

  25. Brandon, smoothing works too. It’s just easy enough for me to apply 2-d splines that I’d personally do it that way instead. (I’m a UNIX toolbox guy, so most of what I run are command line utilities that operate on pipes or files. The advantage of that is a 2-d cubic spline utility that I wrote in 1988 is still usable 25+ years later.)

    For the SE US, it’s also useful to compare it to precipitation and cloud cover. I don’t have anything handy to point to, but my hunch is you’d find an increase in precipitation and cloud cover, probably driven by a warming Gulf of Mexico. (So that’s a prediction of sorts—does the Gulf show warming?)

    In my opinion, it’s very important to not spatially smear over scales like this if you want to study regional patterns of warming. I think even GISTEMP gets it wrong by using the 1200-km averaging instead of the 250-km. The map I showed used the 250-km radius averaging.

    The purpose of looking at 1°×1° cells is that it’s possible they did their optimization and testing at that resolution. I think it’s certain that even GHCN-v3 gets it wrong on a fine enough scale. Then it becomes a question: did GHCN-v3 correct in the wrong direction, or did BEST?

  26. Brandon, one thing to keep in mind here is what we seem to be seeing is a spatial smearing of warming patterns. You can see this by comparing the GISTEMP 1200-km to GISTEMP 250-km.

    The much larger trends seen by BEST do not give me a lot of faith that they are doing better at resolving smaller spatial scales than other series.

    But the point I wanted to make here is the US is extremely densely measured compared to many areas of the world. It is likely that the spatial smearing problem gets worse the fewer the stations you have. And it’s possible that you end up with a net bias in your global mean temperature, when you combine the prospect of smearing possibly higher temperature trends from regions that have more coverage into regions that have less.

    This is in no way to suggest that I would expect the observed global warming to go away with improved analysis methods that don’t suffer from these problems, but it is suggestive that we need to be concerned about “hot spots” in areas that have few stations.

  27. Carrick, thanks for pointing me to the Climate Explorer. I hadn’t realized they had BEST available on it. That gives a handy way to do simple checks of areas.

    But the point I wanted to make here is the US is extremely densely measured compared to many areas of the world. It is likely that the spatial smearing problem gets worse the fewer the stations you have. And it’s possible that you end up with a net bias in your global mean temperature, when you combine the prospect of smearing possibly higher temperature trends from regions that have more coverage into regions that have less.

    Aye. One thing I’ve been wanting to look at is the weighting process BEST uses. Of special interest to me is that BEST weights stations by how closely they reflect an average of the stations in the area. That’s guaranteed to cause smearing. What I need to find out is how large an area is used in those comparisons (I think I knew at one point).

    I’ve been mostly interested in regional (or finer) scales, but you’re right to bring up the possibility of bias in the global record. It’s easy to imagine situations where that could happen. For example, suppose we were trying to estimate a temperature field where the “true values” of it showed 80% of the area warmed, 20% cooled.

    How accurate the overall estimation would be depends on the distribution of the warming/cooling areas, as well as the weighting processes used. A weighting process might be able to avoid bias (its smearing would spread the cooling enough to offset the smearing of the warming), but that’s something you’d need to establish.

    This is in no way to suggest that I would expect the observed global warming to go away with improved analysis methods that don’t suffer from these problems, but it is suggestive that we need to be concerned about “hot spots” in areas that have few stations.

    If you haven’t, you should take a look at the video BEST produced showing the results of their land+ocean data set. I find it confusing. There are huge warming spikes in the middle of oceans that cover large areas and come and go rapidly.

    It’s difficult to imagine how hundreds of thousands of square miles would warm by two or more degrees in the middle of the ocean then return to previous temperatures, all in the span of a few years. It’s even more difficult to imagine how we could reasonably believe we know they happened at times like 1878. Somehow I doubt we have the data to justify it.

    Actually, now that I look at it again, I think the same thing happens over land a few times. You mentioned northern Siberia. Look at 1919-1920 in that video. I’m not good at judging geographical locations, but it looks to me like that video shows 2+ degrees of warming not just for northern Siberia, but for most of northern Russia. I’m pretty sure that’s mostly due to smearing.

  28. Oh, I have to quote something I came across while trying to find the link to that video. I don’t think I noticed this comment by Steven Mosher at the time, but I find it dumbfounding:

    knowing Brandon,I would say he is lying.

    and if he is asked to publish his test he will quicky cobble something together and back date it.

    I don’t think I need to add any editorial remarks to that.
