The CBNERRS station located at Jug Bay, Maryland, includes over 11 years of meteorological data. Lucky for us, 11 years is the minimum length required by Climdex to calculate climate extreme indices in their gridded data set.

One question that we have asked ourselves, though, is: can we take this time series back even further than 11 years? In general, climate trends and patterns become clearer as a time series gets longer. This is one of the many reasons we need to continue collecting data!

The National Climatic Data Center (NCDC) makes individual weather station data from throughout the country freely available. You can access this data here. I retrieved Maryland weather data from 12 stations using the NCDC-Daily dataset.

The northern Chesapeake Bay, represented by Maryland, has 158 individual daily weather stations! That’s a lot of stations! To make this “quest for extension” more feasible, a loose set of criteria was created to shorten that list!

**Here were my criteria for station selection:** The record must be at least 45 years long (starting in 1969 or earlier), data must still be collected there, the station must be located near Chesapeake Bay, and the record must not have large data gaps from 2003-2014. Using these stations not only extends our time series back further, but also expands our spatial coverage (**Figure 1**).
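As a rough sketch of how criteria like these could be applied in R, assume a small station inventory table. Everything in it is hypothetical: the station names, the column names, and the thresholds for "near the Bay" (`max_km`) and "no large gaps" (`min_coverage`) are made up for illustration, since the post does not give exact cut-offs.

```r
# Hypothetical station inventory; all names and values are invented.
stations <- data.frame(
  name       = c("Station A", "Station B", "Station C", "Station D"),
  first_year = c(1956, 1985, 1948, 1962),
  last_year  = c(2014, 2014, 1998, 2014),
  km_to_bay  = c(12, 8, 20, 140),
  # Fraction of days with a Tmax value during 2003-2014
  coverage_2003_2014 = c(0.98, 0.99, 0.00, 0.97),
  stringsAsFactors = FALSE
)

# Illustrative thresholds (assumptions, not values from the analysis)
max_km       <- 50
min_coverage <- 0.90

selected <- subset(
  stations,
  first_year <= 1969 &                   # record starts in 1969 or earlier
    last_year >= 2014 &                  # data still being collected
    km_to_bay <= max_km &                # located near Chesapeake Bay
    coverage_2003_2014 >= min_coverage   # no large gaps in 2003-2014
)

print(selected$name)
```

In this toy table only "Station A" passes all four checks: B starts too late, C stopped reporting, and D is too far from the Bay.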

For this post, I will be using the maximum daily temperature (Tmax) to determine if the Upper Marlboro NCDC-Daily weather station is correlated to the CBNERRS weather station located at Jug Bay, MD.

The first test we can apply is a simple visual investigation. If we plot the Upper Marlboro Tmax data on top of the Jug Bay Tmax, for the same date range, do they look alike? Looking at **Figure 2**, the answer is YES!

This may not seem scientific, but it immediately tells me that I should proceed with a statistical test!
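A visual check like this could be sketched as follows. The two data frames are synthetic stand-ins (a shared seasonal cycle plus noise), not the real station records, and the data frame and `date` column names are assumptions. The one step worth noting is the `merge()` by date, which keeps only days present in both records so the two lines cover the same date range.

```r
# Synthetic stand-ins for the two station records (not the real data)
set.seed(1)
dates  <- seq(as.Date("2003-01-01"), as.Date("2004-12-31"), by = "day")
doy    <- as.numeric(format(dates, "%j"))
season <- 18 + 12 * sin(2 * pi * (doy - 110) / 365)

nerrs <- data.frame(date = dates, TmaxNERRS = season + rnorm(length(dates), sd = 3))
ncdc  <- data.frame(date = dates, TMAXNCDC  = season + rnorm(length(dates), sd = 3))

# Drop some NCDC days to mimic a data gap
ncdc <- ncdc[-(100:120), ]

# Keep only the dates that appear in both records
both <- merge(nerrs, ncdc, by = "date")

# Overlay the two Tmax series for the shared dates
plot(both$date, both$TmaxNERRS, type = "l", col = "steelblue",
     xlab = "Date", ylab = "Tmax")
lines(both$date, both$TMAXNCDC, col = "firebrick")
legend("topright", legend = c("Jug Bay", "Upper Marlboro"),
       col = c("steelblue", "firebrick"), lty = 1)
```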

The second test we applied is a simple linear regression model (**Figure 3**). In this approach, we make Jug Bay the independent variable and plot it against the dependent variable of Upper Marlboro. In R, we can use the lm command to fit a linear regression. The summary function retrieves valuable information including the slope, y-intercept, p-value, and the coefficient of determination (R^{2}). An R^{2} of 1 would indicate a perfect fit, loosely meaning that the two data sets agree with each other perfectly.

TmaxLinearRegression <- lm(TMAXNCDC ~ TmaxNERRS)

summary(TmaxLinearRegression)

The linear regression for Upper Marlboro vs Jug Bay returns an R^{2} of 0.81 and a p-value <0.001, indicating that this relationship is statistically significant. In other words, while there is some variability, we can say that Jug Bay is significantly correlated to Upper Marlboro.
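For readers who want to pull these numbers out of the fit rather than read them off the printed summary, here is a minimal sketch. It uses synthetic vectors in place of the real Tmax data, with Jug Bay (`TmaxNERRS`) as the predictor as described in the text, so the values it produces are not the ones reported above.

```r
# Synthetic stand-ins for the two Tmax series (not the real data)
set.seed(42)
TmaxNERRS <- rnorm(200, mean = 20, sd = 8)              # "Jug Bay"
TMAXNCDC  <- TmaxNERRS + rnorm(200, mean = 0, sd = 3)   # "Upper Marlboro"

fit <- lm(TMAXNCDC ~ TmaxNERRS)
s   <- summary(fit)

r_squared <- s$r.squared
slope     <- coef(fit)[["TmaxNERRS"]]
intercept <- coef(fit)[["(Intercept)"]]
p_value   <- s$coefficients["TmaxNERRS", "Pr(>|t|)"]

# With a single predictor, R^2 equals the squared Pearson correlation,
# so it is the same whichever station is treated as the predictor.
print(all.equal(r_squared, cor(TmaxNERRS, TMAXNCDC)^2))
```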

Another way we can look at the relationship between Jug Bay and Upper Marlboro is to draw a 1:1 line through the data (**Figure 4**). Unlike a fitted regression line, a 1:1 line has its slope fixed at 1 and its intercept fixed at 0. This allows us to inspect for any biases between the data. For example, if the data points all plot above the 1:1 line, we could say that the NCDC Tmax data tended to be greater than the NERRS Tmax data. But this is **not** the case for these data sets!
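To show what a bias would look like, the sketch below uses synthetic data with a deliberate +1.5 offset built into the "Upper Marlboro" values. This is the opposite of what the real data show, and the numbers are invented. The mean difference and the share of points above the line are simple numeric companions to the plot.

```r
# Synthetic data with an intentional bias (not the real data)
set.seed(7)
TmaxNERRS <- rnorm(300, mean = 20, sd = 8)
TMAXNCDC  <- TmaxNERRS + 1.5 + rnorm(300, sd = 2)

plot(TmaxNERRS, TMAXNCDC,
     xlab = "Jug Bay Tmax", ylab = "Upper Marlboro Tmax")
abline(a = 0, b = 1, col = "red")   # 1:1 line: intercept 0, slope 1

# Points above the 1:1 line are days when NCDC Tmax exceeded NERRS Tmax
diffs      <- TMAXNCDC - TmaxNERRS
mean_bias  <- mean(diffs, na.rm = TRUE)
frac_above <- mean(diffs > 0, na.rm = TRUE)
```

With unbiased data, `mean_bias` would sit near 0 and `frac_above` near 0.5.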

Lastly, we applied a cross-correlation function (ccf) using R (**Figure 5**). In general, this function computes the correlation between two time series at a range of time lags. The key to interpreting this plot is to notice that at lag=0, the ccf is greatest. This indicates that Jug Bay and Upper Marlboro are most correlated to each other with no lag, or when they are “stacked on top of one another.” Similar to the linear regression (Figure 3), this also indicates that these two data sets are correlated to each other.

In R, a ccf is calculated by:

ccf(TmaxNERRS, TMAXNCDC, lag.max = 1000, type = "correlation", plot = TRUE, na.action = na.pass, xlim = c(-500, 500), ylab = "ccf", main = "Upper Marlboro")
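Instead of reading the peak off the plot, the lag with the highest correlation can also be pulled from the object that ccf returns. A minimal sketch, using two synthetic series that share a common signal (not the real Tmax data):

```r
# Two synthetic series built from one shared autocorrelated signal
set.seed(3)
n <- 1000
common    <- as.numeric(arima.sim(list(ar = 0.7), n = n))
TmaxNERRS <- common + rnorm(n, sd = 0.5)
TMAXNCDC  <- common + rnorm(n, sd = 0.5)

# plot = FALSE returns the correlations without drawing the figure
cc <- ccf(TmaxNERRS, TMAXNCDC, lag.max = 50, plot = FALSE, na.action = na.pass)

# Lag at which the cross-correlation is largest
best_lag <- cc$lag[which.max(cc$acf)]
peak_ccf <- max(cc$acf)
```

If one station consistently ran a day ahead of the other, `best_lag` would come out as 1 or -1 rather than 0.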

We will repeat this analysis for all 12 of the Maryland NCDC-Daily Stations and 14 Virginia stations for Tmax, Tmin, and precipitation.
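One way this repetition could be organized is to wrap the regression and ccf steps in a small helper and apply it to each station. The helper name `compare_station`, the station names, and the data below are all hypothetical. The three synthetic stations are simply the Jug Bay stand-in plus increasing amounts of noise.

```r
# Synthetic Jug Bay series and three made-up comparison stations
set.seed(11)
n <- 365
jug_bay <- rnorm(n, mean = 20, sd = 8)
ncdc_stations <- list(
  station_1 = jug_bay + rnorm(n, sd = 2),
  station_2 = jug_bay + rnorm(n, sd = 4),
  station_3 = jug_bay + rnorm(n, sd = 8)
)

# Hypothetical helper: regression summary and peak ccf lag for one station
compare_station <- function(ncdc, nerrs) {
  s  <- summary(lm(ncdc ~ nerrs))
  cc <- ccf(nerrs, ncdc, lag.max = 30, plot = FALSE, na.action = na.pass)
  data.frame(
    r_squared = s$r.squared,
    p_value   = s$coefficients["nerrs", "Pr(>|t|)"],
    peak_lag  = cc$lag[which.max(cc$acf)]
  )
}

# One row of results per station
results <- do.call(rbind, lapply(ncdc_stations, compare_station, nerrs = jug_bay))
results$station <- names(ncdc_stations)
print(results)
```

The same helper could be called once per variable (Tmax, Tmin, precipitation) to build up a summary table.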

**Table 1** shows that the Maryland Tmax and Tmin NCDC station data were significantly correlated to the Jug Bay data set. However, as scientists, we constantly question ourselves and ask: *why are the R^{2} values not a perfect 1?*

The variation could be from many station-specific artifacts (such as height of the thermometer, amount of time in shade, movement of thermometer to new location, implementation of new temperature recording devices), location-specific artifacts (elevation, proximity to water, urban heat island effects), and heterogeneity (“spottiness”) in climatic variation over this area.

There will certainly be follow-up tests to inspect this!