A while back, you may remember my quest to calculate the start and end dates of the growing season in the near-shore Chesapeake Bay area.
Our previous work showed with high confidence that the growing season has gotten longer (Fig. 1), but how it is getting longer matters: is it starting earlier, ending later, or both?
This post will highlight my approach, roadblocks, and solutions to this interesting question!
First Approach: Using the single mean time series
To look at dates rather than length, we have to go back to the original time series of daily temperature used to calculate the growing season length. I have two versions (this will be important!):
Time series 1: All 18 individual GHCN-Daily stations as separate time series for daily temperature
Time series 2: A single time series that is the mean daily temperature for Chesapeake Bay (average of these 18 stations)
Both time series contain the same information, but the order in which the mean is taken will be important.
My first approach was to keep it simple: use time series 2, the single time series of daily average temperature, to determine the start date (the first span of 6 consecutive days when Tmean > 5°C) and the end date (the first span of 6 consecutive days after July 1st when Tmean < 5°C) of the growing season.
To do this, I created a simple loop in R. My approach was to treat the growing season temperature indicator (5°C) as a binary system (that’s only 1’s and 0’s!).
Let me explain this approach for determining the start of the growing season. If we set any temperature ≤ 5°C to 0 and any temperature > 5°C to 1, we can easily find the first span of 6 days above 5°C! How? If we take a moving mean over a 6-day window (January 1-6, January 2-7, January 3-8…), the start of the growing season is the first time that mean is 1!
Note: Looping is not very efficient in R, but here a loop was the easiest solution I could come up with. And since our time series is short, we do not have to worry about R’s slowness!
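A minimal sketch of this idea in R, not the exact loop used for this post. The helper name first_run_start, the toy temperatures, and the choice to report the first day of each 6-day span are all assumptions made for illustration:

```r
# Hypothetical helper: index at which the first run of `run_len`
# consecutive 1s begins, or NA if there is no such run.
first_run_start <- function(flag, run_len = 6) {
  n <- length(flag)
  if (n < run_len) return(NA_integer_)
  for (i in seq_len(n - run_len + 1)) {
    window <- flag[i:(i + run_len - 1)]
    # The mean of a 0/1 window is 1 only when every day passes the test
    if (!anyNA(window) && mean(window) == 1) return(i)
  }
  NA_integer_
}

# Toy year of daily mean temperature in °C (made-up values, not station data)
tmean <- c(rep(2, 80), rep(7, 3), rep(4, 2), rep(8, 200), rep(3, 80))

# Start: first 6-day span with Tmean > 5°C
warm <- as.integer(tmean > 5)
gs_start <- first_run_start(warm)

# End: first 6-day span with Tmean < 5°C, searching from July 1
# (day 182 in a non-leap year; assumed here)
july1 <- 182
cold <- as.integer(tmean < 5)
gs_end <- first_run_start(cold[july1:length(cold)]) + july1 - 1

gs_length <- gs_end - gs_start
```

The 3-day warm spell early in the toy series is too short to count, so the start is only found once the long warm stretch begins.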
Data Check: Using the single mean time series
Before we get to look at the results, of course we have to check how good our ‘manually calculated’ growing season length is!
Here, the growing season length is the difference between the end date and the start date. It should be pretty close to the growing season length calculated using the R package climdex.pcic.
From Figure 2, you can see that we have a pretty good and significant fit... but an R² of 0.80 is not satisfying to me!
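As a rough sketch, a check like this can be done with a simple linear fit in base R. The two vectors below are made-up numbers purely for illustration (they are not the values behind Figure 2), and the variable names are hypothetical:

```r
# Made-up growing season lengths in days, one value per year
gsl_manual  <- c(231, 240, 228, 252, 245, 238, 260, 249)
gsl_climdex <- c(229, 243, 230, 250, 247, 236, 258, 251)

# Regress the manual estimate on the package estimate
fit <- lm(gsl_manual ~ gsl_climdex)

# R-squared and the p-value of the slope summarise how well the two agree
r2 <- summary(fit)$r.squared
p_value <- summary(fit)$coefficients["gsl_climdex", "Pr(>|t|)"]
```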
Why the ‘just okay’ fit?
The likely cause: the order in which the mean was taken!
In our start date calculation above, we used a daily mean temperature time series that had already been averaged across stations. However, when we calculated the growing season length using the climdex.pcic package, we calculated the length for each of the 18 stations, THEN took the mean.
This subtle difference has a huge impact on our results because of how missing data are handled. In our growing season length calculation with climdex.pcic, we rejected any years that had >15% or 3 full months of missing data.
This missing data, especially at the start or end of a year, could throw off our growing season start and end!
This ‘discard’ of years with a lot of missing data is not factored into the already averaged time series.
The solution: repeat the process above for all 18 GHCN-Daily weather stations AND calculate the amount of missing data to determine which data points we can throw out, the same way as climdex.pcic (Figure 3).
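A simplified sketch of the station-first order in R. It only applies the 15% threshold (not the 3-full-months rule), and the station names, toy temperatures, and helper functions are all hypothetical:

```r
# Hypothetical helper: TRUE when a station-year has too much missing data
too_much_missing <- function(tmean, max_frac = 0.15) {
  mean(is.na(tmean)) > max_frac
}

# Hypothetical helper: first day of the first run of `run_len` TRUE values
# (NA days are treated as not meeting the threshold)
first_run_start <- function(flag, run_len = 6) {
  r <- rle(flag %in% TRUE)
  starts <- cumsum(r$lengths) - r$lengths + 1
  hit <- which(r$values & r$lengths >= run_len)
  if (length(hit) == 0) NA_real_ else as.numeric(starts[hit[1]])
}

# Toy station-years of daily mean temperature in °C (made-up values)
stations <- list(
  stn_a = c(rep(2, 80), rep(8, 205), rep(3, 80)),
  stn_b = c(rep(NA, 70), rep(2, 20), rep(9, 200), rep(1, 75)),
  stn_c = c(rep(3, 90), rep(10, 195), rep(2, 80))
)

# Step 1: drop station-years with more than 15% missing days
keep <- !vapply(stations, too_much_missing, logical(1))

# Step 2: find the start date for each remaining station
starts <- vapply(stations[keep], function(x) first_run_start(x > 5), numeric(1))

# Step 3: only now take the mean across stations
mean_start <- mean(starts, na.rm = TRUE)
```

Here stn_b is missing its first 70 days (about 19% of the year), so it is dropped before the mean is taken rather than quietly pulling the averaged start date around.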
Did it work with time series 1?
You bet! We improved the R² to 0.93 (Figure 4). There are still a few very minor differences. For example, in my method, years with <15% missing data but an NA for the end date are automatically set to 365; in other words, the growing season never technically ended. I am realizing just now that 365 would be wrong for leap years!
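One possible way to handle the leap-year issue, sketched in R with made-up years and end dates (the helper days_in_year is hypothetical, not part of any package):

```r
# Hypothetical helper: number of days in a given calendar year
days_in_year <- function(year) {
  leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
  ifelse(leap, 366L, 365L)
}

# Made-up example: end dates as day of year, NA where no end was found
years  <- c(1999, 2000, 2001)
gs_end <- c(310, NA, NA)

# Replace NA end dates with the last day of that particular year
gs_end_filled <- ifelse(is.na(gs_end), days_in_year(years), gs_end)
```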
Nonetheless, I am happy with this improved fit! Now I can more confidently inspect the start and end date of the growing season!
That post will be coming soon!