I have been spending some time looking at the meteorological and water quality parameters available for Jug Bay, MD and Taskinas Creek, VA, both of which are sites within the Chesapeake Bay National Estuarine Research Reserve (NERR) network (Figure 1). I downloaded these data from the Centralized Data Management Office (CDMO) website as part of the System-Wide Monitoring Program (SWMP) data set. All parameters (with a few exceptions in the earlier part of the data set) were collected every 15 minutes. That's 96 observations per day!
Meteorological parameters include air temperature, barometric pressure (BP), relative humidity (RH), wind speed, total photosynthetically active radiation (PAR), and total precipitation.
Water quality parameters include water temperature, salinity, dissolved oxygen (DO), pH, turbidity, and chlorophyll (by fluorescence).
I had a few goals in mind:
1) Determine the range of values each site experienced over the data collection history
2) Create a correlation matrix to investigate any relationships
3) Investigate ecologically relevant relationships
Quality Control of the Data
As scientists, we often like to test the validity of our data before we start to make interpretations. When downloaded, the SWMP data comes with 9 different flags, allowing the user to determine which data they want to include, and which they want to reject (Table 1).
For our work, however, we must take an additional quality control step.
Firstly, a large portion of our data are flagged as +4, meaning they are historical data. These data really help extend our time series, making them highly valuable, but they need some manual quality control checks. Secondly, since we are looking at extremes, the difference between a bad sensor value and an actual extreme environmental measurement is very important!
Some R to show my approach
# load the SWMPr package
library(SWMPr)
# Set your directory (Note this is specific to your computer!)
path <- 'path/to/your/data'  # placeholder: replace with the folder holding your downloaded SWMP files
# import your data as a SWMPr object
raw_JB <- import_local(path, 'CBMJBMET', trace = TRUE)
raw_RR <- import_local(path, 'CBMRRWQ', trace = TRUE)
# retain only 0, 1, 3, 4, 5 flags (all other flagged values are set to NA)
raw_JB <- qaqc(raw_JB, qaqc_keep = c(0, 1, 3, 4, 5), trace = FALSE)
raw_RR <- qaqc(raw_RR, qaqc_keep = c(0, 1, 3, 4, 5), trace = FALSE)
At this point, we should only have data which have passed SWMP’s quality control measures. But, since the historical data could include some poor observations, I checked each parameter manually.
My approach was to plot the time series of each parameter (a visual check) and to calculate the range of values. I could then go into the literature to determine whether any values that looked like outliers were plausible extremes or bad measurements (in other words, physically impossible readings).
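Here is a minimal sketch of that check in base R. The data frame below is invented for illustration; the column names `datetimestamp` and `turb` are assumptions that mirror the SWMP naming style, and the values are not real observations.

```r
# Hypothetical 15-minute turbidity series; values are invented for illustration
set.seed(1)
times <- seq(as.POSIXct("2010-01-01 00:00", tz = "UTC"),
             by = "15 min", length.out = 96 * 3)
wq <- data.frame(
  datetimestamp = times,
  turb = c(-5, rnorm(length(times) - 2, mean = 20, sd = 4), 400)
)

# Visual check: plot the full time series and look for suspicious spikes or dips
plot(wq$datetimestamp, wq$turb, type = "l",
     xlab = "Date", ylab = "Turbidity (NTU)")

# Numeric check: minimum and maximum, ignoring missing values
turb_range <- range(wq$turb, na.rm = TRUE)
print(turb_range)
```

A negative minimum or an implausibly large maximum in the printed range is the cue to go back to the plot and the literature.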
I will note that almost all parameters passed this check with ease, but a few required special attention.
For example: The turbidity parameter at Taskinas Creek.
It is impossible to have a turbidity less than 0, so I know that those values in the earlier part of the data set are bad (Figure 2). So, I can simply subset the data to include only values ≥0 (Figure 3).
Now the turbidity looks more realistic! Unfortunately, I cannot (yet) determine if those higher turbidity values are extremes or erroneous. (Stay tuned!)
I will also note that I took a few other QAQC steps, such as erasing duplicated times and replacing NaN values with NAs.
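These cleaning steps can be sketched in base R as follows. This is only an illustration on an invented data frame (the column names and values are assumptions), and keeping rows where turbidity is missing is a choice I made for the sketch, not something required by SWMP.

```r
# Hypothetical water quality records; values are invented for illustration
wq <- data.frame(
  datetimestamp = as.POSIXct(c("2005-06-01 00:00", "2005-06-01 00:15",
                               "2005-06-01 00:15", "2005-06-01 00:30",
                               "2005-06-01 00:45"), tz = "UTC"),
  turb = c(12, -3, -3, NaN, 250)
)

# 1) Drop repeated time stamps, keeping the first record for each time
wq <- wq[!duplicated(wq$datetimestamp), ]

# 2) Replace NaN with NA so missing values are coded consistently
wq$turb[is.nan(wq$turb)] <- NA

# 3) Keep only physically possible turbidity (>= 0); missing values are kept
wq <- wq[is.na(wq$turb) | wq$turb >= 0, ]

print(wq)
```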
Extreme values at Jug Bay and Taskinas Creek (Initial!)
Now, with some quality-controlled data, I can present the range of values (extreme low and high) as well as the 1st, 5th, 95th, and 99th percentiles.
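For anyone who wants to compute the same summary, a minimal base R sketch might look like this. The temperature vector is invented for illustration, and `quantile()` is used with its default interpolation method, which is an assumption about how the percentiles are defined.

```r
# Hypothetical water temperature observations (deg C); invented for illustration
temp <- c(2.1, 5.4, 9.8, 14.2, 18.7, 22.5, 26.3, 29.9, 31.4, NA)

# Extreme low and high values, ignoring missing observations
extremes <- range(temp, na.rm = TRUE)

# 1st, 5th, 95th, and 99th percentiles (R's default quantile method, type 7)
pct <- quantile(temp, probs = c(0.01, 0.05, 0.95, 0.99), na.rm = TRUE)

print(extremes)
print(pct)
```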
Investigate parameter relationships
If you have read any previous posts, you have probably figured out that I love corrplots (correlation matrices). This plotting technique gives you a quick and easy way to visually identify strong relationships between all desired parameters.
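As a rough sketch of how such a matrix can be built, the example below computes pairwise Pearson correlations on simulated data and draws them with the corrplot package if it happens to be installed. The variables `atemp`, `temp`, and `do_mgl` and all of their values are invented for illustration, not taken from the SWMP records.

```r
# Hypothetical daily values; invented for illustration, not SWMP observations
set.seed(42)
atemp  <- runif(200, min = 0, max = 35)           # air temperature (deg C)
temp   <- 0.9 * atemp + rnorm(200, sd = 1.5)      # water temperature tracks air
do_mgl <- 14 - 0.25 * temp + rnorm(200, sd = 1)   # DO falls as water warms
dat <- data.frame(atemp, temp, do_mgl)

# Pairwise Pearson correlations, using all complete pairs of observations
M <- cor(dat, use = "pairwise.complete.obs")
print(round(M, 2))

# Draw the matrix with the corrplot package, but only if it is installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(M, method = "circle")
}
```

Using `pairwise.complete.obs` keeps as much data as possible when different sensors have gaps at different times, at the cost of each correlation being based on a slightly different set of observations.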
The strongest relationship, as we previously discussed, is the relationship between air and water temperature. This relationship demonstrates that extreme air temperatures (such as a really hot day) can directly affect the shallow water temperature in parts of Chesapeake Bay. That matters because extreme water temperatures could surpass the physiological thresholds (“livable temperature”) of many aquatic organisms.
Perhaps one of the most well-known examples of this is the 2005 eelgrass die-off, and then recovery, in lower Chesapeake Bay.
One relationship I have been interested in pursuing is temperature versus dissolved oxygen. Basic gas solubility already gave us an idea that the two should be connected: the solubility of a gas in water decreases as temperature increases. In other words, the warmer the water, the less oxygen it can hold.
From Figures 3 & 4, we can see that temperature and dissolved oxygen are significantly linearly correlated. What is interesting, however, is the scatter in the daily maximum temperature versus daily maximum dissolved oxygen. This is likely where the biological component of DO comes into play!
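One way to set up that daily-maximum comparison is sketched below in base R. Everything here is simulated (the column names, the 30-day record, and the strength of the relationship are all assumptions), so it only shows the mechanics, not our actual result.

```r
# Hypothetical 15-minute record; invented for illustration, not SWMP data
set.seed(7)
n_days <- 30
times <- seq(as.POSIXct("2012-07-01 00:00", tz = "UTC"),
             by = "15 min", length.out = 96 * n_days)
day_mean <- rep(runif(n_days, 10, 30), each = 96)
temp   <- day_mean + rnorm(length(times), sd = 0.5)
do_mgl <- 14 - 0.25 * temp + rnorm(length(times), sd = 0.8)
wq <- data.frame(datetimestamp = times, temp, do_mgl)

# Collapse the 96 observations per day to one daily maximum per parameter
wq$date <- as.Date(wq$datetimestamp, tz = "UTC")
daily_max <- aggregate(cbind(temp, do_mgl) ~ date, data = wq, FUN = max)

# Test for a linear (Pearson) correlation between the daily maxima
ct <- cor.test(daily_max$temp, daily_max$do_mgl)
print(ct$estimate)

# Scatter plot with a least-squares line to see the spread around the trend
plot(daily_max$temp, daily_max$do_mgl,
     xlab = "Daily max water temperature (deg C)",
     ylab = "Daily max DO (mg/L)")
abline(lm(do_mgl ~ temp, data = daily_max))
```

In real data, points that sit well above or below the fitted line are the ones worth a closer look for biological effects such as photosynthesis or respiration.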
Of course, QAQC is an important part of everything we do as scientists. This post is just a glimpse into the love and care we have been putting into this research!