Quality Controls

Figure 1: Google Earth image of Jug Bay and Taskinas Creek.

I have been spending some time looking at the meteorological and water quality parameters available for Jug Bay, MD and Taskinas Creek, VA, both of which are sites within the Chesapeake Bay National Estuarine Research Reserve (NERR) network (Figure 1). I downloaded these data from the Centralized Data Management Office website as part of the System-Wide Monitoring Program (SWMP) data set. All parameters (with a few exceptions in the earlier part of the data set) were collected every 15 minutes. That’s 96 observations per day!

Meteorological parameters include air temperature, barometric pressure (BP), relative humidity (RH), wind speed, total photosynthetically active radiation (PAR), and total precipitation.

Water quality parameters include water temperature, salinity, dissolved oxygen (DO), pH, turbidity, and chlorophyll (by fluorescence).

I had a few goals in mind:

1) Determine the range of values each site experienced over the data collection history

2) Create a correlation matrix to investigate any relationships

3) Investigate ecologically relevant relationships

Quality Control of the Data

Table 1: Flag definitions for the SWMP data.

As scientists, we often like to test the validity of our data before we start to make interpretations. When downloaded, the SWMP data comes with 9 different flags, allowing the user to determine which data they want to include, and which they want to reject (Table 1).

For our work, however, we must take an additional quality control step.

Why?

Firstly, a large portion of our data is flagged as +4, meaning it is historical data. These data really help extend our time series, making them highly valuable, but they need some manual quality control checks. Secondly, since we are looking at extremes, the difference between a bad sensor value and an actual extreme environmental measurement is very important!

Some R to show my approach

Continuing our post from two weeks ago, we can use the SWMPr package to help filter our data for only good measurements.

# load the SWMPr package
library(SWMPr)

# Set your directory (note this is specific to your computer!)
# setwd() returns the previous directory, so store the path first
path <- "C:/Users/Kari/Documents/RData/OPC"
setwd(path)

# import your data as SWMPr objects
JB_MET <- import_local(path, "CBMJBMET", trace = TRUE)
RR_WQ <- import_local(path, "CBMRRWQ", trace = TRUE)

# retain only observations flagged 0, 1, 3, 4, or 5
JB_MET <- qaqc(JB_MET, qaqc_keep = c(0, 1, 3, 4, 5), trace = FALSE)
RR_WQ <- qaqc(RR_WQ, qaqc_keep = c(0, 1, 3, 4, 5), trace = FALSE)

Figure 2: Turbidity time series before manual QAQC.

At this point, we should only have data which have passed SWMP’s quality control measures. But, since the historical data could include some poor observations, I checked each parameter manually.

My approach was to plot the time series of each parameter (a visual check) and to calculate its range of values. I could then go to the literature to determine whether any “outlier” looking value is a plausible extreme or a bad measurement (in other words, a physically impossible reading).
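Here is a minimal sketch of that check in base R. It is not the exact script behind the figures: the data frame wq is simulated, and its column names (datetimestamp, temp, turb) are only assumed to resemble a SWMP water quality file.

```r
# Simulated stand-in for a QAQC-filtered SWMP data frame (all values are made up)
set.seed(1)
n <- 96 * 7  # one week of 15-minute observations
wq <- data.frame(
  datetimestamp = seq(as.POSIXct("2010-01-01 00:00", tz = "UTC"),
                      by = "15 min", length.out = n),
  temp = rnorm(n, mean = 15, sd = 3),
  turb = c(-5, rnorm(n - 1, mean = 20, sd = 5))  # first value mimics a bad reading
)

# Range of every parameter, ignoring missing values
params <- setdiff(names(wq), "datetimestamp")
param_range <- sapply(wq[params], range, na.rm = TRUE)
rownames(param_range) <- c("min", "max")
print(param_range)

# One time-series panel per parameter for the visual check
op <- par(mfrow = c(length(params), 1), mar = c(3, 4, 1, 1))
for (p in params) {
  plot(wq$datetimestamp, wq[[p]], type = "l", xlab = "", ylab = p)
}
par(op)
```

A negative minimum in the turb column is the kind of value that would send me to the literature.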

I will note that almost all parameters passed this check with ease, but a few required special attention.

Figure 3: Turbidity after manual QAQC.

For example: The turbidity parameter at Taskinas Creek.

It is impossible to have a turbidity less than 0, so I know that those values in the earlier part of the data set are bad (Figure 2). So, I can simply subset the data to include only values ≥0 (Figure 3).
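As a rough illustration of that subsetting step, the sketch below uses a tiny made-up turbidity record (the wq data frame and its values are hypothetical). It shows two options: dropping the bad rows, or keeping every time stamp and setting the impossible values to NA.

```r
# Hypothetical turbidity record (NTU); the negative values mimic sensor errors
wq <- data.frame(
  datetimestamp = as.POSIXct("2010-01-01 00:00", tz = "UTC") + 900 * (0:5),
  turb = c(12, -3, 18, NA, -1, 25)
)

# Option 1: drop rows with negative turbidity (rows that are already NA are kept)
wq_sub <- wq[is.na(wq$turb) | wq$turb >= 0, ]

# Option 2: keep every time stamp but blank out the impossible values
wq_na <- wq
wq_na$turb[!is.na(wq_na$turb) & wq_na$turb < 0] <- NA

# The minimum is no longer negative
range(wq_sub$turb, na.rm = TRUE)
```

Option 2 may be preferable when the regular 15-minute spacing matters for later steps, since no time stamps disappear.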

Now the turbidity looks more realistic! Unfortunately, I cannot (yet) determine if those higher turbidity values are extremes or erroneous. (Stay tuned!)

I will also note that I took a few other QAQC steps, such as erasing duplicated times and replacing NaN values with NAs.
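Those two clean-up steps might look something like the following in base R. The data frame and the do_mgl column are invented for illustration, and keeping the first record of each duplicated time stamp is an assumption about how ties should be resolved.

```r
# Hypothetical record with one duplicated time stamp and one NaN
wq <- data.frame(
  datetimestamp = as.POSIXct("2010-01-01 00:00", tz = "UTC") +
    900 * c(0, 1, 1, 2, 3),
  do_mgl = c(8.1, 7.9, 7.9, NaN, 8.3)
)

# Keep only the first record for each time stamp
wq_clean <- wq[!duplicated(wq$datetimestamp), ]

# Replace NaN with NA in every numeric column
num_cols <- sapply(wq_clean, is.numeric)
wq_clean[num_cols] <- lapply(wq_clean[num_cols], function(x) {
  x[is.nan(x)] <- NA
  x
})

wq_clean
```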

Extreme values at Jug Bay and Taskinas Creek (Initial!)

Now, with some quality-controlled data, I can present the range of values (extreme low and high) as well as the 1st, 5th, 95th, and 99th percentiles.
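For anyone who wants to build a similar summary, a minimal sketch with base R's quantile() is below. The temp vector is simulated, so the numbers it prints have nothing to do with the tables in this post.

```r
# Simulated water temperature series (degrees C); values are made up
set.seed(42)
temp <- rnorm(1000, mean = 15, sd = 8)

# Minimum, 1st, 5th, 95th, and 99th percentiles, and maximum
probs <- c(0, 0.01, 0.05, 0.95, 0.99, 1)
extremes <- quantile(temp, probs = probs, na.rm = TRUE)
round(extremes, 2)
```

The 0 and 1 probabilities return the minimum and maximum, so one call covers the full range and the percentiles.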

Table 2: Range of parameter values for Jug Bay.

Table 3: Range of parameter values for Taskinas Creek.

Investigate parameter relationships

Figure 3: Corrplot of the daily mean meteorological and water quality parameters at Taskinas Creek.

If you have read any previous posts, you have probably figured out that I love corrplots (correlation matrices). This plotting technique gives you a quick and easy way to visually identify strong relationships between all desired parameters.
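A sketch of how such a matrix can be built from 15-minute data is below. Everything in it is simulated (the obs data frame, the atemp, temp, and do_mgl columns, and the relationships between them), so it only illustrates the mechanics: average to daily means, then correlate. The corrplot call at the end runs only if the corrplot package happens to be installed.

```r
# Simulated 15-minute observations for 30 days (all values are made up)
set.seed(7)
n <- 96 * 30
stamp <- seq(as.POSIXct("2010-06-01 00:00", tz = "UTC"),
             by = "15 min", length.out = n)
atemp <- 22 + rep(rnorm(30, 0, 3), each = 96) + rnorm(n, 0, 1)
obs <- data.frame(
  datetimestamp = stamp,
  atemp  = atemp,                               # air temperature
  temp   = atemp - 1 + rnorm(n, 0, 0.5),        # water temperature
  do_mgl = 12 - 0.2 * atemp + rnorm(n, 0, 0.5)  # dissolved oxygen
)

# Average the 96 daily observations into one daily mean per parameter
obs$date <- as.Date(obs$datetimestamp, tz = "UTC")
daily <- aggregate(cbind(atemp, temp, do_mgl) ~ date, data = obs, FUN = mean)

# Pearson correlation matrix of the daily means
cor_mat <- cor(daily[, -1], use = "pairwise.complete.obs")
round(cor_mat, 2)

# Optional plot, only if the corrplot package is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat, method = "circle")
}
```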

The strongest relationship, as we previously discussed, is between air and water temperature. It demonstrates that extreme air temperatures (such as a really hot day) can directly affect the shallow water temperature in parts of Chesapeake Bay. This matters because extreme water temperatures could surpass the physiological thresholds (“livable temperature”) of many aquatic organisms.

Perhaps one of the most well-known examples of this is the 2005 eelgrass die-off, and subsequent recovery, in lower Chesapeake Bay.

Investigating Relationships

Figure 4: Linear regression of daily mean, min, and max water temperature with dissolved oxygen. The first panel is at Taskinas Creek (mean) while the other two are at Jug Bay (min, max). I was trying to be diverse!

One relationship I have been interested in pursuing is temperature versus dissolved oxygen. Gas solubility already gave us an idea that the two should be connected: the solubility of oxygen in water decreases as temperature rises. In other words, the warmer the water, the less dissolved oxygen it can hold.

From Figures 3 & 4, we can see that temperature and dissolved oxygen are significantly linearly correlated. What is interesting, however, is the scatter in the daily maximum temperature versus daily maximum dissolved oxygen. This is likely where the biological component of DO comes into play!
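For readers who want to try a regression like this themselves, here is a minimal sketch with lm(). The daily data frame is simulated with an assumed negative slope, so the fitted numbers are illustrative only and are not the results shown in Figure 4.

```r
# Simulated daily means (all values are made up)
set.seed(3)
daily <- data.frame(temp = runif(200, min = 2, max = 30))
daily$do_mgl <- 14 - 0.25 * daily$temp + rnorm(200, mean = 0, sd = 0.8)

# Linear regression of dissolved oxygen on water temperature
fit <- lm(do_mgl ~ temp, data = daily)
fit_summary <- summary(fit)

slope     <- coef(fit)[["temp"]]
p_value   <- fit_summary$coefficients["temp", "Pr(>|t|)"]
r_squared <- fit_summary$r.squared

# Scatter plot with the fitted line
plot(daily$temp, daily$do_mgl,
     xlab = "Water temperature (deg C)", ylab = "Dissolved oxygen (mg/L)")
abline(fit, col = "red")
```

A wide scatter around the line would show up as a lower r_squared even when p_value stays small.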

Of course, QAQC is an important part of everything we do as scientists. This post is just a glimpse into the love and care we have been putting into this research!

Kari Pohl

About Kari Pohl

I am a post-doctoral researcher at NOAA and the University of Maryland (Center for Environmental Science at Horn Point Laboratory). My work investigates how climate variability and extremes affect the diverse ecosystems in Chesapeake Bay. I received a Ph.D. in oceanography from the University of Rhode Island (2014) and received a B.S. in Environmental Science and a B.A. in Chemistry from Roger Williams University (2009). When I am not busy being a scientist, my hobbies include running, watching (and often yelling at) the Boston Bruins, and taking photos of my cat.