Visualizing Data

Almost all of our analyses are done using the R programming language. One of R’s best features is its wonderful set of plotting capabilities.

We have been viewing our weather data as a time series, meaning that the x-axis is a unit of time (a date, a year, etc.). For this project, time series plots are extremely important because they let us “see” any changes in weather event magnitude (how big is each event?), frequency (how often are events occurring?), and variance (what is the year-to-year pattern?).

Figure 1: The Icing Days index time series for all 20 data products.

For example, Figure 1 is the ~100-year time series for the Icing Days index, the annual count of days when the daily maximum temperature was below freezing (brrr, that’s cold!). We can see that the frequency of Icing Days appears to be lower since the 1990s than during the 1950-1990 period. We can also see that the year-to-year variability can be pretty high. For example, the Conowingo Dam weather station recorded no Icing Days in 2006, then 7 in the following year (2007).
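To make the definition concrete, here is a minimal sketch of how an Icing Days count could be computed in R from daily maximum temperatures. The data frame, the column names, and the `icing_days` helper are all made up for illustration; this is not the actual processing code behind Figure 1.

```r
# Toy daily record (invented values, NOT real station data):
# one row per day with the daily maximum temperature in degrees Celsius.
daily <- data.frame(
  date = as.Date(c("2006-01-15", "2006-02-03", "2006-07-04",
                   "2007-01-20", "2007-01-21", "2007-02-10", "2007-12-25")),
  tmax_c = c(1.2, 0.0, 30.5, -2.1, -0.4, 3.3, -5.0)
)

# Hypothetical helper: count the days per year whose maximum temperature
# was strictly below the freezing point.
icing_days <- function(date, tmax_c, freezing = 0) {
  year <- as.integer(format(date, "%Y"))
  # Summing the TRUE/FALSE "below freezing" flags within each year gives the count.
  tapply(tmax_c < freezing, year, sum, na.rm = TRUE)
}

index <- icing_days(daily$date, daily$tmax_c)
print(index)
```

With this toy input, the count is 0 for 2006 and 3 for 2007. Note that a day with a maximum of exactly 0 °C is not counted, because the definition says "below" freezing.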

At this specific location, that is like going from a year when the daily high never stayed below freezing to a year with a week’s worth of ice-filled days. This type of inter-annual change can really test the resilience of an ecosystem!

Time series plots also allow us to see the differences between the 20 data sets, but with this many lines so close together, it is really hard to visualize the spatial variability in our data.
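A plot in the style of Figure 1 can be sketched in base R with `matplot`, which draws one line per column of a matrix. The counts below are synthetic random numbers (not our data), and the product names are placeholders, so treat this only as an illustration of the plotting approach.

```r
set.seed(42)  # make the synthetic data reproducible

years <- 1950:2010
n_products <- 20

# Synthetic annual counts, NOT real observations: each column is one
# hypothetical data product, drawn from a Poisson distribution.
counts <- sapply(seq_len(n_products), function(i) {
  rpois(length(years), lambda = 5 + i / 4)
})
colnames(counts) <- paste0("product_", seq_len(n_products))

# One line per data product, all sharing the same time axis.
matplot(years, counts, type = "l", lty = 1, col = rainbow(n_products),
        xlab = "Year", ylab = "Icing Days (days per year)",
        main = "Icing Days index, all products (synthetic data)")
```

Even with made-up numbers, 20 overlapping lines quickly become a tangle, which is the readability problem described above.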

Figure 2: Boxplot of the Icing Days index for the same 20 data products used in Figure 1.

That is where boxplots can be useful. Figure 2 shows the exact same data, but in a completely different way. Now, our x-axis is the individual stations, arranged by decreasing latitude. This allows us to understand how Icing Days, over the entire time series, change by location. As expected, we can see that fewer Icing Days are observed as you move southward.
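One way to get a north-to-south x-axis in base R is to set the order of the factor levels before calling `boxplot`. The sketch below assumes three hypothetical stations with invented latitudes and synthetic counts; none of the names or values come from our data set.

```r
set.seed(7)  # make the synthetic data reproducible

# Hypothetical stations with invented latitudes (degrees north).
stations <- data.frame(
  name = c("station_a", "station_b", "station_c"),
  lat  = c(37.5, 39.7, 38.6)
)

# Synthetic annual counts: 50 "years" per station, NOT real observations.
obs <- data.frame(
  station = rep(stations$name, each = 50),
  icing   = rpois(150, lambda = rep(c(2, 8, 5), each = 50))
)

# Factor levels control the left-to-right order of the boxes,
# so sort the station names from highest to lowest latitude.
north_to_south <- stations$name[order(stations$lat, decreasing = TRUE)]
obs$station <- factor(obs$station, levels = north_to_south)

boxplot(icing ~ station, data = obs,
        xlab = "Station (north to south)",
        ylab = "Icing Days (days per year)")
```

Without the `factor(..., levels = ...)` step, R would order the boxes alphabetically by station name, and any latitudinal pattern would be much harder to spot.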

In a boxplot, we get the median (black line), the 1st and 3rd quartiles (bounds of the colorful box), and the whiskers (dashed lines), which reach to the largest and smallest values that are not flagged as outliers (with R’s default settings, values within 1.5 times the interquartile range of the box). The open circles, in our case, are the outliers, or the numbers of Icing Days that are truly ‘extreme’. (Since I have extensively quality controlled these data, I am confident that these circles are actual events and not bad data.) We can now compare the range of each data product, as well as its latitudinal trend.
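To see the numbers behind a single box, base R provides `boxplot.stats`, which returns the values that `boxplot` draws. The counts below are invented for illustration, and the sketch assumes R’s default settings (`coef = 1.5`).

```r
# Made-up annual Icing Days counts for one station (not real observations).
icing <- c(0, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 7, 19)

s <- boxplot.stats(icing)  # coef = 1.5 by default

# Five numbers: lower whisker end, lower hinge (roughly the 1st quartile),
# median, upper hinge (roughly the 3rd quartile), upper whisker end.
print(s$stats)

# Values beyond the whiskers, drawn as open circles in the plot.
print(s$out)
```

For this toy vector the five numbers are 0, 2, 3.5, 5, and 7, and the single outlier is 19. The upper whisker stops at 7 rather than at the maximum, because 19 lies more than 1.5 times the box height above the upper hinge.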

One of my favorite ways to analyze data is simply by plotting it in different ways. While the boxplot (Figure 2) allows us to see the range and median of each data set varying by location, the time series plot (Figure 1) allows us to see the changes which occur over the course of the data collection (such as a lower frequency of events).

Kari Pohl

About Kari Pohl

I am a post-doctoral researcher at NOAA and the University of Maryland (Center for Environmental Science at Horn Point Laboratory). My work investigates how climate variability and extremes affect the diverse ecosystems in Chesapeake Bay. I received a Ph.D. in oceanography from the University of Rhode Island (2014) and received a B.S. in Environmental Science and a B.A. in Chemistry from Roger Williams University (2009). When I am not busy being a scientist, my hobbies include running, watching (and often yelling at) the Boston Bruins, and taking photos of my cat.