Data Visualization: Correlation Matrices

What is a Correlation Matrix?

Figure 1: A simple regression plot showing the correlation between the Summer Days and Tropical Nights indices.


There are 26 extreme climate indices for temperature and precipitation being investigated in this project. What if we wanted to know which climate indices had any relationship to each other?

For example, is there any relationship between the warmest temperature each year (TXx) and the coolest temperature each year (TNn)? What about relationships between precipitation and temperature?

To test for any and all relationships, you could plot each climate index against the other 25 indices, as in Figure 1. A full 26 × 26 grid of plots has 676 panels, but 26 of those pair an index with itself and the rest come in mirror-image pairs, leaving 325 unique combinations. That is still a lot of regression plots! There has got to be an easier and quicker way!
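If you want to check the counting yourself, base R can do it in a few lines. This is just a quick sketch; the variable names are mine and not part of the project code.

```r
n_indices <- 26                                 # number of climate indices

grid_cells    <- n_indices^2                    # every cell of a 26 x 26 grid
ordered_pairs <- n_indices * (n_indices - 1)    # each index vs. the other 25
unique_pairs  <- choose(n_indices, 2)           # mirror-image duplicates dropped

print(c(grid = grid_cells, ordered = ordered_pairs, unique = unique_pairs))
```

The grid has 676 cells, 650 of them compare two different indices, and only 325 of those are unique pairs.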

Figure 2: A correlation matrix for the 26 extreme climate indices.


Luckily for us, there is an R package for that! A correlation matrix is, well, a matrix of correlation coefficients, with one cell for every pair of variables, so you can quickly see which variables are correlated. We can even adjust the way it looks, for example by shading each cell with a color bar instead of drawing a traditional regression plot for every pair.
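To see what the numbers behind such a plot look like, here is a minimal sketch using base R's cor() on made-up data (the columns a, b, and c are invented for illustration and are not climate indices).

```r
set.seed(1)
x <- rnorm(50)

# Made-up data: b rises with a, c falls with a
toy <- data.frame(
  a = x,
  b = 2 * x + rnorm(50, sd = 0.5),
  c = -x + rnorm(50, sd = 0.5)
)

M_toy <- cor(toy)        # Pearson correlation for every pair of columns
print(round(M_toy, 2))
```

The result is a symmetric 3 × 3 matrix with 1s on the diagonal (every variable is perfectly correlated with itself), a positive value for the a-b pair, and a negative value for the a-c pair. A correlation matrix plot is simply a picture of this table.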

What Does our Correlation Matrix Say?

Figure 2 is the correlation matrix for our extreme climate indices, using the HadEX2 data set. The indices are ordered by type: cold, warm, duration-based, and precipitation. Hopefully, you can see at a glance that red squares indicate a strong positive relationship and royal blue squares indicate a negative relationship. Note that we can easily change the appearance of this matrix to show the correlation coefficient in each box (Figure 4, more on this below!).

Figure 3: The linear regression between the total annual precipitation (PRCPTOT) and the number of days with at least 10 mm of precipitation (R10mm).


Using this beautiful visualization, it is easy to pick out index relationships of interest. For example, my eyes gravitated to the strong correlation between the total annual precipitation (PRCPTOT) and the number of days with at least 10 mm of precipitation (R10mm). (I plotted this box as a traditional linear regression in Figure 3.) With a strong (R² = 0.87) linear fit, these two indices appear to be related. (Warning: correlation does not mean causation!) I am interpreting this trend as a suggestion that the increase in total annual precipitation is at least connected to an increase in the number of moderately wet days! This relationship will be elaborated in our white paper product, currently in draft!
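A note on the numbers: the R² from a simple linear fit and the correlation coefficient r shown in a correlation matrix are two views of the same thing, since R² = r² when there is one predictor. The sketch below shows this with made-up data (the values are simulated and are not the HadEX2 series; the lowercase names prcptot and r10mm are only placeholders).

```r
set.seed(42)

# Simulated stand-ins for the two indices (NOT real data)
prcptot <- runif(60, min = 600, max = 1400)        # annual totals, mm
r10mm   <- 0.03 * prcptot + rnorm(60, sd = 3)      # days per year

fit <- lm(r10mm ~ prcptot)          # simple linear regression
r2  <- summary(fit)$r.squared       # R-squared of the fit
r   <- cor(prcptot, r10mm)          # Pearson correlation

print(c(r2 = r2, r_squared = r^2))  # the two values agree

# An R-squared of 0.87 corresponds to |r| of about 0.93
r_from_r2 <- sqrt(0.87)
```

So a box with a correlation of roughly 0.93 in the matrix is the same relationship as a regression plot with R² = 0.87.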

What interesting trends or relationships do you see?

Does the negative relationship of Frost Days with the Growing Season Length give you proof that these climate indices make sense?

How to make a correlation matrix in R

Do you have a multi-parameter data set that you want to apply a correlation matrix to? Or maybe you want to repeat this climate study in another region (such as the NERR sites in San Francisco Bay)? The code below will duplicate the matrix in Figure 2.

R level = EASY!

Figure 4: Customization #2 of the correlation matrix.


For beginners

First, set your working directory (the place on your computer where your data is saved).

setwd("C:/your.directory.name")

Second, import your data.

raw <- read.csv("your.file.name")

Now the matrix

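Step 1: Create a data frame for all the parameters you want in your matrix. (This assumes the columns of raw are named after the indices; the order you list them in is the order they appear in the plot.)

d <- raw[, c("FD", "ID", "TX10p", "TN10p", "TXn", "TNn", "SU", "TR", "TX90p",
             "TN90p", "TNx", "TXx", "DTR", "GSL", "WSDI", "CSDI", "CDD", "CWD",
             "R10mm", "R20mm", "Rx1day", "Rx5day", "SDII", "R95p", "R99p", "PRCPTOT")]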

 

Step 2: Get the correlations in your data

M <- cor(d)
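One thing to watch for: if your data contain missing values, cor() with its default settings returns NA for every pair that involves an incomplete column. A possible workaround, sketched here on a tiny made-up data frame, is the use argument of cor(). Whether pairwise deletion is appropriate depends on your data, so treat this as an option rather than a recommendation.

```r
# Tiny made-up data frame with two gaps
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 4, NA, 8, 10, 12),
  z = c(6, 5, 4, 3, NA, 1)
)

M_default  <- cor(toy)                                # NA wherever a column has gaps
M_pairwise <- cor(toy, use = "pairwise.complete.obs") # use rows complete for each pair

print(M_default)
print(M_pairwise)
```

For the real data this would read M <- cor(d, use = "pairwise.complete.obs").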

Step 3: Load the corrplot package

library("corrplot")

Step 4: Plot your matrix

corrplot(M, method = "color")

It’s that easy! Of course, we can play around with the corrplot arguments to customize our plot!

Customization 1: Recreate the colorful matrix in Figure 2.

Create a custom color palette:

col1 <- colorRampPalette(c("#7F0000", "red", "#FF7F00", "yellow", "white", "cyan", "#007FFF", "blue", "#00007F"))

Now plot using the “color” method.

corrplot(M, method = "color", type = "full", add = FALSE, col = rev(col1(30)), bg = "white", title = NULL, is.corr = TRUE)

Figure 5: Customization #3 of the correlation matrix.


Customization 2: Add the correlation coefficients (shown as percentages) to each box and order the parameters by first principal component (Figure 4).

 

cex.before <- par("cex")

par(cex = 0.65)

corrplot(M, col = rev(col1(30)), insig = "blank", method = "color",
         addCoef.col = "black", order = "FPC", tl.cex = 1/par("cex"),
         cl.cex = 1/par("cex"), addCoefasPercent = TRUE)

par(cex = cex.before)
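A side note on insig = "blank": as far as I understand the corrplot documentation, this argument only blanks out boxes when you also supply a matrix of p-values through p.mat. The sketch below builds such a matrix with base R's cor.test() on made-up data (d_toy, M_toy, and p_toy are illustrative names, not part of the original script) and only draws the plot if corrplot is installed.

```r
set.seed(7)
n <- 40
base <- rnorm(n)

# Made-up data: b tracks a closely, c is unrelated noise
d_toy <- data.frame(
  a = base,
  b = base + rnorm(n, sd = 0.3),
  c = rnorm(n)
)
M_toy <- cor(d_toy)

# p-value of the correlation test for every pair of columns
k <- ncol(d_toy)
p_toy <- matrix(0, k, k, dimnames = dimnames(M_toy))
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    p <- cor.test(d_toy[[i]], d_toy[[j]])$p.value
    p_toy[i, j] <- p
    p_toy[j, i] <- p
  }
}

# Boxes with p above sig.level are left blank
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(M_toy, method = "color", p.mat = p_toy,
                     sig.level = 0.05, insig = "blank")
}
```

The same idea applies to the climate data: build the p-value matrix from d, then pass it as p.mat alongside the other arguments.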

 

Customization 3: Instead of colors, let’s plot this matrix using ellipses and sort by alphabetical order (Figure 5)!

corrplot(M, method = "ellipse", order = "alphabet", type = "full", add = FALSE, col = rev(col1(30)), bg = "white", title = NULL, is.corr = TRUE)

There you go! Now plot away!

Kari Pohl

About Kari Pohl

I am a post-doctoral researcher at NOAA and the University of Maryland (Center for Environmental Science at Horn Point Laboratory). My work investigates how climate variability and extremes affect the diverse ecosystems in Chesapeake Bay. I received a Ph.D. in oceanography from the University of Rhode Island (2014) and received a B.S. in Environmental Science and a B.A. in Chemistry from Roger Williams University (2009). When I am not busy being a scientist, my hobbies include running, watching (and often yelling at) the Boston Bruins, and taking photos of my cat.