Input Wanted! How can we make the data available to you?

Pre-claimer: This post is different! Victoria and I have decided to start an open conversation with you, the reader and hopeful future user of these climate data! Our goal is to get you thinking and talking about how we could best serve and organize all the data we have compiled for this project so far!

Daaaata… Daaaata… Data… Data… Data…Data…Data…Dataaaaaaaa 

(apologies to Pink Panther)


We have compiled over 520 time series! That’s a lot of data!

Victoria: One of the first things you learn in graduate school is that the word data is plural (singular=datum). It’s the hallmark of a newbie to say “the data is noisy”, when we all know that “the data are noisy”. Which doesn’t mean that the data speak loudly. The second thing you learn is that the data are always noisy!

In our project, we are not generating any new data. We’re corralling data, wrassling them into submission, branding them, saddling them, crossbreeding them, and now… we want to set them free. That is to say, even though we haven’t collected these data ourselves, we want the work we’ve done to collate and analyze our climate data sets to be available to other Chesapeake Bay researchers. In the language of marketing, this is an ancillary value-added product.

Kari’s hard work has resulted in a bunch of timeseries specific to the Chesapeake region: HadEX2, GHCNDEX, and 18 local weather station timeseries (20 sources in all) for each of 26 climate extreme indices (20 × 26 = 520 timeseries). Plus she’s collected timeseries from the NOAA NERR SWMP dataset (multiple locations) and ancillary data from river discharge (USGS) and tide gauges (NOAA). Plus I’ve assembled extreme event indices from 8 climate models (8 × 26 = 208 timeseries). So we have a lot of timeseries that we’d like to make available to other researchers who might find them useful.

But it’s not just the extreme climate timeseries…

An online repository sounds nice, but what would it look like?

Kari: A major part of this project has been working with the data. It may seem obvious, but a lot of time went into calculating these extreme climate indices, as well as applying certain statistical tests and smoothing techniques.

All of these calculations and data analyses were conducted using R, a free and open-source programming language! So, I have generated a lot of R code that could be useful to anyone who wants to understand how we got our extreme climate values, or who wants to repeat this analysis in another region. Although I am a novice at R scripting, no one should have to repeat what I did! (Improve it, definitely; redo it, definitely not!)

A master online repository, then, gets even more added value by not only storing and archiving our gathered data, but also compiling this R code!

But, I have to admit, I have been finding it hard to visualize how to store, display, and highlight this vast wealth of data in a way that would be useful for all future users! The amount of data is overwhelming!

We want to make these timeseries and R code available to everyone… and not leave them stored on a single computer! We’ve all been there!

Victoria: I agree, it’s a lot of data. And it’s challenging to decide how to make them available in the easiest possible format for other scientists and managers to use. Almost everyone can work in Excel, or can read Excel files into another program, so Excel-readable files sound like a good starting point.

Fortunately, even for the timeseries that are available monthly, the amount of data is manageable in an Excel-readable .csv file. I’m hoping that we can develop a website that would provide an overview figure of each index and also the files for download.

Help us sort the apples from the oranges… or maybe combine apples and oranges into one nice package!

Kari: At the moment (yesterday, that is), I have converted all the annual extreme climate indices (calculated from huge daily time series) into .csv files. But as I started to do the same for the monthly indices, I ran into an organizational dilemma!

Each monthly index has 12 time series (one for each month) for each of the 20 data sources. So, these data could be archived as 12 separate .csv files, one per month (with 20 columns each), or as one master .csv file with 240 columns (12 × 20)! Which format is more useful?
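One way to sidestep the choice is to keep a single "long" table (one row per year, month, and source) and generate either layout from it on demand. The sketch below is only an illustration in Python using the standard library, not the R workflow used in this project. The station labels, the column naming scheme, and the values are all made-up placeholders.

```python
import csv
import io
from collections import defaultdict

# Placeholder labels: 2 gridded products + 18 stations = 20 sources.
# The "station_XX" names are hypothetical, not the project's real station IDs.
SOURCES = ["HadEX2", "GHCNDEX"] + [f"station_{i:02d}" for i in range(1, 19)]
MONTHS = list(range(1, 13))


def make_long_table(years):
    """Build a long table: one row per (year, month, source).
    Values are synthetic placeholders, not real climate data."""
    return [
        {"year": y, "month": m, "source": s, "value": float(m)}
        for y in years
        for m in MONTHS
        for s in SOURCES
    ]


def per_month_tables(rows):
    """Layout A: 12 tables (one per month), one column per source."""
    tables = {m: defaultdict(dict) for m in MONTHS}
    for r in rows:
        tables[r["month"]][r["year"]][r["source"]] = r["value"]
    return tables


def master_table(rows):
    """Layout B: one table with a column for every (month, source) pair."""
    table = defaultdict(dict)
    for r in rows:
        column = f"m{r['month']:02d}_{r['source']}"
        table[r["year"]][column] = r["value"]
    return table


def to_csv_text(table, columns):
    """Render a {year: {column: value}} table as CSV text, one row per year."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["year"] + list(columns))
    for year in sorted(table):
        writer.writerow([year] + [table[year].get(c, "") for c in columns])
    return buf.getvalue()


rows = make_long_table(years=range(2000, 2003))
monthly = per_month_tables(rows)
master = master_table(rows)
print(len(monthly), "monthly tables;", len(master[2000]), "columns in the master table")
```

Because both layouts come from the same long table, a download page could offer the 12 smaller files and the 240-column master file side by side without maintaining two archives by hand.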

In our first Think Tank, our partners expressed the desire to have these time series on their own NERR websites. We are looking for your input on what that could look like!

Comment on this post or feel free to email me to give us input and ideas!



Kari Pohl

About Kari Pohl

I am a post-doctoral researcher at NOAA and the University of Maryland (Center for Environmental Science at Horn Point Laboratory). My work investigates how climate variability and extremes affect the diverse ecosystems in Chesapeake Bay. I received a Ph.D. in oceanography from the University of Rhode Island (2014) and received a B.S. in Environmental Science and a B.A. in Chemistry from Roger Williams University (2009). When I am not busy being a scientist, my hobbies include running, watching (and often yelling at) the Boston Bruins, and taking photos of my cat.

One Response to Input Wanted! How can we make the data available to you!

  1. Mike St.Laurent says:

    Depending on the quantity of data and the type of data, you should look into the HDF5 format. This format is very flexible and is compatible with many different programming languages / software. One HDF5 file could contain all your data, even if it is a mix of grids, strings, vectors, etc.

    As for a website, you should look into using HDF5 and accessing it / extracting data with Python for web display. An example of this is

    All the data in the tables you see are from a single 4-D HDF5 file (4 independent variables: latitude, longitude, duration, average recurrence interval; and the dependent variable: precipitation estimate). Towards the bottom of the page, the user can get estimates in CSV form for that specific location (this is also using Python with the HDF5 data file).

    The above process really depends on how large / how many dimensions your data will have. A set of CSVs (not auto-generated) or plain text ascii grids may be more than enough. Sometimes simpler is better. An example of downloading static grids:

    These are just some suggestions of what is out there. Keep up the good work!
