Lamb-Richman Dataset Generator

In 2011 I wrote a library that generates or extends the Lamb-Richman dataset using COOP station measurement data from the US (NOAA/NWS) and Canada (when available). Recently I got around to adding it to GitHub so anyone (especially those in academia) can use it to extend or regenerate the dataset further for climate science.

The library is written in C# targeting the Mono framework on OS X, but it should work on Windows as well. There are console projects for the import and export processes, as well as a console project that automates the entire process for multiple years. Finally, there is a library containing the unit tests (NUnit). At the time of writing, all tests pass on Mono 3.10.0.

The library takes all the source data for a year and imports it into a SQLite database. The master station list is also imported into this database, but this only needs to be done once to create a template database for reuse. Next, the grid station info must be imported for each measurement type (these files are in a format specific to the dataset). They can come from the previous year's run, but any year would work, since these files provide the location (latitude/longitude) of each grid point along with the “seed stations” that should be used for it. Using info files from the closest year simply results in less station substitution, since the seed stations are more likely to still be active during that time. Finally, the actual climate data for the entire year is imported (NOAA TD3200 and Environment Canada climate data).
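To make the import step more concrete, here is a minimal sketch of the idea in Python using the standard `sqlite3` module. It is not the library's actual code (which is C#), and the table names, column names, station IDs, and sample values are all made up for illustration; the real schema may look quite different.

```python
import sqlite3

def create_template_db(path=":memory:"):
    """Create a database with a hypothetical schema for stations and daily data."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE station (
            id        TEXT PRIMARY KEY,
            latitude  REAL NOT NULL,
            longitude REAL NOT NULL
        );
        CREATE TABLE observation (
            station_id TEXT NOT NULL REFERENCES station(id),
            obs_date   TEXT NOT NULL,   -- ISO date, e.g. 2001-01-01
            element    TEXT NOT NULL,   -- e.g. TMAX, TMIN, PRCP
            value      REAL,
            PRIMARY KEY (station_id, obs_date, element)
        );
    """)
    return conn

def import_stations(conn, rows):
    """Load the master station list; done once for a reusable template."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO station VALUES (?, ?, ?)", rows)

def import_observations(conn, rows):
    """Load one year of daily measurements."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO observation VALUES (?, ?, ?, ?)", rows
        )

# Invented sample rows, purely illustrative.
conn = create_template_db()
import_stations(conn, [("STN001", 35.0, -97.0), ("STN002", 36.0, -98.0)])
import_observations(conn, [
    ("STN001", "2001-01-01", "TMAX", 41.0),
    ("STN001", "2001-01-01", "TMIN", 23.0),
])
station_count = conn.execute("SELECT COUNT(*) FROM station").fetchone()[0]
observation_count = conn.execute("SELECT COUNT(*) FROM observation").fetchone()[0]
```

Keeping the station list in its own table is what makes the template approach work: the station rows can be copied forward unchanged while the observation table is refilled for each year.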

The export process then queries the database and generates output that conforms to the Lamb-Richman spec, substituting data from other stations using the master station data when necessary. The result is daily climate data for each grid point and each type of measurement, along with the grid station info files (metadata), which can be used for subsequent runs and tell you where each measurement came from. It also generates a number of log files that give additional insight into exactly what happened during each run.
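The substitution idea can be sketched roughly as follows. This is only an illustration under my own assumptions: it tries the seed stations in order and, if none reported data, falls back to the nearest reporting station from the master list by great-circle distance. The library's actual selection rules may differ, and the function names and station IDs here are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def choose_station(grid_lat, grid_lon, seed_ids, master, reporting):
    """Return (station_id, substituted) for one grid point.

    master:    dict of station id -> (latitude, longitude)
    reporting: set of station ids that have data for the period
    """
    # Prefer a seed station if one has data (assumed priority order).
    for sid in seed_ids:
        if sid in reporting:
            return sid, False
    # Otherwise substitute the nearest reporting station (assumed rule).
    candidates = [sid for sid in master if sid in reporting]
    if not candidates:
        return None, False
    nearest = min(
        candidates,
        key=lambda sid: haversine_km(grid_lat, grid_lon, *master[sid]),
    )
    return nearest, True

# Invented coordinates and IDs, purely illustrative.
master = {"SEED1": (35.0, -97.0), "ALT1": (35.1, -97.1), "ALT2": (36.0, -98.0)}
substituted_choice = choose_station(35.0, -97.0, ["SEED1"], master, {"ALT1", "ALT2"})
seed_choice = choose_station(35.0, -97.0, ["SEED1"], master, {"SEED1", "ALT1"})
```

Returning a flag alongside the station ID mirrors what the info files are for: a later run, or a reader of the logs, can see whether a measurement came from a seed station or from a substitute.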

The dataset, created by Dr. Peter Lamb and Dr. Michael Richman of the University of Oklahoma School of Meteorology, provides max/min temperature and precipitation data for 766 uniform grid points over the US and Canada, east of the Rocky Mountains. This dataset is ideal for machine learning applications related to climate science. The library was used to extend the existing dataset (1949-2000) through 2001-2010, and it can be used to automate the extension to additional years or to regenerate the entire dataset.


Hi there, I'm Seth Deckard, a software developer with years of experience working in Ruby and Rails. I co-founded WarningAware and have authored several open source projects on GitHub. You can reach me on Twitter.