Conquering the Data Universe with R

After their initial training, Stream Team volunteers jump into monthly monitoring and, over time, gather and upload their data to the Chesapeake Data Explorer, the database that holds monitoring data from throughout the Chesapeake Bay Watershed. After a minimum of 12 months of monitoring, Stream Team volunteers have the opportunity to take a deeper look at their data and explore what the findings mean. Preparing for one of these data exploration meetings recently got the entire ALLARM team involved. The process is intense: it involves checking volunteers’ data for errors (such as rounding or precision), compiling summary statistics such as maximum and minimum values, and graphing trends over time. The results were then shared with volunteers in virtual packets and discussed during ALLARM’s Data Exploration Workshop on Wednesday, April 13th.

This academic year, Phoebe Galione, ALLARM’s Outreach Manager, further developed detailed instructions for data analysis in Excel, building on the preliminary data interpretation meetings held for the Cumberland Stream Team in early 2021. The process was divided into two phases: organization and visualization. Following Phoebe’s guides, I spent the last few weeks of the Fall semester and parts of my winter break downloading data, checking for errors, and generating statistics for multiple sites. When the Spring semester came, other ALLARMies picked up the process and continued.

While Phoebe walked me through data organization and visualization in Excel, I noticed how repetitive the task was: we had to redo every step for each volunteer’s data, across more than 50 sites. Even though the task was not difficult, repeating the same procedures for each volunteer took a lot of time, and all ALLARMies had to get involved in order to finish the packets by the deadline.

Seeing room to build in efficiencies, I developed R scripts to streamline the process, which ALLARM can apply to future rounds of data organization and interpretation. I learned to program in R while working on independent research with my advisor, Prof. Douglas, who introduced me to R as a tool to manipulate and transform data, perform statistical tests, and present results. Once I understood ALLARM’s goals for the data interpretation process, I was able to pull together the skills and tools I needed and write R scripts that replicate the tasks we had done in Excel and repeat them whenever I import a new dataset.
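To give a rough sense of what automating these steps can look like, here is a minimal base R sketch. It is an illustration only, not the actual ALLARM scripts: the column names, the sample values, the helper functions `flag_precision` and `summarize_site`, and the one-decimal-place precision rule are all assumptions made up for this example.

```r
# Hypothetical example data: column names and values are invented for illustration.
monitoring <- data.frame(
  site  = c("Site A", "Site A", "Site A", "Site B", "Site B", "Site B"),
  date  = as.Date(c("2021-01-15", "2021-02-15", "2021-03-15",
                    "2021-01-20", "2021-02-20", "2021-03-20")),
  value = c(7.1, 7.25, 6.9, 8.0, 7.8, 8.15)
)

# Flag readings reported with more decimal places than expected
# (assumed here to be one decimal place).
flag_precision <- function(x, digits = 1) {
  abs(x - round(x, digits)) > 1e-8
}

# Summarize one site's readings: count, minimum, maximum, and number flagged.
summarize_site <- function(site_data) {
  data.frame(
    site      = site_data$site[1],
    n         = nrow(site_data),
    min_value = min(site_data$value),
    max_value = max(site_data$value),
    n_flagged = sum(flag_precision(site_data$value))
  )
}

# Apply the same steps to every site and stack the results into one table.
site_summaries <- do.call(
  rbind,
  lapply(split(monitoring, monitoring$site), summarize_site)
)

print(site_summaries)
```

Because the per-site steps live in one function, adding a new site or a new month of data only means re-running the script on the updated dataset rather than repeating the steps by hand.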

In the future, if the data interpretation goals shift, ALLARM can adapt the scripts accordingly, and the adjustments can be reflected in all volunteers’ data within a few clicks. With the time saved, ALLARM can speed up preparation for the data interpretation workshop and have all data organized and presented clearly for volunteers.