SWISLR Webinar 8: Data Discussion
A commonality that we all share is data: if you don’t collect your own data, you often use publicly available data from national or regional datasets. Working with large, publicly available datasets has introduced the need for guiding principles that protect sensitive data and give everyone equitable access. One such set is the FAIR data principles. The acronym stands for findable, accessible, interoperable, and reusable, and the guidelines emphasize enhancing the ability of machines to automatically find and use data, in addition to supporting its reuse by individuals. Without these pieces, it is hard for data reuse to happen. Reuse is important in research because many people rely on syntheses, regional generalization, and comparisons of methods or results. Reuse and access are also important to community members and practitioners so they can make decisions using the best data possible. For SWISLR specifically, FAIR data practices (https://www.go-fair.org/fair-principles/) are necessary if we want to make general statements about the issues SWISLR causes. There are hard-to-solve problems happening right now, and coming up with resilient solutions is easier if we can use data across regions and disciplines.
Organizations such as NSF and NASA, as well as journals (e.g., AGU), are creating policies that require researchers to share their data and make it publicly available in data repositories. Although repositories are helpful for housing data and making it publicly available, the sheer number of them on the web can make it harder to know where to look for the data you are trying to find. There are efforts to bring these repositories together, such as data atlases limited to a particular geography, or Google data, which houses links to these data sources. However, they do not cover the full depth of data available on the web. Given this, Dr. Anna Braswell has asked: “How can we make data more discoverable?”
One way is to create a centralized repository for all data, like NOAA’s National Centers for Environmental Information (https://www.ncei.noaa.gov/) or the European Marine Observation and Data Network (https://emodnet.ec.europa.eu/en). However, this requires a lot of time, space, and money that are not readily available. Another option is to require improvements to the information included with open data so that it is more findable by systems like Google data that scrape the web for you when you search. But this takes a lot of buy-in from all the repositories housing data and from the researchers who publish their data. Although this is doable, it will take time. Finally, one way forward is to create a community around data. To achieve this, Dr. Braswell is creating a site where people can write posts about data they use and ask questions about data availability and usability: https://copecomet.github.io/Coastal-Data/. The goal of this data curation service is to create a community around data and help improve the ways in which we use and find data.
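To make the second option above (improving the information included with open data) a little more concrete, here is a minimal sketch of machine-readable dataset metadata. It assumes the schema.org "Dataset" vocabulary, which the webinar did not name; it is one widely used convention for describing datasets to web crawlers, not necessarily the one the speakers had in mind. The function names, the dataset, and the example.org URL are all hypothetical placeholders.

```python
import json


def build_dataset_metadata(name, description, url, keywords, license_url,
                           spatial_coverage, temporal_coverage):
    """Return a schema.org-style Dataset description as a plain dict.

    Assumption: schema.org/Dataset is used as the vocabulary. The field
    names below are schema.org property names; the values are supplied
    by the caller.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        "license": license_url,
        "spatialCoverage": spatial_coverage,
        "temporalCoverage": temporal_coverage,
    }


def missing_fields(metadata, required=("name", "description", "url", "license")):
    """List required fields that are absent or empty.

    The 'required' tuple is an illustrative choice, not an official rule.
    """
    return [field for field in required if not metadata.get(field)]


# Hypothetical example record; none of these values describe a real dataset.
example = build_dataset_metadata(
    name="Example coastal groundwater salinity measurements",
    description="Placeholder description of a hypothetical monitoring dataset.",
    url="https://example.org/datasets/coastal-salinity",
    keywords=["saltwater intrusion", "sea level rise", "groundwater"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
    spatial_coverage="Hypothetical coastal region",
    temporal_coverage="2015-01-01/2020-12-31",
)

# JSON-LD text that a repository could embed in a dataset landing page.
print(json.dumps(example, indent=2))
print("Missing fields:", missing_fields(example))
```

A record like this only helps discovery if repositories and researchers fill it in consistently, which is the buy-in problem described above.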