NOAA logo with seagull NOAA Environmental Data Management emblem



Toward Systematically Curating and Integrating Data Product Descriptive Information
Session Time: January 10th, 9:00 am to 10:30 am
Location: Glen Echo
Description: Complete, consistent, and easy-to-understand information about data products is critical to meeting user needs for data discoverability, accessibility, usability, and interoperability.

In the era of big data and open data, the variety and number of data products keep growing, and completing data descriptions and documentation by hand becomes increasingly impractical. The most effective way to ensure complete, high-quality metadata and descriptive documentation is to curate data products in a systematic, consistent, and automated manner based on standards, community best practices, and defined frameworks.

This session invites presentations that describe and share work and progress on systems, tools, frameworks, and workflows that enable systematic generation of descriptive information about data products for improved discoverability, usability, and interoperability. Additionally, this session will discuss gaps that still need to be addressed.

Chair Nancy Ritchey and Ge Peng
Presentations and Notes Click Here!


Talk | Length (min) | Title | Presenter
5B.1 | 15 | Facilitating Data Submission and Archive with the NCEI Water Column Sonar Data Packager | Charles Anderson
5B.2 | 15 | Design and implementation of automation tools for DSMM diagrams and reports | Sonny Zinn
5B.3 | 15 | Implementing Data Stewardship Maturity Matrix in ISO metadata | Anna Milan
5B.4 | 45 | Open Discussion


Abstracts
5B.1 Facilitating Data Submission and Archive with the NCEI Water Column Sonar Data Packager

Charles Anderson (NESDIS/NCEI)

Carrie C Wall (NCEI)

Archiving large volumes of complex environmental data requires both efficient archive systems for ingest and thorough metadata documentation to ensure data usefulness now and into the future. The NCEI Water Column Sonar Data (WCSD) Archive addresses these needs with a data packaging tool built for data providers. Developed in collaboration with our NMFS Fisheries Science Center partners, the WCSD Packager is a stand-alone executable with a simple user interface to control packager operation and facilitate entry of metadata by the user. Using the packager, data providers specify data source and destination locations, enter basic metadata aided by drop-down lists and other controlled vocabulary fields, and click “package data”. From there, data packaging is fully automatic. The packager copies the sonar and ancillary data files, generates ISO standard cruise-, dataset-, and file-level metadata records, and creates an MD5 checksum manifest file, all contained in a structured data package conforming to the Library of Congress BagIt specification. Due to the size of WCSD, the data packages are created on external hard drives that are then shipped to NCEI for ingest and archive. An individual drive can contain dozens of packages comprising several TB of data. The consistent structure of the data packages facilitates an automated archiving system that performs a checksum validation to ensure file integrity, archives the data files, populates the WCSD metadata database, and updates the WCSD data discovery and ordering portal without data manager intervention once the ingest is initiated. The WCSD Packager and automated ingest system have enabled the ingest and archival of 385 data packages comprising 31.6 TB of WCSD since January 2014. The WCSD Packager serves as a model for facilitating data submission and automated archiving of other data streams at NCEI.
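
The manifest and validation steps described above can be illustrated with a minimal Python sketch. It is not the WCSD Packager's code: the function names are hypothetical, and the layout (a `data/` payload directory and a `manifest-md5.txt` file holding one `checksum  path` line per file) is assumed from the general BagIt convention rather than from the abstract.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Write manifest-md5.txt covering every file under bag_dir/data (assumed layout)."""
    manifest = bag_dir / "manifest-md5.txt"
    payload = sorted(p for p in (bag_dir / "data").rglob("*") if p.is_file())
    lines = [f"{md5_of(p)}  {p.relative_to(bag_dir).as_posix()}" for p in payload]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def validate_manifest(bag_dir: Path) -> list[str]:
    """Return the relative paths that are missing or whose MD5 differs from the manifest."""
    failures = []
    text = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file() or md5_of(target) != expected:
            failures.append(rel_path)
    return failures
```

In this sketch a provider-side tool would call `write_manifest` after copying files into the package, and an ingest-side process would call `validate_manifest` and proceed only when the returned list is empty.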


5B.2 Design and implementation of automation tools for DSMM diagrams and reports

Sonny Zinn (NESDIS/OneStop)

John Relph (NCEI), Ge Peng (CICS-NC), Anna Milan (NCEI), Aaron Rosenberg (ERT)

The OneStop project aims to make NOAA environmental data easily discoverable and usable by improving metadata and providing a user-friendly interface [1]. Providing transparent dataset quality information to users is part of the OneStop-ready requirement. To help meet this requirement, the stewardship maturity of each dataset is thoroughly evaluated under nine categories using a consistent assessment framework, namely, the NCEI/CICS-NC Data Stewardship Maturity Matrix (DSMM) [2]. The evaluation process requires extensive research by metadata content editors who specialize in information science. An evaluation produces a DSMM report that includes two figures, a scoreboard and a star rating chart, both drawn from nine assessment scores ranging from 1 to 5. Creating these diagrams by hand is tedious and laborious, as it entails coloring 45 elements for the scoreboard and shading 45 stars for the star rating chart. To ease the editors' report writing, we created a software program that reads an existing DSMM report and automatically generates and embeds the diagrams within the report. To provide a more efficient workflow, the program was extended to take input from a spreadsheet of DSMM information, and it can generate over 700 reports in less than three hours. Currently, we are further improving the workflow by adopting an existing web application called CEdit for collecting and storing DSMM information and by interfacing our automation program with CEdit.
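
To make the arithmetic behind the star rating chart concrete (nine categories times five levels gives 45 stars), here is a small Python sketch that renders such a chart as text. It is an illustration only, not the program described above: the function names are hypothetical, and the category names are taken from the DSMM paper [2] rather than from this abstract, so treat them as an assumption.

```python
MAX_LEVEL = 5  # DSMM scores range from 1 to 5

# Category names assumed from the DSMM paper [2], not stated in this abstract.
DSMM_CATEGORIES = [
    "Preservability",
    "Accessibility",
    "Usability",
    "Production Sustainability",
    "Data Quality Assurance",
    "Data Quality Control/Monitoring",
    "Data Quality Assessment",
    "Transparency/Traceability",
    "Data Integrity",
]


def star_row(score: int) -> str:
    """Return five stars, with the first `score` of them filled."""
    if not 1 <= score <= MAX_LEVEL:
        raise ValueError(f"score must be between 1 and {MAX_LEVEL}, got {score}")
    return "★" * score + "☆" * (MAX_LEVEL - score)


def star_chart(scores: dict[str, int]) -> str:
    """Render one labeled row of stars per category, with labels padded to equal width."""
    width = max(len(name) for name in scores)
    return "\n".join(
        f"{name.ljust(width)}  {star_row(score)}" for name, score in scores.items()
    )


if __name__ == "__main__":
    # Example scores are made up for illustration.
    example = {name: 3 for name in DSMM_CATEGORIES}
    print(star_chart(example))
```

A real report generator would draw graphics rather than text, but the mapping from a score to a count of shaded elements is the same.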

References

[1] Casey, K.S., 2016: OneStop: Project Overview. Improving NOAA’s Data Discovery and Access Framework. 2016 ESIP Winter Meeting.

[2] Peng, G., J.L. Privette, E.J. Kearns, N.A. Ritchey, and S. Ansari, 2015: A unified framework for measuring stewardship practices applied to digital environmental datasets. Data Science Journal, 13, 231-253. doi: http://dx.doi.org/10.2481/dsj.14-049



5B.3 Implementing Data Stewardship Maturity Matrix in ISO metadata

Anna Milan (NESDIS/NCEI)

Knowing the maturity of a data product and of its stewardship is essential to making informed decisions about which data product is best suited to a particular application. The Data Stewardship Maturity Matrix (DSMM) is a framework for assessing the maturity of data and is being used by NOAA OneStop. This presentation will describe the implementation of DSMM information in ISO-compliant metadata.
