eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Extensions to Metadata Languages for Environmental Primary Data

Abstract

During the last decade, environmental scientists and managers have found in the Internet a new communication venue that has improved their productivity by allowing them to share data and knowledge more fluently. Since its invention, the eXtensible Markup Language (XML) has drawn the attention of many researchers seeking to improve communication among people working with data-centric documents, because XML was expected to be the right approach to standardizing data. During this decade, many papers have been published on the premise that XML is the new, paradigmatic standard for sharing data, even though no study has shown that XML languages have actually been used by researchers or managers working with data-centric documents during this period. By examining all possible spaces, this thesis shows that, on the contrary, more than a decade after its invention, XML is still not used by the large scientific community that works with data-centric documents, which still relies on data archives in legacy formats. Therefore, if data standardization is difficult to attain and easier data sharing remains the goal, the other clear avenue is to use metadata. However, metadata languages such as the Ecological Metadata Language (EML) have no intrinsic features to describe completely and directly the information conveyed in many important types of data-centric documents used by environmentalists. By carefully studying the nature of data-centric archives and the process of metadata creation, this thesis shows that any data archive can be easily described using an "a posteriori" approach, in which the lexical descriptors of the physical data in a data-centric file are developed by inspecting the file rather than by following the specification of the file's format.
In addition, following the principles of the Linked Open Data project, the lexical tree is mapped into a simple logic model with semantic annotations from controlled vocabularies, which can be easily serialized for data exchange or data syndication. With the metadata extensions researched in this thesis, metadata languages such as EML can be improved by increasing their expressive power. Environmental scientists and researchers can use these extensions to exchange data-centric documents, and multidisciplinary projects can easily syndicate data from different authors in different formats in a data-centric cloud.
