Alignment of biomedical data repositories with open, FAIR, citable and trustworthy principles

Increasing attention is being paid to the operation of biomedical data repositories in light of efforts to improve how scientific data are handled and made available for the long term. Simultaneously, groups around the world have been coming together to formalize principles that govern different aspects of open science and data sharing. The most well known are the FAIR data principles. These are joined by principles and practices that govern openness, citation, credit and good stewardship (trustworthiness). Together, these define a framework for data repositories to support Open, FAIR, Citable and Trustworthy (OFCT) data. Here we developed an instrument using the open source PolicyModels toolkit that attempts to operationalize key aspects of the OFCT principles, and we applied the instrument to eight biomedical community repositories listed by the NIDDK Information Network (dkNET.org). The evaluation was performed through inspection of documentation and interaction with the sites. Overall, there was little explicit acknowledgement of any of the OFCT principles, although the majority of repositories provided at least some support for their tenets.


Introduction
Best practices emerging from the open science movement emphasize that for data to be effectively shared, they are to be treated as works of scholarship that can be reliably found, accessed, reused and credited. To achieve these functions, the open science movement has recommended that researchers formally publish their data by submitting them to a data repository (OpenAire 2020), which assumes stewardship of the data and ensures that data are made FAIR: Findable, Accessible, Interoperable and Reusable (Wilkinson et al. 2016). Publishing data can therefore be seen as equivalent to publishing narrative works, in that the locus of responsibility for stewardship transfers from the researcher to other entities, who ensure consistent metadata, future-friendly formats, stable and reliable access, long-term availability, indexing and tools for crediting the contributors. As these types of responsibilities are traditionally supported by journals and libraries, it is not surprising that many publishers and libraries are now developing platforms for hosting research data. At the same time, data are not exactly the same as narrative works: they require additional functionality to increase their utility, which explains why the most well known scientific data repositories are led by individual researchers, research communities or funders. Scientific data repositories such as the Protein Data Bank (Berman et al. 2012) predated the internet and are viewed as important infrastructures for data harmonization, integration and computation.
Although there is general agreement that repositories should support FAIR data, there have been several other community-led initiatives to develop principles in support of open science and data sharing. The "Defining the Scholarly Commons" project at FORCE11.org identified over 100 sets of principles issued by organizations and groups around the world that cover a range of activities involved in scholarship and how it should be conducted in the 21st century (Bosman et al. 2017). Common threads included: 1) the need to include not only narrative works, but data, code and workflows; 2) the desire to make these products "as open as possible; as closed as necessary"; 3) FAIRness, i.e., designing the products of scholarship so that they operate efficiently in a digital medium; 4) Citability, i.e., expanding our current citation systems to cover other research outputs such as data; and 5) Trustworthiness, i.e., ensuring that those who assume responsibility for stewardship of scholarly output operate in the best interests of scholarship. In the imagined scholarly commons, data repositories were the central players that provided the human and technical infrastructure for making research data Open, FAIR, Citable and Trustworthy (OFCT).
In the work presented here, we developed an instrument to assess the current state of data repositories on behalf of the NIDDK Information Network (dkNET.org; Whetzel et al. 2015).
dkNET was established in 2012 to provide information and services to basic and clinical biomedical researchers for data and resources relevant to diabetes, digestive and kidney diseases (referred to here as "dk"). dkNET is taking an active role in interpreting and facilitating compliance with FAIR on behalf of this community. Part of this effort involves creating tools to help researchers select an appropriate repository for their data. As a first step, dkNET created a listing, available on its website, of data repositories that cover domains relevant to dk science. As a second step, we wanted to evaluate how well these repositories support current trends in open science. We therefore developed an instrument that allowed us to gauge repositories' alignment with OFCT principles.

Method
We developed a set of 31 questions covering the OFCT principles (Table 2). The table shows the question order and ID (Q#, id), the text of the question posed in the interview (Question text), possible answers (Answers), whether or not the question is conditional (C), the dependencies of conditional questions (D) and the principle(s) the question is meant to cover (P). A "Y" in the conditional column indicates that whether the question is shown to the interviewer depends upon a prior answer; the questions that elicit the conditional questions are shown in the Dependencies column. Y = Yes, N = No; O = Open, F = FAIR, C = Citable, T = Trustworthy. The full instrument, which also includes explanatory text and appropriate links, is available at Martone et al. (2020).

The instrument was used to evaluate the eight repositories listed by dkNET (RRID:SCR_001606) provided in Table 3. We selected these repositories to represent different data types or different research foci. Excluded from consideration were repositories that required an approved account to access the data, e.g., the NIDDK Central Repositories. We also did not consider […]

Developing and testing the instrument
To design the instrument, we adapted the decision tree originally designed by the FORCE11 Scholarly Commons project for evaluating repositories against OFCT principles (Bosman et al. 2017). We benchmarked the instrument against a range of surveys and other tools then available for similar uses, including the repository finder tool developed by DataCite for the Enabling FAIR Data project, the Scientific Data journal repository questionnaire, the FAIRsFAIR data assessment tool, and the Core Trustworthy Data Requirements. From this exercise, we determined that the answers to the questions were sometimes difficult to ascertain because clear criteria for evaluation had not been specified; some areas were clearly missing, while some of the questions were duplicative. We therefore modified the questionnaire by removing duplicates, adding questions, developing specific evaluation criteria and adding tips on where to look for certain types of information. Definitions and links to supporting materials were also provided for each question where appropriate. The complete version of the questionnaire used here, which includes the criteria used for each question, was deposited in Zenodo (Martone, Murphy, and Bar-Sinai 2020).

The final questionnaire comprised 31 questions, listed in order in Table 2. Some of the questions are conditional, that is, their presentation is dependent upon a prior answer. For example, if an interviewer answered "No" to question lic-clr, "Does the repository provide a clear license for reuse of the data?", then question lic-cc, "Are the data covered by a commons-compliant, open license?", is not presented. Thus, the total number of questions asked may differ across repositories. Answers to the questions position each repository along the dimensions listed in Table 2, such as Documentation Level (lacking/adequate/good/full), Metadata Provenance (unclear/adequate/full), and overall ratings for each criterion, e.g., FAIR Accessibility level (none/partial/full). The full policy space for this instrument is shown in Figure 1, and is also available via the questionnaire landing page and in Martone et al. (2020). Some dimensions are assigned based on the answer to a single question, while others are calculated from values on other dimensions.

Using an interactive interview guided by our model's decision graph, we located each of the evaluated repositories in the space we defined. To visualize this space, we developed an interactive viewer available at http://mbarsinai.com/viz/dknet. This allowed us to formally compare repositories across multiple dimensions and to collect overall statistics. The main features of the tool are shown in Figure 2. The online version allows interviewers to annotate the response to each question with notes (Figure 2B) and export the outcomes of the evaluation (Figure 2G). Currently, the results can only be exported as .json or .xml; to save a human-readable PDF version of the questionnaire results, users can use the browser's print function to save the interview summary page as a PDF.
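To make the conditional flow concrete, the following sketch models the lic-clr/lic-cc pair described above in plain Python. The question IDs and wording come from the instrument; the dictionary layout and function are our own illustration, not PolicyModels decision-graph syntax.

```python
# Minimal sketch of the conditional question flow, assuming a simple
# dict-based question registry; illustrative Python only, not the
# PolicyModels decision-graph language actually used by the instrument.

QUESTIONS = {
    "lic-clr": {
        "text": "Does the repository provide a clear license for reuse of the data?",
        "answers": ["yes", "no"],
        "depends_on": None,
    },
    "lic-cc": {
        "text": "Are the data covered by a commons-compliant, open license?",
        "answers": ["yes", "no"],
        # Only asked when lic-clr was answered "yes".
        "depends_on": ("lic-clr", "yes"),
    },
}

def questions_to_ask(answers: dict) -> list:
    """Return the IDs of questions whose dependencies are satisfied."""
    runnable = []
    for qid, q in QUESTIONS.items():
        dep = q["depends_on"]
        if dep is None or answers.get(dep[0]) == dep[1]:
            runnable.append(qid)
    return runnable

# A repository with no clear license is never asked lic-cc, which is
# why the total number of questions differs across repositories.
print(questions_to_ask({"lic-clr": "no"}))   # ['lic-clr']
print(questions_to_ask({"lic-clr": "yes"}))  # ['lic-clr', 'lic-cc']
```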

Scoring
Five of the sites were reviewed independently by FM and MM between March and May 2020, and three in December 2020. Results were compared and a final score assigned for each question. The reviewers made a good-faith effort to find information on each site to provide an accurate answer for every question. The evaluation included checking of information on the repository site, examination of the metadata provided by the site, investigation of the PID system (including what information was exported to DataCite when DOIs were used), and inspection of the underlying platform code, documentation and tutorials. For some of the repositories, we created accounts in order to evaluate practices and further documentation for uploading data, e.g., whether one can associate an ORCID with a dataset, although in no case did we actually upload any data. We did not attempt to read papers that described the site. If we could not find explicit evidence for a criterion, we assumed that it was not present. Therefore, a "No" answer to a question such as "Does the repository provide an API?" could mean either that the repository has a statement saying that it will not provide an API, or that we could find no evidence that it did.

[Figure 2 caption, continued: D) The answer feed may be displayed and used to track progress and to allow an interviewer to revisit a question and change an answer; E) PolicyModels tallies the answers and assigns tags assessing compliance with OFCT; F) Final tags assigned for each category; G) The results may be downloaded as JSON or XML.]
After a model-based interview regarding a given repository is completed, PolicyModels displays a coded evaluation of the repository. Formally, PolicyModels locates the coordinate that best describes that repository in our model's policy space. While mathematically all dimensions are equally important, PolicyModels allows its users to organize them hierarchically, to make working with them more comfortable.
Our proposed model's policy space is organized as follows. High-level property descriptions, such as openness and citability levels, are each represented in a dimension of their own.
These dimensions have three levels, corresponding to "not at all", "somewhat", and "fully".
For example, the Reusable dimension contains the levels "not reusable", "partially reusable", and "fully reusable".
The high-level properties are a summary of lower-level assertions, each describing a narrow aspect of these high-level properties. These assertions can be binary or detailed. For example, "open format", one of the openness sub-aspects, is "yes" for repositories that use an open format and "no" for the others. On the other hand, "Study Linkage", an interoperability sub-aspect, can be "none", "free text", "textual metadata", or "machine readable metadata".
Each interview starts by pessimistically setting all high-level dimensions to their lowest possible value: "not at all". During the interview, while lower-level aspect results are collected, high-level repository coordinates may be advanced to their corresponding "somewhat" levels.
After the last question, if the evaluated repository achieved an acceptable value for all sub-aspects of a certain higher-level property, that property is advanced to its "fully" level.
As a concrete example, consider the "Findable" dimension. At the interview's start, we set it to "not findable". During the interview, our model collects results about persistent identifiers used by the repository (none/internal/external), the grade of the metadata it uses (minimal/limited/rich), whether IDs are stored in the metadata (none/partial/all), and whether the repository offers an internal search feature (yes/no). If a repository achieves the lowest values in all these dimensions, it maintains its "not findable" score. If it achieves at least one non-lowest value, it is advanced to "partially findable". After the interview is completed, if it achieved the highest value in each of these dimensions, it is advanced to "fully findable".
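The aggregation rule described above is mechanical enough to express directly in code. The sketch below is our own Python rendering of it; the value scales follow the text, but the encoding is not how PolicyModels stores them.

```python
# Ordered value scales for the Findability sub-aspects, lowest first.
SCALES = {
    "persistent_id":   ["none", "internal", "external"],
    "metadata_grade":  ["minimal", "limited", "rich"],
    "ids_in_metadata": ["none", "partial", "all"],
    "internal_search": ["no", "yes"],
}

def findability(aspects: dict) -> str:
    """Aggregate sub-aspect values into not/partially/fully findable."""
    ranks = [SCALES[k].index(v) for k, v in aspects.items()]
    tops = [len(SCALES[k]) - 1 for k in aspects]
    if all(r == 0 for r in ranks):
        return "not findable"       # pessimistic starting value kept
    if ranks == tops:
        return "fully findable"     # highest value on every sub-aspect
    return "partially findable"    # at least one non-lowest value

print(findability({"persistent_id": "none", "metadata_grade": "minimal",
                   "ids_in_metadata": "none", "internal_search": "no"}))
# -> not findable
print(findability({"persistent_id": "external", "metadata_grade": "rich",
                   "ids_in_metadata": "all", "internal_search": "yes"}))
# -> fully findable
```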

Data and Code Availability
The data outputs and completed questionnaires from the interview analysis are available in Zenodo (Bar-Sinai, Murphy, and Martone 2020).

Results
Figure 3 provides the average score for each question, scaled to a 10-point scale, with 1 = lowest score and 10 = best score. A full list of question IDs is available in Table 3 and Supplemental Material S1. On over half of the questions (17/31), repositories scored on average above the midpoint, indicating at least some alignment. On just under half (14/31) they scored below, indicating poor alignment or no information available, with all repositories receiving the lowest score on 3 of the questions. Full results are provided in Table 4.
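The scaling formula is not spelled out in the text; assuming a simple linear mapping of each question's ordered answer levels onto the 1-10 range (our assumption, for illustration), it could look like this:

```python
def scale_to_ten(rank: int, n_levels: int) -> float:
    """Linearly map an answer rank (0 = lowest) onto a 1-10 scale."""
    if n_levels < 2:
        raise ValueError("need at least two answer levels")
    return 1 + 9 * rank / (n_levels - 1)

# A three-level answer such as none/partial/full maps to 1.0, 5.5, 10.0:
print([scale_to_ten(r, 3) for r in range(3)])
```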

Overall impressions
Our instrument calculates an overall rating per OFCT dimension, as shown in Figure 4. For a repository to be rated fully compliant, it would have to receive an acceptable score on all dimensions that evaluate that principle; conversely, to be rated non-compliant would require an unacceptable score on all dimensions. This calculation is performed using PolicyModels, and is based on the ranges of acceptable and unacceptable values in the various dimensions of the instrument's policy space. Note that we do not provide scores for individual repositories in this paper, as our intent is not to grade them. However, the completed questionnaires for the individual repositories are available in (Bar-Sinai, Murphy, and Martone 2020).
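The three-way rule (fully / partially / non-compliant) reduces to a small function over per-dimension acceptability flags; the sketch below is our illustration, not the PolicyModels implementation:

```python
def ofct_rating(flags: list) -> str:
    """Classify a principle from per-dimension acceptability flags."""
    if all(flags):
        return "fully compliant"      # acceptable on every dimension
    if not any(flags):
        return "non-compliant"        # unacceptable on every dimension
    return "partially compliant"

print(ofct_rating([True, True, True]))   # fully compliant
print(ofct_rating([True, False, True]))  # partially compliant
```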
As seen in Figure 4, at least one repository scored as fully compliant in each of the Open, Findability, Accessibility, Reusability and Citability dimensions. Conversely, three repositories received the lowest rating for Findability and one for Citability. No single repository was equally good, or bad, on all dimensions; that is, the same repositories did not receive all of the highest or all of the lowest scores. The most flags assigned to a single repository was 15, while the fewest was 5.

Open dimension
As biomedical repositories can deal with sensitive information that cannot be openly shared, they should adhere to the principle "As open as possible; as closed as necessary". However, none of the repositories we evaluated held sensitive data, and all were judged to make their data available with minimal to no restrictions, i.e., no approval process for accessing the data. We also evaluated repositories' policies against the Open Definition: "Knowledge is open if anyone is free to access, use, modify, and share it – subject, at most, to measures that preserve provenance and openness." Thus, data have to be available to anyone, including commercial entities, and users must be free to share them with others. We therefore examined the licenses against those rated by the Open Knowledge Foundation as adhering to this definition (https://opendefinition.org/licenses/). One repository was considered fully compliant, four were rated as "good" with respect to open licenses, and three had no licenses (Table 4; CCLicenseCompliance). The four rated as "good" did not receive the best score due to practices such as allowing the user to select from a range of licenses, some of which restricted commercial use.
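A check like this can be partially automated by comparing a repository's license identifier against the Open Definition's machine-readable license registry. The sketch below assumes the JSON endpoint published at licenses.opendefinition.org and the structure of its entries; both are assumptions that should be verified before use.

```python
import requests

# The Open Definition project publishes its license registry as JSON;
# the exact endpoint and entry fields below are our assumptions.
OD_LICENSES = "https://licenses.opendefinition.org/licenses/groups/all.json"

def is_open_license(license_id: str) -> bool:
    """True if the license is flagged Open Definition-conformant."""
    licenses = requests.get(OD_LICENSES, timeout=30).json()
    entry = licenses.get(license_id)
    return bool(entry and entry.get("od_conformance") == "approved")

# CC-BY-4.0 is approved; licenses restricting commercial use are not.
print(is_open_license("CC-BY-4.0"))
print(is_open_license("CC-BY-NC-4.0"))
```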

FAIR dimension
Our questions on FAIR evaluated both compliance with specific FAIR criteria, e.g., the presence of a persistent identifier, and practices that support FAIR, e.g., providing landing pages and adequate documentation to promote reuse. Evaluating a repository against some principles also required that we define concepts such as "rich metadata" (FAIR principle F2) and a "plurality of relevant attributes" (FAIR principle R1).

[Figure 5: Assessment of the degree of descriptive metadata (Metadata Grade, X axis) vs. relevant biomedical metadata (dkNET Metadata Level, Y axis). The Metadata Grade assesses whether the repository complies with the Findable principle for rich metadata, while the dkNET Metadata Level measures the degree to which the repository supports the Reusable principle requiring "a plurality of relevant attributes"; relevance here was assessed with respect to dkNET. Only one repository received the highest score for both categories.]
Rich metadata were considered to comprise basic descriptive metadata, i.e., dataset title, description and authors, but also metadata specific to biomedical data, e.g., organism, disease condition studied and techniques employed (Q:md-level). "A plurality of relevant attributes" was defined in question md-dkn as providing sufficient metadata to understand the context required to interpret a dkNET-relevant biomedical dataset; such metadata include subject-level attributes, e.g., age, sex and weight, along with detailed experimental protocols. Figure 5 positions each repository in the metadata policy space and shows that only one repository fully satisfied both metadata requirements.

Figure 4 shows that the majority of repositories were either partially or fully compliant with all the Findability and Accessibility dimensions. Two repositories achieved the highest rating in Findability. Seven of the eight repositories supported external PIDs, either DOIs or accession numbers registered with identifiers.org; one repository issued no identifiers. Only one repository was considered fully accessible, because only one had a clear persistence policy (Q:md-psst). Both the JDDCP and the FAIR principles state that metadata should persist even if the accompanying data are removed. We considered either an explicit policy or clear evidence of such a practice as acceptable, e.g., a dataset that had been withdrawn but whose metadata remained.
Overall scores were lowest for the interoperability dimensions, with three repositories being judged non-interoperable. Only one of the repositories achieved the StudyLinkage flag, which indicates fully qualified references to other data, in other words, that the relationship between a metadata attribute and a value was both machine-readable and informative. We measured this property by looking at how repositories handled supporting publications in their metadata, e.g., did they specify the exact relationship between the publication and the dataset? To measure this, we looked at the web page markup ("view source") and also checked records in DataCite.
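As an illustration of the DataCite check, the public DataCite REST API returns a DOI's metadata as JSON, with qualified references listed under relatedIdentifiers, each carrying an explicit relationType. The sketch below is ours; the DOI is a placeholder, not one of the evaluated repositories.

```python
import requests

def qualified_references(doi: str) -> list:
    """Return the typed links between a dataset DOI and other objects."""
    # DataCite's public REST API; no authentication needed for reads.
    resp = requests.get(f"https://api.datacite.org/dois/{doi}", timeout=30)
    resp.raise_for_status()
    attrs = resp.json()["data"]["attributes"]
    # Each entry carries relationType (e.g., "IsSupplementTo"),
    # relatedIdentifierType (e.g., "DOI") and the identifier itself.
    return attrs.get("relatedIdentifiers", [])

# Placeholder DOI for illustration only.
for ref in qualified_references("10.1234/example-doi"):
    print(ref["relationType"], ref.get("relatedIdentifier"))
```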
Two repositories achieved the highest score for reusability, while the remainder were considered partially reusable. Five repositories were judged as having inadequate metadata for providing experimental context, four as having inadequate user documentation, and three did not provide a clear license.

[Figure 6: Repositories plotted against two dimensions of data citation. The Y axis shows support for citation metadata and the X axis support for ORCID. Two repositories support ORCID and provide full citation metadata; two have no support for data citation and the others have partial support.]

Citable dimension
Data citation criteria included the availability of full citation metadata and machine-readable citation metadata according to the JDDCP (Starr et al. 2015; Fenner et al. 2019; Cousijn et al. 2018). We also evaluated the use of ORCIDs, as linking ORCIDs to datasets facilitates assigning credit to authors. As shown in Figure 6, only two repositories supported ORCID and provided full citation metadata. Consequently, two repositories were judged to fully support data citation, while the remainder were judged as partially (N=5) or not (N=1) supporting data citation. Many of the repositories had a citation policy, but most of these policies requested citation of a paper describing the repository and acknowledgement of the data contributor, rather than a full citation of the particular dataset. Two were judged not to have sufficient metadata to support full citation, e.g., listing only the submitter and not the other authors (see question md-daci).
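For concreteness, machine-readable citation metadata with ORCID support might be exposed as schema.org JSON-LD on a dataset landing page. The record below is invented for illustration and is not drawn from any of the evaluated repositories; it is constructed here as a Python dict.

```python
import json

# Hypothetical dataset record; field names follow schema.org's Dataset
# type, one common way landing pages expose citation metadata.
citation_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example dk dataset",
    "identifier": "https://doi.org/10.1234/example-doi",  # placeholder DOI
    "datePublished": "2020-05-01",
    "publisher": {"@type": "Organization", "name": "Example Repository"},
    "author": [{
        "@type": "Person",
        "name": "Jane Researcher",
        # Linking the ORCID lets aggregators assign credit to the author.
        "@id": "https://orcid.org/0000-0002-1825-0097",
    }],
}

# Serialized into a <script type="application/ld+json"> tag on the
# landing page, this is what a metadata harvester would read.
print(json.dumps(citation_metadata, indent=2))
```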

Trustworthy dimension
Trustworthiness was largely assessed against the Principles of Open Infrastructures (Bilder, Lin, and Neylon 2015) and the CoreTrustSeal criteria. The questionnaire originally probed the different certification criteria recommended by CoreTrustSeal, but we dropped this approach in favor of a single binary question on whether or not the repository was certified by CoreTrustSeal or an equivalent. If a repository was certified, it would automatically be rated fully trustworthy. However, none of the eight repositories provided evidence of such a certification.
In accordance with the Principles of Open Infrastructures, we measured the degree to which the governance of the repository was transparent and documented, and whether the repository was stakeholder-governed. Only one repository received the highest rating for each of these, while one had virtually no information on how the repository is governed, e.g., who owns the repository or how decisions are made. Although six of the repositories were re- […]

Discussion
As part of dkNET.org's efforts to promote data sharing and open science, we undertook an evaluation of current repositories supporting research domains of relevance to dkNET. Our ultimate goal is to provide tools to help researchers within these domains select an appropriate repository for their research data. Some of the data acquired with this instrument will be used to enhance dkNET's repository listings with information that might be important to a researcher when selecting a repository, e.g., whether the repository supports data citation. We also want this work to serve as a resource for those developing new dk data repositories, by defining a set of important functions such repositories should support. More attention is now being paid in biomedicine to certification instruments such as the CoreTrustSeal, as evidenced by a newly released RFA for data repositories from the US National Institutes of Health (NIH 2020).
A good-faith effort was made to try to answer the questions accurately, although reviewing biomedical repositories is challenging. Each of the sites is organized differently and the specialized research repositories were developed to serve different communities and use cases.
Therefore, to evaluate specific dimensions required significant engagement with the site, even in some cases requiring us to establish accounts to see what metadata was gathered at time of upload. Discovery of these types of routes, e.g., that ORCIDs are only referenced when you establish an account, required us often to go back and re-evaluate the other repositories using this same method.
Only two of the repositories gave any indication that their functions or design were informed by any of the OFCT principles, specifically mentioning FAIR. The lack of explicit engagement with these principles is not surprising given that most of the repositories were established before the principles came into existence. For this reason, we gave credit for what we called "OFCT potential" rather than strict adherence to a given practice, using a sliding scale for many questions that could assign partial credit. For example, if the repository had landing pages at stable URLs, we gave it some credit, even if the identifier was not strictly a PID. Such IDs could easily be turned into PIDs by registering them with a resolving service such as Identifiers.org or N2T.org (Wimalaratne et al. 2018).
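As an illustration, once a repository's accessions are registered as an Identifiers.org prefix, each record resolves through a compact identifier. The check below is a sketch; the prefix:accession pair is for illustration only.

```python
import requests

def resolves(compact_id: str) -> bool:
    """True if Identifiers.org can route the compact identifier."""
    # Identifiers.org redirects a compact identifier to the repository's
    # own landing page; a successful redirect means the ID acts as a PID.
    resp = requests.get(f"https://identifiers.org/{compact_id}",
                        timeout=30, allow_redirects=True)
    return resp.ok

# Illustrative prefix:accession pair.
print(resolves("pdb:2gc4"))
```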
In addition to finding relevant information, scoring the repositories consistently was also a challenge. Principles are designed to be aspirational and to provide enough flexibility that they will be applicable across multiple domains. There is therefore a certain amount of subjectivity in their evaluation, particularly in the absence of validated, established standards. For example, one of the repositories issued persistent identifiers at the project level but not for the data coming from the individual studies. At another site, not included in the final evaluation sample, DOIs were available upon request. Are these considered compliant? One could argue both ways.
As described in the Methods, we did not attempt to cover all aspects of the underlying principles; we selected those for which we could develop reasonable evaluation criteria. One very important issue covered by CoreTrustSeal, the newly published TRUST principles (Lin et al. 2020) and the Principles of Open Infrastructure (Bilder, Lin, and Neylon 2015) is long-term sustainability. Although critical, we do not think that an external party such as ourselves is in a position to comment on the long-term sustainability plan for a given repository.

The FAIR Maturity Indicators (https://github.com/FAIRMetrics/Metrics) and FAIRshake toolkits differ from ours in that they are intended to employ either fully automated or semi-automated approaches for determining FAIRness. As we show here, some aspects of FAIR require interpretation, e.g., "a plurality of relevant attributes", making it difficult to employ fully automated approaches. In the case of "rich metadata" and "plurality of relevant attributes", dkNET evaluates these based on our own criteria, that is, the types of metadata we think are critical for biomedical studies in our domain. These may not be universal. On the other hand, automated tools for determining the level of machine readability of features such as landing pages would make evaluation much simpler than our current process. We will likely incorporate some of these tools into future versions of the instrument.
While evaluation tools can be powerful, there are downsides to rushing into too rigid an interpretation of OFCT. First, communities are still coming together to determine what constitutes OFCT for their constituents and what can reasonably be implemented at this time. As noted in the introduction, data repositories have to straddle two worlds: providing traditional publishing and library functions to ensure findability and stability, while at the same time fulfilling the more traditional roles of scientific infrastructures for harmonizing and reusing data. Thus, evaluating a repository from a journal's perspective may not be the same as evaluating it from a researcher's perspective.
Second, comparative analyses of different evaluation metrics for data repositories have found that although the metrics agree on some dimensions, they do not agree on all, and their authors have made specific recommendations as to the types of functions repositories should support and the information that should be available. Such results indicate that it is perhaps still early days for understanding what constitutes best practice for a data repository across all disciplines. Our understanding of such practices may evolve over time as data sharing becomes more mainstream. As already noted, for example, early efforts in data sharing necessarily focused on the deposition of data; less attention, perhaps, was paid to what it takes to reuse data effectively. While the FAIR principles emphasize machine-readable attributes for achieving reusability without human intervention, some studies suggest that the human factor may be more critical for some types of data: having a contact person and an accompanying publication makes it much easier to understand key contextual details (Faniel and Yakel 2017; Turner et al. 2011). As we start to see more reuse of data, it may be possible to employ more analytical methods for determining best practices based on actual use cases.
For these reasons, we deliberately refrained from assigning grades or calling out individual repositories in the work presented here. Wilkinson et al. (2019) noted that many repositories evaluated early on using the FAIR metrics expressed resentment. We recognize the struggles that those who develop and host scientific data repositories undergo to keep their resources up and running, particularly in the face of uncertain funding. Generally, these repositories were founded to serve a particular community, and the community itself may not yet be demanding or engaging with OFCT principles. We therefore favor flexible approaches that allow individual communities to interpret OFCT within their own norms and not entirely according to the dictates of external evaluators. Nevertheless, research data repositories, after operating largely on their own to determine the best way to serve research data, are going to have to adapt to meet the challenges and opportunities of making research data a primary product of scientific research.