Tagging Data Availability Statements

Information on the availability of the data that form the basis for an article can appear in several locations within the article:
  • Data Availability Statements (DAS): DAS are typically published as one or more sections of an article and describe where the data supporting the results of the article can be found. Such statements are becoming more frequent in journal publishing and are required by some publishers and/or funders.
  • Bibliographic Citations: Full, structured citations to any externally archived referenced data may appear in the ordinary bibliographic Reference List or in a separate Reference List of their own. Such lists may appear in the references at the end of an article or within a Data Availability Statement.

Best Practice for Data Availability Statements

[Note: We thank the Data Availability Statement subgroup of the Force11 Data citations principles working group as well as selected publishers for the specific recommendations for the 2018 Best Practices mentioned below.]
  • A Data Availability Statement should be provided both for data that have been integrated into the article (in tables, figures, as supplementary material, etc.) and for data that are stored externally.
  • DAS information should be included in a part of the article that is available for all to read, not behind access controls (paywalls).
  • Fully tagged citations, typically in a Reference List, should be provided for all referenced data (externally archived generated data and analyzed data, as well as non-analyzed data).
  • Publicly available datasets should be referenced in the DAS and should also be cited in Reference Lists, particularly when the datasets have been assigned DOIs.
  • Data that were not analyzed should not be mentioned in a DAS.
  • For any data that were not provided within the paper or its supplementary material, the DAS should provide access to the external data locations. These locations may be tagged either as direct links, or as cross references to the appropriate Reference List(s).

Possible Content of a DAS

A Data Availability Statement might include, for example:
  • A statement that no datasets were analyzed or generated during the creation of the article.
  • A statement that all data analyzed or generated in the course of producing the article are included in the published article or its supplementary material.
  • Any restrictions or licensing that apply to the availability of the data, both for the current study and for anyone else wishing to view or reuse the data.
  • If the data are not included in the paper, the name/contact information for the author(s), a repository, or a third party from which the data can be obtained.
  • The reasons why any generated or analyzed data are not publicly available.

DAS in JATS

Data Availability Statements in JATS documents should be tagged as ordinary sections (<sec>), using the @sec-type attribute to indicate that the section contains data availability material:
<sec sec-type="data-availability">
Because a Data Availability Statement is an ordinary section, JATS does not restrict where it may appear in the document. JATS Best Practice, however, is to place it in a part of the article that is not behind access controls such as a paywall.
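As an illustration of how the @sec-type value makes a DAS machine-findable wherever it sits in the document, the following is a minimal Python sketch using only the standard library. The sample article and the helper names (find_das_sections, das_text) are hypothetical, and the sketch assumes an un-namespaced JATS document; it is not part of JATS or of any tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced article; real JATS files carry far more markup.
SAMPLE = """<article>
  <body><sec><title>Results</title><p>...</p></sec></body>
  <back>
    <sec sec-type="data-availability">
      <title>Data Availability</title>
      <p>All data are provided in the paper.</p>
    </sec>
  </back>
</article>"""


def find_das_sections(root):
    """Return every <sec> whose sec-type is "data-availability", at any depth."""
    return root.findall(".//sec[@sec-type='data-availability']")


def das_text(sec):
    """Return the whitespace-normalized text of a DAS section, skipping its <title>."""
    parts = []
    for child in sec:
        if child.tag != "title":
            parts.append(" ".join("".join(child.itertext()).split()))
    return " ".join(p for p in parts if p)


root = ET.fromstring(SAMPLE)
sections = find_das_sections(root)
print(len(sections), das_text(sections[0]))
```

Because the lookup keys on the attribute value rather than on position, it behaves the same whether the section is in <back>, in <body>, or elsewhere.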

Citing Data Sources

Current publishing Best Practice is to cite data sources in much the same manner that articles and books are cited, either as part of a regular Reference List or listed separately in their own data bibliographic list. Data citations should include sufficient information to “facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific time slice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.”
How the Data Were Used: For the purposes of citing data sources, three uses of the data associated with an article can be recognized:
  • Generated data: Included or referenced external data that were generated in the course of the study on which the article reports.
  • Analyzed data: Referenced data that were analyzed in the course of the study on which the article reports, but that were not generated for the study. This may include publicly available datasets.
  • Non-analyzed data: Referenced data that were neither generated nor analyzed during the study.
Attributes for Data Citations: Two attributes may be used to enhance the discoverability of data citations:
  • The @publication-type attribute on the citation element (<mixed-citation> or <element-citation>) should be “data”.
  • The @use-type attribute (again on either <mixed-citation> or <element-citation>) may record how the data were used in the research that led to the article, for example, to distinguish among “generated-data”, “analyzed-data”, and “non-analyzed-data” (data that were only referenced).
    [In current practice, exactly how the data were used is probably material that only contributors can supply. Publishers and archives may have no reliable way to determine use, as there is typically nothing in the text that states usage.]
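To show how these two attributes support discovery, here is a small Python sketch that lists the citations marked as data and groups them by whether they carry one of the three @use-type values named above. The sample Reference List, the ids, and the function names are hypothetical. Since @use-type is optional and the three values are given only as examples, a citation in the second group is not necessarily an error.

```python
import xml.etree.ElementTree as ET

# The three example use-type values named in the text above.
KNOWN_USE_TYPES = {"generated-data", "analyzed-data", "non-analyzed-data"}

# Hypothetical Reference List with two data citations and one journal citation.
SAMPLE = """<ref-list>
  <ref id="d1"><element-citation publication-type="data" use-type="generated-data">
    <data-title>Hypothetical dataset A</data-title></element-citation></ref>
  <ref id="d2"><mixed-citation publication-type="data">
    <data-title>Hypothetical dataset B</data-title></mixed-citation></ref>
  <ref id="r1"><element-citation publication-type="journal">
    <article-title>Not a dataset</article-title></element-citation></ref>
</ref-list>"""


def data_citations(root):
    """Yield (ref id, use-type or None) for each citation with publication-type="data"."""
    for ref in root.iter("ref"):
        for tag in ("mixed-citation", "element-citation"):
            for cit in ref.iter(tag):
                if cit.get("publication-type") == "data":
                    yield ref.get("id"), cit.get("use-type")


def group_by_use_type(pairs):
    """Split ref ids into those with a recognized use-type and all others."""
    recognized = [rid for rid, use in pairs if use in KNOWN_USE_TYPES]
    other = [rid for rid, use in pairs if use not in KNOWN_USE_TYPES]
    return recognized, other


root = ET.fromstring(SAMPLE)
pairs = list(data_citations(root))
recognized, other = group_by_use_type(pairs)
print(pairs)
print(recognized, other)
```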

Samples of Data Availability Statements

Example 1: The Data Availability Statement reports that all data are provided either in the paper or in supplementary material. All @rids point to objects in the paper (tables and descriptions of supplementary material).
<back>
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>All data are provided in <xref ref-type="table" rid="table1">Table 1</xref> 
and Datasets <xref ref-type="supplementary-material" rid="data1">S1</xref> 
and <xref ref-type="supplementary-material" rid="data2">S2</xref>.</p>
</sec>
<ref-list>...</ref-list>
</back>
Example 2: The Data Availability Statement reports both analyzed and generated data. For each dataset it gives a direct link to the external source and a cross-reference to the full data citation in the Reference List.
...
<back>
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>The data analysis file and all annotator data files are available in 
the Figshare repository, 
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.1285515">
https://doi.org/10.6084/m9.figshare.1285515</ext-link> 
[<xref ref-type="bibr" rid="pone.0167292.ref032">32</xref>]. The measured 
and simulated Euler angles, and the simulation codes are available from 
the Dryad database, 
<ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5061/dryad.cv323">
https://doi.org/10.5061/dryad.cv323</ext-link> 
[<xref ref-type="bibr" rid="pone.0167292.ref033">33</xref>]. Microarray 
data are deposited in the Gene Expression Omnibus under accession number 
<ext-link ext-link-type="uri" 
xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70542">GSE70542</ext-link> 
[<xref ref-type="bibr" rid="pone.0167292.ref034">34</xref>].</p>
</sec>
<ref-list>
<title>References</title>
...
</ref-list>
</back>
...
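The pairing of direct links and cross-references in Example 2 can be checked mechanically. The following Python sketch collects the external URLs in a DAS and reports any bibliographic cross-reference whose @rid has no matching <ref> in the document. The sample markup, the example.org URL, the ids, and the function name das_links are all hypothetical; the sketch assumes the xlink prefix is bound to the standard XLink namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xlink:href under its namespace-qualified name.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical <back> with one resolvable and one dangling cross-reference.
SAMPLE = """<back xmlns:xlink="http://www.w3.org/1999/xlink">
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>Data are in a repository,
      <ext-link ext-link-type="uri" xlink:href="https://example.org/dataset/1">https://example.org/dataset/1</ext-link>
      [<xref ref-type="bibr" rid="ref1">1</xref>], and in a second
      dataset [<xref ref-type="bibr" rid="ref9">9</xref>].</p>
  </sec>
  <ref-list>
    <ref id="ref1"><mixed-citation publication-type="data">...</mixed-citation></ref>
  </ref-list>
</back>"""


def das_links(root):
    """Return (external URLs, unresolved bibr rids) across all DAS sections."""
    ref_ids = {ref.get("id") for ref in root.iter("ref")}
    urls, unresolved = [], []
    for sec in root.findall(".//sec[@sec-type='data-availability']"):
        for link in sec.iter("ext-link"):
            urls.append(link.get(XLINK_HREF))
        for xref in sec.iter("xref"):
            if xref.get("ref-type") != "bibr":
                continue
            # @rid may hold several space-separated ids.
            for rid in (xref.get("rid") or "").split():
                if rid not in ref_ids:
                    unresolved.append(rid)
    return urls, unresolved


urls, unresolved = das_links(ET.fromstring(SAMPLE))
print(urls)
print(unresolved)
```

In this sample, the reference to ref9 is reported as unresolved because no <ref> with that id exists.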
Example 3: The generated data are mentioned in the Data Availability Statement section, and all data are also cited in a Reference List within that section.
...
<back>
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>The following datasets were generated or analyzed for this study:</p>
<ref-list>
<ref id="pone.0167830.data001">
   <label>D1</label>
   <element-citation publication-type="data" 
     specific-use="isSupplementedBy">
     <name><surname>Read</surname><given-names>K</given-names></name>
     <data-title>Sizing the Problem of Improving Discovery and Access
       to NIH-funded Data: A Preliminary Study (Datasets)</data-title>
     <source>Figshare</source><year iso-8601-date="2015">2015</year>
     <pub-id pub-id-type="doi" assigning-authority="figshare"
        xlink:href="https://doi.org/10.6084/m9.figshare.1285515">
        https://doi.org/10.6084/m9.figshare.1285515</pub-id>
   </element-citation>
</ref>
    
<ref id="pone.0167830.data002">
   <label>D2</label>
   <element-citation publication-type="data" 
     specific-use="references">
     <name><surname>Kok</surname><given-names>K</given-names></name>
     <name><surname>Ay</surname><given-names>A</given-names></name>
     <name><surname>Li</surname><given-names>L</given-names></name>
     <data-title>Genome-wide errant targeting by Hairy</data-title>
     <source>Dryad Digital Repository</source>
     <year iso-8601-date="2015">2015</year>
     <pub-id pub-id-type="doi" assigning-authority="dryad"
       xlink:href="https://doi.org/10.5061/dryad.cv323">
        https://doi.org/10.5061/dryad.cv323</pub-id>
   </element-citation>
</ref>
    
<ref id="pone.0167830.data003">
   <label>D3</label>
   <element-citation publication-type="data" 
     specific-use="references">
     <name><surname>Hoang</surname><given-names>C</given-names></name>
     <name><surname>Swift</surname><given-names>GH</given-names></name>
     <name><surname>Azevedo-Pouly</surname><given-names>A</given-names>
       </name>
     <name><surname>MacDonald</surname><given-names>RJ</given-names></name>
     <data-title>Effects on the transcriptome of adult mouse pancreas
        (principally acinar cells) by the inactivation of the Ptf1a gene 
        in vivo</data-title>
     <source>NCBI Gene Expression Omnibus</source>
     <year iso-8601-date="2015">2015</year>
     <pub-id pub-id-type="accession" 
        assigning-authority="NCBI"
        xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70542"
        >GSE70542</pub-id>
   </element-citation>
</ref>
</ref-list>
</sec>
</back>
...

Example 4: The Data Availability Statement reports both analyzed and generated data (identified as such using a @use-type attribute), and all data are cited in a Reference List within the DAS section.
4.1 Data tagged in the <ref-list> using a mixed citation:
...
<back>
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>The following datasets were generated or analyzed for this study:</p>
<ref-list>
<ref id="m235">
<label>D1</label>
<mixed-citation publication-type="data" 
  specific-use="isSupplementedBy" use-type="generated-data">
<string-name><surname>Cates</surname> <given-names>KL</given-names></string-name>,
 <string-name><surname>Harris</surname> <given-names>LR</given-names></string-name>. 
 <data-title>Margins for Errant Targeting of Breast Cancer Incidence 
 among Senior Women</data-title>. <source>Figshare</source>. 
 <year iso-8601-date="2012">2012</year>. 
 <pub-id pub-id-type="doi" assigning-authority="figshare"
   xlink:href="https://doi.org/10.6084/m8.figshare.1365525">View Data</pub-id>
</mixed-citation>
</ref>
    
<ref id="m782">
<label>D2</label>
<mixed-citation publication-type="data" 
  specific-use="references" use-type="analyzed-data">
<string-name><surname>Peniston</surname> <given-names>S</given-names></string-name>, 
 <string-name><surname>Bunch</surname> <given-names>DB</given-names></string-name>, 
 <string-name><surname>Settles</surname> <given-names>LT</given-names></string-name>.
 <data-title>Targeting Genome Instability in Cancer (Datasets)</data-title>. 
 <source>Fayette Digital Archive</source>. <year iso-8601-date="2014">2014</year>. 
 <pub-id pub-id-type="doi" assigning-authority="fayette"
   xlink:href="https://doi.org/10.3271/fayette.cz389">View Data</pub-id>
</mixed-citation>
</ref>
</ref-list>
</sec>
</back>
...

4.2 Data tagged in the <ref-list> using an element citation:
...
<back>
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>The following datasets were generated or analyzed for this study:</p>
<ref-list>
<ref id="m235">
<label>D1</label>
<element-citation publication-type="data" 
  specific-use="isSupplementedBy" use-type="generated-data">
<name><surname>Cates</surname><given-names>KL</given-names></name>
<name><surname>Harris</surname><given-names>LR</given-names></name> 
<data-title>Margins for Errant Targeting of Breast Cancer Incidence 
among Senior Women</data-title> 
<source>Figshare</source> 
<year iso-8601-date="2012">2012</year> 
<pub-id pub-id-type="doi" assigning-authority="figshare"
  xlink:href="https://doi.org/10.6084/m8.figshare.1365525">
https://doi.org/10.6084/m8.figshare.1365525</pub-id>
</element-citation>
</ref>
    
<ref id="m782">
<label>D2</label>
<element-citation publication-type="data" 
  specific-use="references" use-type="analyzed-data">
<name><surname>Peniston</surname><given-names>S</given-names></name> 
<name><surname>Bunch</surname><given-names>DB</given-names></name> 
<name><surname>Settles</surname><given-names>LT</given-names></name>
<data-title>Targeting Genome Instability in Cancer (Datasets)</data-title> 
<source>Fayette Digital Archive</source>
<year iso-8601-date="2014">2014</year> 
<pub-id pub-id-type="doi" assigning-authority="fayette"
  xlink:href="https://doi.org/10.3271/fayette.cz389">
https://doi.org/10.3271/fayette.cz389</pub-id>
</element-citation>
</ref>
</ref-list>
</sec>
</back>
...

Example 5: Analyzed data cannot be made publicly available:
...
<back>
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>Ethical restrictions according to the Japanese Ethical 
Guidelines for Human Genome/Gene Analysis Research (<ext-link 
ext-link-type="uri" 
xlink:href="http://www.lifescience.mext.go.jp/files/pdf/n796_00.pdf">
http://www.lifescience.mext.go.jp/files/pdf/n796_00.pdf</ext-link>, page
33) prevent public sharing of individual genotype data. All summarized
data are available upon request. Data requests may be sent to the 
UMIN IRB (<email>irb@xxxxxxxx.jp</email>).</p>
</sec>
...
</back>
...

Example 6: No data were analyzed or generated:
<back>
...
<sec sec-type="data-availability">
<title>Data Availability</title>
<p>During the course of this research no data was analyzed, 
reused or generated.</p>
</sec>
<ref-list>
<title>References</title>
...
</ref-list>
</back>