Tagging Data Availability Statements
Information on the availability of the data that form the basis for an
article can appear in several locations within the article:
- Data Availability Statements (DAS): DAS are typically published as one or more sections of an article and describe where the data supporting the results of the article can be found. Such statements are becoming more frequent in journal publishing and are required by some publishers and/or funders.
- Bibliographic Citations: Full, structured citations to any externally archived referenced data may be in the ordinary bibliographic Reference List or separated into a Reference List of their own. Such lists may appear with the references at the end of an article or within a Data Availability Statement.
Best Practice for Data Availability Statements
[Note: We thank the Data Availability Statement subgroup of the Force11 Data
citations principles working group as well as selected publishers for the
specific recommendations for the 2018 Best Practices mentioned below.]
- A Data Availability Statement should be provided both for data that have been integrated into the article (in tables, figures, as supplementary material, etc.) and for data that are stored externally.
- DAS information should be included in a part of the article that is available for all to read, not behind access controls (paywalls).
- Fully tagged citations, typically in a Reference List, should be provided for all referenced data (externally archived generated data and analyzed data, as well as non-analyzed data).
- Publicly available datasets should be referenced in the DAS and should also be cited in Reference Lists, particularly when the datasets have been assigned DOIs.
- Data that were not analyzed should not be mentioned in a DAS.
- For any data that are not provided within the paper or its supplementary material, the DAS should provide access to the external data locations. These locations may be tagged either as direct links, or as cross references to the appropriate Reference List(s).
Possible Content of a DAS
A Data Availability Statement might state, for example:
- No datasets were analyzed or generated during the creation of the article.
- All data analyzed or generated in the course of producing this article are included in this published article or its supplementary material.
- Any restrictions or licensing that apply to the availability of the data, both for the current study and for anyone else wishing to view or reuse the data.
- If the data are not included in the paper, the name/contact information for the author(s), a repository, or a third party where the data can be obtained.
- The reasons why any generated or analyzed data are not publicly available.
DAS in JATS
A Data Availability Statement in JATS documents should be tagged as an ordinary section (<sec>), using the @sec-type attribute
to indicate that the section contains data availability material:
<sec sec-type="data-availability">
Since a Data Availability Statement is just a section, JATS does not restrict a DAS to any specific location in the document. JATS Best Practice is to place such material
in a location to which access is not restricted, that is, in a part of the article that is
not behind access controls such as a paywall.
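Because the DAS is identified only by its @sec-type value, a processing tool can locate it wherever it occurs in the document. The following is a minimal Python sketch of that lookup using the standard library; the sample article and the function name find_das_sections are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed article used only for this sketch.
SAMPLE = """<article>
  <body>
    <sec sec-type="methods"><title>Methods</title><p>...</p></sec>
  </body>
  <back>
    <sec sec-type="data-availability">
      <title>Data Availability</title>
      <p>All data are provided in the paper.</p>
    </sec>
  </back>
</article>"""


def find_das_sections(root):
    """Return every <sec> whose @sec-type is "data-availability".

    The search covers the whole tree because a DAS is an ordinary
    section and is not tied to one location in the document.
    """
    return [sec for sec in root.iter("sec")
            if sec.get("sec-type") == "data-availability"]


root = ET.fromstring(SAMPLE)
das_sections = find_das_sections(root)
das_titles = [sec.findtext("title") for sec in das_sections]
print(das_titles)
```

The sketch matches on the attribute value rather than on the section title, since titles vary between publishers.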
Citing Data Sources
Current publishing Best Practice is to cite data sources in much the same manner
that articles and books are cited, either as part of a regular Reference List or
listed separately in their own data bibliographic list. Data citations should
include sufficient information to “facilitate identification of, access to, and
verification of the specific data that support a claim. Citations or citation
metadata should include information about provenance and fixity sufficient to
facilitate verifying that the specific time slice, version, and/or granular
portion of data retrieved subsequently is the same as was originally cited.”
How the Data Were Used: For the purposes of citing data sources, three
uses of the data associated with an article can be recognized:
- Generated data: Included or referenced external data that were generated in the course of the study on which the article reports.
- Analyzed data: Referenced data that were analyzed in the course of the study on which the article reports, but that were not generated for the study. This may include publicly available datasets.
- Non-analyzed data: Referenced data that were neither generated nor analyzed during the study.
Attributes for Data Citations: Two attributes may be used to
enhance the discoverability of data citations:
- The @publication-type attribute on the citation element (<mixed-citation> or <element-citation>) should be “data”.
- The @use-type attribute (again on either <mixed-citation> or <element-citation>) may explain how the data were used in the research that led to the article, for example, to distinguish between “generated-data”, “analyzed-data”, and “non-analyzed-data” (referenced data).
[Note: In current practice, exactly how the data were used is probably material that only contributors can supply. Publishers and archives may have no reliable way to determine use, as there is typically nothing in the text that states usage.]
Samples of Data Availability Statements
Example 1: The Data Availability Statement reports that all data are provided either
in the paper or in supplementary material. All @rid attributes point
to objects in the paper (a table and descriptions of supplementary material).
<back>
  ...
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>All data are provided in
      <xref ref-type="table" rid="table1">Table 1</xref> and Datasets
      <xref ref-type="supplementary-material" rid="data1">S1</xref> and
      <xref ref-type="supplementary-material" rid="data2">S2</xref>.</p>
  </sec>
  <ref-list>...</ref-list>
</back>
Example 2: The Data Availability Statement reports
both analyzed and generated data. Each source is given both as a direct link
to the external location and as an internal cross-reference to the full data
citation in the Reference List.
...
<back>
  ...
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>The data analysis file and all annotator data files are available in
      the Figshare repository, <ext-link ext-link-type="uri"
      xlink:href="https://doi.org/10.6084/m9.figshare.1285515">
      https://doi.org/10.6084/m9.figshare.1285515</ext-link>
      [<xref ref-type="bibr" rid="pone.0167292.ref032">32</xref>]. The
      measured and simulated Euler angles, and the simulation codes are
      available from the Dryad database, <ext-link ext-link-type="uri"
      xlink:href="https://doi.org/10.5061/dryad.cv323">
      https://doi.org/10.5061/dryad.cv323</ext-link>
      [<xref ref-type="bibr" rid="pone.0167292.ref033">33</xref>].
      Microarray data are deposited in the Gene Expression Omnibus under
      accession number <ext-link ext-link-type="uri"
      xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70542">GSE70542</ext-link>
      [<xref ref-type="bibr" rid="pone.0167292.ref034">34</xref>].</p>
  </sec>
  <ref-list>
    <title>References</title>
    ...
  </ref-list>
</back>
...
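A DAS tagged in this way can be checked mechanically: each direct link can be collected, and each bibliographic cross reference can be tested against the ids in the Reference List. The following Python sketch shows one possible check. The sample markup, the ids ref032 and ref099, and the function name das_pointers are hypothetical; the xlink namespace is declared on the sample root because a standalone fragment would not parse without it.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical <back> fragment; ref099 is deliberately left without a <ref>.
SAMPLE = """<back xmlns:xlink="http://www.w3.org/1999/xlink">
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>Files are available in the Figshare repository,
      <ext-link ext-link-type="uri"
        xlink:href="https://doi.org/10.6084/m9.figshare.1285515">
        https://doi.org/10.6084/m9.figshare.1285515</ext-link>
      [<xref ref-type="bibr" rid="ref032">32</xref>], and from a second
      source [<xref ref-type="bibr" rid="ref099">99</xref>].</p>
  </sec>
  <ref-list>
    <ref id="ref032"><mixed-citation publication-type="data">...</mixed-citation></ref>
  </ref-list>
</back>"""


def das_pointers(back):
    """Return (direct links, resolved rids, dangling rids) for every DAS."""
    ref_ids = {ref.get("id") for ref in back.iter("ref")}
    links, resolved, dangling = [], [], []
    for sec in back.iter("sec"):
        if sec.get("sec-type") != "data-availability":
            continue
        links += [link.get(XLINK_HREF) for link in sec.iter("ext-link")]
        for xref in sec.iter("xref"):
            if xref.get("ref-type") != "bibr":
                continue
            # @rid may hold several space-separated ids.
            for rid in xref.get("rid", "").split():
                (resolved if rid in ref_ids else dangling).append(rid)
    return links, resolved, dangling


links, resolved, dangling = das_pointers(ET.fromstring(SAMPLE))
print(links, resolved, dangling)
```

A non-empty dangling list indicates a cross reference in the DAS with no matching citation in any Reference List.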
Example 3: Generated data are mentioned in the Data
Availability Statement section, and all data are referenced
in the References List within that section.
...
<back>
  ...
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>The following datasets were generated or analyzed for this study:</p>
    <ref-list>
      <ref id="pone.0167830.data001">
        <label>D1</label>
        <element-citation publication-type="data" specific-use="isSupplementedBy">
          <name><surname>Read</surname><given-names>K</given-names></name>
          <data-title>Sizing the Problem of Improving Discovery and Access to
            NIH-funded Data: A Preliminary Study (Datasets)</data-title>
          <source>Figshare</source>
          <year iso-8601-date="2015">2015</year>
          <pub-id pub-id-type="doi" assigning-authority="figshare"
            xlink:href="https://doi.org/10.6084/m9.figshare.1285515">
            https://doi.org/10.6084/m9.figshare.1285515</pub-id>
        </element-citation>
      </ref>
      <ref id="pone.0167830.data002">
        <label>D2</label>
        <element-citation publication-type="data" specific-use="references">
          <name><surname>Kok</surname><given-names>K</given-names></name>
          <name><surname>Ay</surname><given-names>A</given-names></name>
          <name><surname>Li</surname><given-names>L</given-names></name>
          <data-title>Genome-wide errant targeting by Hairy</data-title>
          <source>Dryad Digital Repository</source>
          <year iso-8601-date="2015">2015</year>
          <pub-id pub-id-type="doi" assigning-authority="dryad"
            xlink:href="https://doi.org/10.5061/dryad.cv323">
            https://doi.org/10.5061/dryad.cv323</pub-id>
        </element-citation>
      </ref>
      <ref id="pone.0167830.data003">
        <label>D3</label>
        <element-citation publication-type="data" specific-use="references">
          <name><surname>Hoang</surname><given-names>C</given-names></name>
          <name><surname>Swift</surname><given-names>GH</given-names></name>
          <name><surname>Azevedo-Pouly</surname><given-names>A</given-names></name>
          <name><surname>MacDonald</surname><given-names>RJ</given-names></name>
          <data-title>Effects on the transcriptome of adult mouse pancreas
            (principally acinar cells) by the inactivation of the Ptf1a gene
            in vivo</data-title>
          <source>NCBI Gene Expression Omnibus</source>
          <year iso-8601-date="2015">2015</year>
          <pub-id pub-id-type="accession" assigning-authority="NCBI"
            xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE70542">GSE70542</pub-id>
        </element-citation>
      </ref>
    </ref-list>
  </sec>
</back>
...
Example 4: The Data Availability Statement reports
both analyzed and generated data (identified as such using a @use-type attribute), and all data are referenced
in the References List within the DAS section.
4.1 Data tagged in the <ref-list> using mixed citations:
...
<back>
  ...
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>The following datasets were generated or analyzed for this study:</p>
    <ref-list>
      <ref id="m235">
        <label>D1</label>
        <mixed-citation publication-type="data" specific-use="isSupplementedBy"
          use-type="generated-data">
          <string-name><surname>Cates</surname> <given-names>KL</given-names></string-name>,
          <string-name><surname>Harris</surname> <given-names>LR</given-names></string-name>.
          <data-title>Margins for Errant Targeting of Breast Cancer Incidence
            among Senior Women</data-title>.
          <source>Figshare</source>.
          <year iso-8601-date="2012">2012</year>.
          <pub-id pub-id-type="doi" assigning-authority="figshare"
            xlink:href="https://doi.org/10.6084/m8.figshare.1365525">View Data</pub-id>
        </mixed-citation>
      </ref>
      <ref id="m782">
        <label>D2</label>
        <mixed-citation publication-type="data" specific-use="references"
          use-type="analyzed-data">
          <string-name><surname>Peniston</surname> <given-names>S</given-names></string-name>,
          <string-name><surname>Bunch</surname> <given-names>DB</given-names></string-name>,
          <string-name><surname>Settles</surname> <given-names>LT</given-names></string-name>.
          <data-title>Targeting Genome Instability in Cancer (Datasets)</data-title>.
          <source>Fayette Digital Archive</source>.
          <year iso-8601-date="2014">2014</year>.
          <pub-id pub-id-type="doi" assigning-authority="fayette"
            xlink:href="https://doi.org/10.3271/fayette.cz389">View Data</pub-id>
        </mixed-citation>
      </ref>
    </ref-list>
  </sec>
</back>
...
4.2 Data tagged in the <ref-list> using an element citation:
...
<back>
  ...
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>The following datasets were generated or analyzed for this study:</p>
    <ref-list>
      <ref id="m235">
        <label>D1</label>
        <element-citation publication-type="data" specific-use="isSupplementedBy"
          use-type="generated-data">
          <name><surname>Cates</surname><given-names>KL</given-names></name>
          <name><surname>Harris</surname><given-names>LR</given-names></name>
          <data-title>Margins for Errant Targeting of Breast Cancer Incidence
            among Senior Women</data-title>
          <source>Figshare</source>
          <year iso-8601-date="2012">2012</year>
          <pub-id pub-id-type="doi" assigning-authority="figshare"
            xlink:href="https://doi.org/10.6084/m8.figshare.1365525">
            https://doi.org/10.6084/m8.figshare.1365525</pub-id>
        </element-citation>
      </ref>
      <ref id="m782">
        <label>D2</label>
        <element-citation publication-type="data" specific-use="references"
          use-type="analyzed-data">
          <name><surname>Peniston</surname><given-names>S</given-names></name>
          <name><surname>Bunch</surname><given-names>DB</given-names></name>
          <name><surname>Settles</surname><given-names>LT</given-names></name>
          <data-title>Targeting Genome Instability in Cancer (Datasets)</data-title>
          <source>Fayette Digital Archive</source>
          <year iso-8601-date="2014">2014</year>
          <pub-id pub-id-type="doi" assigning-authority="fayette"
            xlink:href="https://doi.org/10.3271/fayette.cz389">
            https://doi.org/10.3271/fayette.cz389</pub-id>
        </element-citation>
      </ref>
    </ref-list>
  </sec>
</back>
...
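Because an element citation carries no interstitial punctuation, each part of the data citation can be read directly from its own element. The following Python sketch pulls the main fields from the second reference of this example. It is an illustration only: the function name summarize_data_citation and the keys of the returned dictionary are hypothetical, and the xlink namespace is declared on the sample root so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Second reference from Example 4.2, with the xlink namespace declared.
SAMPLE = """<ref xmlns:xlink="http://www.w3.org/1999/xlink" id="m782">
  <label>D2</label>
  <element-citation publication-type="data" specific-use="references"
    use-type="analyzed-data">
    <name><surname>Peniston</surname><given-names>S</given-names></name>
    <name><surname>Bunch</surname><given-names>DB</given-names></name>
    <name><surname>Settles</surname><given-names>LT</given-names></name>
    <data-title>Targeting Genome Instability in Cancer (Datasets)</data-title>
    <source>Fayette Digital Archive</source>
    <year iso-8601-date="2014">2014</year>
    <pub-id pub-id-type="doi" assigning-authority="fayette"
      xlink:href="https://doi.org/10.3271/fayette.cz389">
      https://doi.org/10.3271/fayette.cz389</pub-id>
  </element-citation>
</ref>"""


def summarize_data_citation(ref):
    """Collect the main fields of one <element-citation> data reference."""
    cit = ref.find("element-citation")
    return {
        "id": ref.get("id"),
        "use": cit.get("use-type"),
        "authors": [f"{n.findtext('surname')} {n.findtext('given-names')}"
                    for n in cit.findall("name")],
        "title": cit.findtext("data-title"),
        "repository": cit.findtext("source"),
        "year": cit.find("year").get("iso-8601-date"),
        # The link is read from the attribute, not from the display text.
        "link": cit.find("pub-id").get(XLINK_HREF),
    }


summary = summarize_data_citation(ET.fromstring(SAMPLE))
print(summary)
```

Reading the link from @xlink:href rather than from the element content means the same code also works when the display text is a label such as "View Data", as in the mixed-citation variant.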
Example 5: Analyzed data cannot be made publicly available:
...
<back>
  ...
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>Ethical restrictions according to the Japanese Ethical Guidelines for
      Human Genome/Gene Analysis Research (<ext-link ext-link-type="uri"
      xlink:href="http://www.lifescience.mext.go.jp/files/pdf/n796_00.pdf">
      http://www.lifescience.mext.go.jp/files/pdf/n796_00.pdf</ext-link>,
      page 33) prevent public sharing of individual genotype data. All
      summarized data are available upon request. Data requests may be sent
      to the UMIN IRB (<email>irb@xxxxxxxx.jp</email>).</p>
  </sec>
  ...
</back>
...
Example 6: No data were analyzed or generated:
<back>
  ...
  <sec sec-type="data-availability">
    <title>Data Availability</title>
    <p>During the course of this research no data was analyzed, reused or
      generated.</p>
  </sec>
  <ref-list>
    <title>References</title>
    ...
  </ref-list>
</back>