Citing Data

Current publishing practice is to cite data sources in much the same manner that articles and books are cited. Such citations may appear in the regular reference list or in a separate list of their own. The Force11 Joint Declaration of Data Citation Principles states (among other principles) that:
  • Data should be considered legitimate, citable products of research.
  • Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
  • In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.
  • A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
  • Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version, and/or granular portion of data retrieved subsequently is the same as was originally cited.
The JATS citation models are adequate to record most current practice in citing data. However, data sets, protein sequences, and spreadsheets (to name a few data examples) are not tagged as uniformly by the industry as are cited journals and books. Specific JATS elements that can assist in preserving data source information in a citation include:
  • <data-title> — the formal title or name of a cited data source (or a component of a cited data source) such as a dataset or protein structure.
    Because a cited data source can sit inside a larger resource (for example, a dataset held in a repository), both the <source> element and the <data-title> element may be needed within a single citation to describe different levels of the data source. The <data-title> element is typically used as the equivalent of an article title (<article-title>). See samples below.
  • <version> — A full version statement, which may be only a number, for data or software that is cited or described.
    The content of this element may be a simple version number (such as “<version>16</version>” or “<version>XII</version>”). More complex version statements may contain a textual statement, such as the dates that the dataset covers. Whether or not the content is more than a simple number, the @designator attribute of this element can be used to hold the simple numerical or alphabetic version number, if there is one, for example: <version designator="16.2">16th version, second release</version>.
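As a rough sketch of how <data-title>, <version>, and <source> might appear together in one data citation, consider the fragment below. It is illustrative only: the author name, dataset title, and repository name are placeholders invented for this sketch and do not refer to a real data source. The @designator value reuses the example given above.

```xml
<ref>
<!-- Hypothetical citation: all names and titles are placeholders -->
<mixed-citation publication-type="data">Author AB 
(<year iso-8601-date="2015">2015</year>). 
<data-title>Placeholder dataset title</data-title>, 
<version designator="16.2">16th version, second release</version>. 
<source>Placeholder Data Repository</source>.</mixed-citation>
</ref>
```

Here the textual version statement stays readable in the rendered citation, while the attribute carries the bare version number in a form that software can compare without parsing the text.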
We would like to thank the Force11 group for the data citation examples given below.
Protein Data Bank in Europe sample:
...
<ref>
<mixed-citation publication-type="data">Kollman JM, Charles EJ, Hansen JM, 
<year iso-8601-date="2014">2014</year>, <data-title>Cryo-EM structure of 
the CTP synthetase filament</data-title>, <ext-link ext-link-type="uri" 
xlink:href="https://www.ebi.ac.uk/pdbe/entry/EMD-2700">
http://www.ebi.ac.uk/pdbe/entry/EMD-2700</ext-link>, Publicly available 
from <source>The Electron Microscopy Data Bank (EMDB)</source>.</mixed-citation>
</ref>
...
GigaScience sample:
...
<ref>
<mixed-citation publication-type="data">Zheng LY, 
Guo XS, He B, Sun LJ, Pi CM, Jing H-C: Genome data from 
[<ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5524/100012">
http://dx.doi.org/10.5524/100012</ext-link>] <source>GigaScience</source> 
<year iso-8601-date="2011">2011</year>.</mixed-citation>
</ref>
...
Data in figshare, referenced through a DOI:
...
<ref>
<mixed-citation publication-type="data">Di Stefano B, Collombet S, 
Graf T. <source>Figshare</source> <ext-link ext-link-type="uri" 
xlink:href="https://dx.doi.org/10.6084/m9.figshare.939408">
http://dx.doi.org/10.6084/m9.figshare.939408</ext-link> 
(<year iso-8601-date="2014">2014</year>).</mixed-citation>
</ref>
...
Dryad Digital Repository, referenced through a DOI:
...
<ref>
<mixed-citation publication-type="data">Dubuis JO, Samanta R, 
Gregor T (<year iso-8601-date="2013">2013</year>).  Data from: 
<data-title>Accurate measurements of dynamics and reproducibility 
in small genetic networks</data-title>. <source>Dryad Digital 
Repository</source> doi:<pub-id pub-id-type="doi">10.5061/dryad.35h8v</pub-id>
</mixed-citation>
</ref>
...
GenBank Protein sample:
...
<ref>
<mixed-citation publication-type="data">
<data-title>Homo sapiens cAMP responsive element binding protein 1 
(CREB1), transcript variant A, mRNA</data-title>. <source>GenBank</source> 
<ext-link ext-link-type="genbank" xlink:href="NM_004379.3">NM_004379.3</ext-link>.
</mixed-citation>
</ref>
...
RNA Sequence sample:
...
<ref>
<mixed-citation publication-type="data">Xu, J. <etal/> 
<data-title>Cross-platform ultradeep transcriptomic profiling 
of human reference RNA samples by RNA-Seq</data-title>. 
<source>Sci. Data</source> <volume>1</volume>:<elocation-id>140020</elocation-id> 
doi: <pub-id pub-id-type="doi">10.1038/sdata.2014.20</pub-id> 
(<year iso-8601-date="2014">2014</year>).</mixed-citation>
</ref>
...