The lack of clear references to and descriptions of data sets in published literature limits the usefulness of data, as well as the reproducibility and credibility of scientific findings.
Scientific researchers are instructed from the very beginning of their training about the importance of citing previously published literature and carefully documenting methods. We are taught that this corroborating information forms a solid foundation on which to rest our claims and conclusions. Citing data sets in the same manner, however, is another story. Although researchers have enthusiastically embraced digital data archives that can be shared and updated easily, they do not always cite data sets from these resources in their publications in ways that facilitate verification and replication of their results—or the assembly of metrics to gauge how the data sets are used.
Benefits of data set citation include improved reproducibility (particularly when the exact version of the data used is indicated) and credibility of research, as well as clarification of the provenance and use of data and of the proper credit for them. Readers of scientific literature, including researchers, funding agencies, and promotion committees, rely on data set citation for information about data set usage. These usage metrics are extremely important in assessing the impact of a given body of work and of the facilities that publish and deliver data to the scientific community. Improper, incorrect, and incomplete data set citation hinders such assessments.
Publishers and data repositories have made significant progress over the past decade in increasing awareness of data set citation, and science policy bodies such as AGU have recommended the practice. Despite extensive awareness efforts by such groups, however, we observe a shortage of clear references to cited data in published scientific literature. We need a renewed approach to promote, educate about, and enforce citation of data sets in manuscripts—and to improve the specificity with which they are described.