National Park Service Data Release Reports

Resources and Guides for generating NPS DRRs associated with data packages

Overview

Data Release Reports (DRRs) are created by the National Park Service and provide detailed descriptions of valuable research datasets, including the methods used to collect the data and technical analyses supporting the quality of the measurements. Data Release Reports focus on helping others reuse data, rather than presenting results, testing hypotheses, or presenting new interpretations, methods or in-depth analyses.

DRRs are intended to document the processing of fully-QAed data to their final (QCed) form in a reproducible and transparent manner. DRRs document the data collection methods and quality standards used to prepare and review data prior to release. DRRs present the quality of resultant data in the context of fitness for their intended use.

Each DRR cites source and resultant datasets that are published concurrently and cross-referenced. Associated datasets are made publicly available with the exception of data that must be protected from release as per NPS and park-specific policies.

Data packages that are published concurrently with DRRs are intended to be independently citable scientific works that can serve as the basis for subsequent analysis and reporting by NPS or third parties.

Project Set-up

The template is not a stand-alone file; it depends on multiple associated files. New projects can be established by downloading the zip file associated with the latest release of the DRR template repository on GitHub.

Folder Structure

General directory contents are as follows (Figure 1):

Figure 1: Standard project directory structure for Data Release Reports

Reproducible Reports

The following applies to users who generate a Data Release Report from the DRR_to_docx.rmd template file using RMarkdown.

Standard Code Chunks

In addition to the report outline and a description of content for each section, the template includes four standard code chunks.

YAML Header:

The YAML header helps format the DRR. You should not need to edit any of the YAML header.

R code chunks:

Citations

Automating Citations

To automate citations, add each citation in BibTeX format to the file “references.bib”. You can copy and paste the BibTeX for each reference manually, or you can search for it from within RStudio. In RStudio, make sure you are editing this document in the “Visual” view (as opposed to “Source”). From the “Insert” drop-down menu, select “@ Citation…” (shortcut: Ctrl+Shift+F8). This opens a graphical user interface (GUI) tool where you can view all the citations in your references.bib file and search multiple databases for references. The tool can automatically insert the BibTeX for a reference into your references.bib file (you can customize the unique identifier if you’d like) and insert the in-text citation into the DRR template.

Figure 2: Adding Citations - Source vs. Visual editing of the Template and how to access the citation manager.

Figure 3: Adding Citations - Using the citation manager.

Once a reference is in your references.bib file, using the Visual view of the template you can simply type the ‘@’ symbol and select which reference to insert in the text.

If you need to edit how the citation is displayed after inserting it into the text, switch back to the “Source” view. Each BibTeX entry should start with a unique identifier; the example reference in the supplied references.bib file has the unique identifier “@article{Scott1994,”. In the “Source” view in RStudio, insert the reference in your text by combining the “at” symbol with the portion of the unique identifier after the curly bracket: @Scott1994.

| Syntax | Result |
|---|---|
| @Scott1994 concludes that … | Scott et al. (1994) concludes that … |
| @Scott1994[p.33] concludes that … | Scott et al. (1994, p.33) concludes that … |
| … end of sentence [@Scott1994]. | … end of sentence (Scott et al., 1994). |
| … end of sentence [see @Scott1994,p.33]. | … end of sentence (see Scott et al., 1994, p.33). |
| Separate multiple citations with a semicolon: [@Scott1994; @aberdeen1958] | Separate multiple citations with a semicolon: (Scott et al., 1994; Aberdeen, 1958) |
| Scott et al. conclude that … [-@Scott1994] | Scott et al. conclude that … (1994) |

The full citation, properly formatted, will be included in a “References” section at the end of the rendered MS Word document. It is still worth visually inspecting the .docx for citation completeness and formatting.

Manual citations

If you would like to format your citations manually, please feel free to do so. Make sure to look at the References section for how to properly format each citation type.

Editing the Text

The following text in the body of the DRR template will need to be edited to customize it to each data package.

Data Records

This is a required section and consists of two subheadings:

Sample text is included that uses R code to incorporate previously specified parameters such as the data package title, file names, and DOI.

Code for a sample table summarizing the contents of the data package (except the metadata) is provided.
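
To illustrate the kind of contents table described here, the following is a minimal sketch that lists the data files in a package folder and reports their sizes. It is not the code shipped with the template: the helper name `summarize_data_files`, the assumption that data files are CSVs, and the demo folder and file names are all hypothetical.

```r
# Hypothetical helper (not part of the DRR template): list the CSV data files
# in a data package folder. Metadata files are skipped because only files
# ending in ".csv" are matched.
summarize_data_files <- function(package_dir) {
  files <- list.files(package_dir, pattern = "\\.csv$", full.names = TRUE)
  data.frame(
    file_name = basename(files),
    size_bytes = file.size(files),
    stringsAsFactors = FALSE
  )
}

# Demo with a throwaway folder so the sketch runs anywhere.
demo_dir <- file.path(tempdir(), "demo_data_package")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3),
          file.path(demo_dir, "observations.csv"),
          row.names = FALSE)
writeLines("<metadata/>", file.path(demo_dir, "observations_metadata.xml"))

file_table <- summarize_data_files(demo_dir)
print(file_table)
```

The resulting data frame could then be passed to kable() in the same way as the example in the Tables section.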

Data Quality

This is a required section. The template text includes multiple suggested text elements and code for an example table defining data flagging codes. Near-term development will add optional tables that summarize data quality based on the flags in the datasets.
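
Until such tables are part of the template, a flag summary can be sketched in base R. The example below is only an illustration under stated assumptions: the data frame, the column name `value_flag`, and the flag codes "A", "AE", and "R" are placeholders, not codes defined by the template.

```r
# Hypothetical example data: one measurement column with a companion flag column.
# The flag codes ("A", "AE", "R") are placeholders for illustration only.
observations <- data.frame(
  site = c("Site 1", "Site 1", "Site 2", "Site 2", "Site 3"),
  value = c(4.2, 5.1, NA, 3.8, 4.9),
  value_flag = c("A", "A", "R", "AE", "A"),
  stringsAsFactors = FALSE
)

# Count the records carrying each flag and convert the counts to percentages.
flag_counts <- table(observations$value_flag)
flag_summary <- data.frame(
  flag = names(flag_counts),
  n_records = as.integer(flag_counts),
  percent = round(100 * as.integer(flag_counts) / nrow(observations), 1),
  stringsAsFactors = FALSE
)

print(flag_summary)
```

A summary like this gives readers a quick view of how much of the dataset passed quality review before they inspect individual records.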

Usage Notes

This is a required section that should contain brief instructions to assist other researchers with reuse of the data. This may include discussion of software packages (with appropriate citations) that are suitable for analysing the assay data files, suggested downstream processing steps (e.g. normalization, etc.), or tips for integrating or comparing the data records with other datasets. Authors are encouraged to provide code, programs or data-processing workflows if they may help others understand or use the data.

Methods

This is a required section. It should cite previously published methods, but it should also describe data production in enough detail, including the experimental design, data acquisition assays, and any computational processing (e.g. normalization, QA, QC), that others can understand the methods without referring to associated publications.

Optional sub-sections within the methods include:

References

This required section includes full bibliographic references for each paper, chapter, book, data package, dataset, protocol, etc. cited within the DRR.

There are numerous examples of proper formatting for each of these in the template. Future versions of the DRR template will enable automatic reference formatting given a correctly formatted BibTeX file (.bib) containing the references.

Figures

Figures should be inserted using code chunks in all cases so that figure settings can be set in the chunk header. The chunk header should at a minimum set the fig.align parameter to “center” and specify the figure caption (fig.cap parameter). Inserting figures this way ensures that the caption is properly formatted and copies the caption to the figure’s “alt text” tag, making it 508-compliant.

For example:

```{r fig2, echo=FALSE, out.width="70%", fig.align="center", fig.cap="Example general workflow to include in the methods section."}
knitr::include_graphics("ProcessingWorkflow.png")
```

Results in:

Figure 4: Example general workflow to include in the methods section.

Tables

Tables should be created using the kable function. Specifying the caption in the kable function call (as opposed to inline markdown text) will ensure that the caption is appropriately formatted.

For example:

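```{r Table2, echo=FALSE}
# kable() comes from knitr; kable_styling() and the %>% pipe come from kableExtra
library(knitr)
library(kableExtra)

# Build an example data frame, one vector per column
c1 <- c("Protocol1", "Protocol2", "Protocol3")
c2 <- c("Park Unit 1", "Park Unit 2", "Park Unit 3")
c3 <- c("Site 1", "Site 2", "Site 3")
c4 <- c("Date 1", "Date 2", "Date 3")
c5 <- c("GEOXXXXX", "GEOXXXXX", "GEOXXXXX")
Table2 <- data.frame(c1, c2, c3, c4, c5)

# Render the table; the caption is set in the kable() call, not in inline markdown
kable(Table2,
      col.names = c("Subjects", "Park Units", "Locations", "Sampling Dates", "Data"),
      caption = "**Table 1.** Monitoring study example Data Records table.") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = FALSE)
```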

Results in:

Table 1: Table 1. Monitoring study example Data Records table.

| Subjects | Park Units | Locations | Sampling Dates | Data |
|---|---|---|---|---|
| Protocol1 | Park Unit 1 | Site 1 | Date 1 | GEOXXXXX |
| Protocol2 | Park Unit 2 | Site 2 | Date 2 | GEOXXXXX |
| Protocol3 | Park Unit 3 | Site 3 | Date 3 | GEOXXXXX |

Publishing DRRs

Report Numbers

Because data release reports and associated data packages are cross-referential, report numbers are typically assigned early in data processing and quality evaluation.

DRR DOIs have the format: https://doi.org/10.36967/xxxxxxx

Data package DOIs have the format: https://doi.org/10.57830/xxxxxxx

Here, “xxxxxxx” is the 7-digit DataStore reference number.
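
The two DOI formats can be assembled programmatically once the reference number is known. The following is a minimal sketch, not part of the DRR template: the function name `build_doi` and its arguments are hypothetical, and the only facts taken from the text are the two DOI prefixes and the 7-digit reference number.

```r
# Hypothetical helper (not part of the DRR template): build a DOI URL from a
# 7-digit DataStore reference number.
build_doi <- function(reference_number, type = c("drr", "data_package")) {
  type <- match.arg(type)
  # format() avoids scientific notation for numeric input such as 2000000
  ref <- format(reference_number, scientific = FALSE, trim = TRUE)
  if (!grepl("^[0-9]{7}$", ref)) {
    stop("reference_number must be exactly 7 digits")
  }
  # DOI prefixes as given in the text: 10.36967 for DRRs, 10.57830 for data packages
  prefix <- switch(type, drr = "10.36967", data_package = "10.57830")
  paste0("https://doi.org/", prefix, "/", ref)
}

drr_doi <- build_doi(1234567, type = "drr")
package_doi <- build_doi("7654321", type = "data_package")
print(drr_doi)
print(package_doi)
```

The reference numbers 1234567 and 7654321 are made-up placeholders; substitute the numbers assigned in DataStore.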

Liability Statements

Under no circumstances should reports and associated data packages or metadata published in the DRR series contain disclaimers or text that suggests that the work does not meet scientific integrity or information quality standards of the National Park Service. The following disclaimers are suitable for use, depending on whether the data are provisional or final (or approved or certified).

For approved & published data sets: “Unless otherwise stated, all data, metadata and related materials are considered to satisfy the quality standards relative to the purpose for which the data were collected. Although these data and associated metadata have been reviewed for accuracy and completeness and approved for release by the National Park Service Inventory and Monitoring Division, no warranty expressed or implied is made regarding the display or utility of the data for other purposes, nor on all computer systems, nor shall the act of distribution constitute any such warranty.”

For provisional data: “The data you have secured from the National Park Service (NPS) database identified as [database name] have not received approval for release by the NPS Inventory and Monitoring Division, and as such are provisional and subject to revision. The data are released on the condition that neither the NPS nor the U.S. Government shall be held liable for any damages resulting from its authorized or unauthorized use.”