NPS Data Store Utilities
This package is a collection of functions to acquire metadata and data from the National Park Service Data Store. Please request enhancements and bug fixes through Issues.
Install from GitHub
# install.packages("pak")
pak::pkg_install("nationalparkservice/NPSutils")
NPSutils is also available as part of the NPSdataverse:
# install.packages("pak")
pak::pkg_install("nationalparkservice/NPSdataverse")
Using NPSutils
What is a data package?
A data package on DataStore, the National Park Service’s science repository, consists of one or more tabular data files in .csv format and a single machine-readable metadata file in Ecological Metadata Language (EML) with a filename ending in “_metadata.xml”.
Downloading data packages
To download a data package (or multiple data packages) you will need to supply the 7-digit reference ID for the data package. You can find this on the DataStore web interface. On the landing page for the data package, the reference ID is listed under “Reference Type data package (Code: xxxxxxx)”.
You can also use the NPSutils function get_ref_list to retrieve a list of (all) data packages.
Using the default settings for get_data_package will create a
sub-directory within your working directory called “data”. Each data
package will be downloaded to a separate directory within “data”
corresponding to the package reference ID (e.g. ../data/xxxxxxx).
# downloads the Mojave Desert Network Desert Springs data package
# This package can be found at: https://doi.org/10.57830/2300498
get_data_package(2300498)
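Once a download finishes, it can be handy to confirm what landed on disk. The following is a minimal sketch, using only base R, of a helper that lists the .csv data files and the metadata file in a local copy of a data package. The name list_package_files and its directory argument are hypothetical and are not part of NPSutils; the sketch assumes the default “data/<reference ID>” layout described above.

```r
# Hypothetical helper (not part of NPSutils): list the files in a local
# copy of a data package, assuming the layout <directory>/<reference ID>/.
list_package_files <- function(reference_id, directory = "data") {
  # format() avoids scientific notation (e.g. 1e+06) in the folder name
  id <- format(reference_id, scientific = FALSE, trim = TRUE)
  pkg_dir <- file.path(directory, id)
  if (!dir.exists(pkg_dir)) {
    stop("No local copy found at: ", pkg_dir)
  }
  files <- list.files(pkg_dir)
  list(
    data = files[grepl("\\.csv$", files, ignore.case = TRUE)],
    metadata = files[grepl("_metadata\\.xml$", files)]
  )
}

# Example (after running get_data_package(2300498) with default settings):
# list_package_files(2300498)
```

The helper stops with an error if the directory does not exist, which is a quick way to notice that a download went to a non-default location.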
If you would like to access restricted data (and you have the
appropriate permissions), you can set the parameter secure = TRUE.
Please note that while NPS strives to make all data publicly accessible
according to the NPS Public Access Plan, some data are considered
confidential unclassified information (e.g. location of vulnerable or
exploitable species). Park superintendents make the final decision on
which data are considered confidential and how they should be handled
(redacted, fuzzed locations, etc).
Please be aware that if you download restricted data, you may not share it outside of the authorized persons or groups specified in the DataStore reference unless you have explicit permission to do so (preferably in writing).
# download a restricted data package:
get_data_package(2305242, secure = TRUE)
Loading data packages into R
To work with a data package, you may want to load it into R using
load_data_package. If you used the default settings for
get_data_package to download the data package, you can use the default
settings in load_data_package to access the data. The default settings
for load_data_package will use R’s built-in algorithms to determine
whether variables are characters, factors, integers, dates, etc.:
# load a data package into R:
MOJN <- load_data_package(2300498)
Alternatively, you can use the information provided in the data package metadata to specifically set the attribute definitions when loading the data:
# load a data package; use metadata to assign attribute types:
MOJN <- load_data_package(2300498, assign_attributes = TRUE)
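To illustrate the difference between guessed and declared column types, here is a small stand-alone sketch using base R's read.csv on made-up data. It does not call NPSutils, and load_data_package may use a different reader internally; the column names and values are invented for illustration only.

```r
# Made-up CSV content standing in for a data file from a data package
csv_text <- "site,visit_date,count\nA,2021-05-01,3\nB,2021-05-02,7"

# 1. Let R guess the column types (analogous to the default behaviour)
guessed <- read.csv(text = csv_text)
sapply(guessed, class)

# 2. Declare the types explicitly (analogous to using metadata definitions)
declared <- read.csv(
  text = csv_text,
  colClasses = c(site = "factor", visit_date = "Date", count = "integer")
)
sapply(declared, class)
```

Comparing the two sapply() results shows which columns were affected by declaring the types; dates are a common case where guessing and declaring differ.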
Note that R functions can only return a single object. Therefore, the object returned is a list of tibbles where each data file (.csv) is a separate item within the list. You can use the following code to split that list into separate dataframes:
# split a list of tibbles into individual dataframes
list2env(MOJN, envir = .GlobalEnv)
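If you would rather not place objects in the global environment, the list can also be used directly. The sketch below uses a made-up list of data frames as a stand-in for the object returned by load_data_package; the element names and columns are invented, and whether the real list is named after the .csv files is an assumption.

```r
# Stand-in for the list returned by load_data_package() (invented data)
pkg <- list(
  sites = data.frame(site = c("A", "B")),
  visits = data.frame(site = c("A", "A", "B"), year = c(2020, 2021, 2021))
)

# See which tables are available and how many rows each has
table_names <- names(pkg)
row_counts <- vapply(pkg, nrow, integer(1))

# Pull out a single table by name
visits <- pkg[["visits"]]

# Or unpack everything into a dedicated environment instead of .GlobalEnv
pkg_env <- list2env(pkg, envir = new.env())
ls(pkg_env)
```

Keeping the tables in the list or in a dedicated environment avoids overwriting objects of the same name that already exist in your session.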
Loading metadata
You may want to access the metadata directly in its entirety, which
you can do using load_pkg_metadata(). If you used the default directory
settings in get_data_package, you can use the default settings in
load_pkg_metadata as well. This will return a large, deeply nested list
that consists of everything in the *_metadata.xml file within the data
package:
metadata <- load_pkg_metadata(2300498)
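Because the returned object is deeply nested, it helps to explore it one level at a time. The following is a minimal sketch with base R, using a small invented list in place of real metadata. The element names are hypothetical, loosely modeled on EML, and may not match what load_pkg_metadata actually returns.

```r
# Invented stand-in for a nested metadata list
metadata_stub <- list(
  dataset = list(
    title = "Example data package",
    creator = list(
      individualName = list(givenName = "Jane", surName = "Doe")
    )
  )
)

# Top-level element names
names(metadata_stub)

# Compact overview, limited to two levels of nesting
str(metadata_stub, max.level = 2)

# Drill down to a single value
pkg_title <- metadata_stub$dataset$title
```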
You might want to have a subset of the metadata available in a
dataframe if, for instance, you wanted to import it into something like
Power BI. You can get the metadata necessary for citations and
background information (everything on the “Core” tab on the DataStore
reference landing page) using the following code. Again, if you’ve used
the default directory settings for get_data_package, you can use the
default settings for load_core_metadata.
core_metadata <- load_core_metadata(2300498)
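For tools such as Power BI, one option is to write the dataframe to a .csv file and import that. The sketch below uses a small invented data frame as a stand-in for the result of load_core_metadata; the column names and values are hypothetical.

```r
# Invented stand-in for the data frame returned by load_core_metadata()
core_stub <- data.frame(
  element = c("title", "publisher"),
  value = c("Example data package", "National Park Service")
)

# Write to a temporary file; replace with a real path in practice
out_file <- file.path(tempdir(), "core_metadata.csv")
write.csv(core_stub, out_file, row.names = FALSE)

# Read it back to confirm the round trip
check <- read.csv(out_file)
```

Setting row.names = FALSE keeps R's row numbers out of the file, so the import sees only the metadata columns.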
Removing local data packages
When you are done working with a data package, you may want to free
up hard drive space. You can delete your local copy of a data package
using rm_local_packages. Again, if you used the default directory
structure set up by get_data_package, you can also use the default
settings for rm_local_packages:
rm_local_packages(2300498)
Advanced scripting options
If you are setting up a scripted solution for your workflow, you may
want to turn off the interactive features for functions within NPSutils.
Each NPSutils function has a parameter force. Setting force = TRUE will
turn off the interactive components of the function. You should be
careful setting force = TRUE as this turns off a lot of the protections
and suggestions within functions (for instance, you will not be warned
if you are attempting to download something that is not a data package;
you will not be warned that there is a more recent version of the data
package you are requesting available, etc).
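Since force = TRUE suppresses the interactive warnings, a scripted workflow may want its own record of which downloads succeeded. The following is a minimal sketch of one way to do that in base R. The helper download_all and its download_fun argument are hypothetical names, not part of NPSutils; the download function is passed in so the helper itself makes no assumptions about NPSutils internals.

```r
# Hypothetical helper (not part of NPSutils): try each reference ID in
# turn and record success or failure instead of stopping at the first error.
download_all <- function(reference_ids, download_fun) {
  results <- vapply(reference_ids, function(id) {
    tryCatch({
      download_fun(id)
      TRUE
    }, error = function(e) {
      message("Failed to download ", id, ": ", conditionMessage(e))
      FALSE
    })
  }, logical(1))
  # Label each result with its reference ID (no scientific notation)
  names(results) <- format(reference_ids, scientific = FALSE, trim = TRUE)
  results
}

# Usage (requires NPSutils and network access):
# status <- download_all(
#   c(2300498),
#   function(id) NPSutils::get_data_package(id, force = TRUE)
# )
```

The returned named logical vector can be written to a log or used to retry only the reference IDs that failed.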