Dependencies
In addition to EMLeditor, you will also need the EML package to complete these steps.
You can download and install them individually, or get everything you need at once from the NPSdataverse using:
#install the NPSdataverse:
install.packages("devtools")
devtools::install_github("nationalparkservice/NPSdataverse")
library(NPSdataverse)
#individual install:
install.packages("devtools")
devtools::install_github("nationalparkservice/EMLeditor")
install.packages("EML")
library(EML)
library(EMLeditor)
Workflow Outline
EMLeditor’s primary objective is to edit and view EML formatted files, not to generate them from scratch. A suggested workflow is:
- Use EMLassemblyline::make_eml() to generate an initial EML document and save it as a .xml file (NPS naming convention is: *_metadata.xml)
- Use the EML::read_eml() function to load your EML file into R as an R object.
- Use EMLeditor functions to edit the metadata in R and evaluate whether your metadata is acceptable (don’t forget to use EML::eml_validate() to make sure you are generating valid EML).
- Use the EML::write_eml() function to write the R object back to XML (remember the NPS naming convention for metadata files is *_metadata.xml).
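The naming convention mentioned in the first and last steps can be checked before a file is written or uploaded. The following is a minimal sketch in base R; is_nps_metadata_name() is a hypothetical helper written for this example and is not part of EMLeditor.

```r
# Hypothetical helper (not part of EMLeditor): TRUE when a file name
# follows the NPS convention of ending in "_metadata.xml".
is_nps_metadata_name <- function(file_name) {
  grepl("_metadata\\.xml$", basename(file_name))
}

is_nps_metadata_name("mydata_metadata.xml")
is_nps_metadata_name("mydata.xml")
```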
If you use EMLeditor functions to alter your metadata (e.g. any
function with the prefix “set_” in the name) they will also silently add
the National Park Service as a publisher (including location, ROR id, etc)
to your metadata unless you set NPS=FALSE. If you leave the default
setting as NPS=TRUE, EMLeditor will also assume the data package is
being created “by or for the NPS” and add that information to the
metadata. For more details on customizing the publisher and originating
agency content, see the section for non-NPS users.
“set_” functions will also inject information about which version of EMLeditor you used into the metadata.
A Minimal Workflow
This workflow assumes your EML was generated using EMLassemblyline, but will also work with any number of other EML generators (e.g. ezEML).
Currently, many of the steps in EMLeditor are by default interactive
and will give you feedback on the current fields in your metadata, ask
if you want to update them, and report on the results of your edits. If
you would like to turn this option off, set force = TRUE and see the
section on automated scripting with EMLeditor.
Import your metadata into R
my_metadata <- EML::read_eml("mymetadata_metadata.xml", from="xml")
Add information about CUI
Add information about your Controlled Unclassified Information (CUI). Because it is important to indicate not only what type of CUI there is, but when (and why) there is not CUI, you must do this even if your data package does not contain CUI. Choose from one of five CUI codes. These are:
- PUBLIC - Does NOT contain CUI.
- FED ONLY - Contains CUI. Only federal employees should have access (similar to “internal only” in DataStore).
- FEDCON - Contains CUI. Only federal employees and federal contractors should have access (also very much like current “internal only” setting in DataStore)
- DL ONLY - Contains CUI. Should only be available to a named list of individuals (where and how to list those individuals TBD)
- NOCON - Contains CUI. Federal, state, local, or tribal employees may have access, but contractors cannot.
The first code is NPS specific. More information about the remaining four codes can be found on the National Archives website.
my_meta2 <- set_cui(my_metadata, "PUBLIC")
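Because the code must be one of the five listed above, it can help to check the string before passing it on. This is a minimal sketch in base R; check_cui_code() is a hypothetical helper for illustration, and whether set_cui() performs its own check is not stated here.

```r
# The five CUI dissemination codes listed above.
cui_codes <- c("PUBLIC", "FED ONLY", "FEDCON", "DL ONLY", "NOCON")

# Hypothetical helper (not part of EMLeditor): stop with a readable
# message if the supplied code is not one of the five listed codes.
check_cui_code <- function(code) {
  if (!is.character(code) || length(code) != 1 || !(code %in% cui_codes)) {
    stop("CUI code must be one of: ", paste(cui_codes, collapse = ", "))
  }
  invisible(code)
}

check_cui_code("PUBLIC")
```

The comparison is case-sensitive, so a lower-case "public" is rejected in this sketch.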
Set the intellectual rights
EMLassemblyline and ezEML provide some attractive looking boilerplate
for setting the intellectual rights. It looks reasonable and so is easy
to just keep. However, NPS has some specific regulations about what can
and cannot be in the intellectualRights tag. Use set_int_rights() to
replace the text with NPS-approved text. Note: You must first add the
CUI dissemination code using set_cui(), as the dissemination code and
license must agree. That is, you cannot give a data package with a
PUBLIC dissemination code a “restricted” license (and vice versa: a
restricted data package that contains CUI cannot have a public domain or
CC0 license).
- “restricted”: If the data contains Controlled Unclassified Information (CUI), the intellectual rights must read:
“This product has been determined to contain Controlled Unclassified Information (CUI) by the National Park Service, and is intended for internal use only. It is not published under an open license. Unauthorized access, use, and distribution are prohibited.”
- “public”: If the data do not contain CUI, the default is the public domain. The intellectual rights must read:
“This work is in the public domain. There is no copyright or license.”
- “CC0”: If you need a license, for instance if you are working with a partner organization that requires a license, use CC0:
“The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.”
The set_int_rights()
function will also put the name of
your license in a
# choose from "restricted", "public" and "CC0", see above:
my_meta2 <- set_int_rights(my_meta2, "public")
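The rule that the dissemination code and the license must agree can be written down as a small check. This is a sketch in base R that encodes only the rule as described in this section; license_agrees_with_cui() is a hypothetical helper and not an EMLeditor function.

```r
# Hypothetical helper (not part of EMLeditor): TRUE when the license type
# is compatible with the CUI dissemination code. PUBLIC data may be
# "public" or "CC0"; any code that indicates CUI requires "restricted".
license_agrees_with_cui <- function(cui_code, license) {
  if (cui_code == "PUBLIC") {
    license %in% c("public", "CC0")
  } else {
    license == "restricted"
  }
}

license_agrees_with_cui("PUBLIC", "public")
license_agrees_with_cui("FEDCON", "CC0")
```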
Add a data package DOI
Add your data package’s Digital Object Identifier (DOI) to the
metadata. The set_datastore_doi() function requires that you are logged
on to the VPN. It initiates a draft data package reference on DataStore,
and populates the reference with a title pulled from your metadata,
“[DRAFT] : <your data package title>”. This temporary title is purely
for your tracking purposes and can easily be updated later. The
set_datastore_doi() function will then insert the DOI for your data
package into your metadata. There are a few things to keep in mind:
- Your DOI and the data package reference are not yet active and are not publicly accessible until after review and activation/publication.
- Be sure to upload your data package to the correct draft reference! It is easy to create several draft references with the same draft title so check the reference ID number carefully (We are working on making this process easier and less error prone).
my_meta2 <- set_datastore_doi(my_meta2)
Add information about the DRR (optional)
If you are producing (or plan to produce) a Data Release Report (DRR), add links to the DRR describing the data package.
Similar to when you added the data package DOI, you will need the DOI for the DRR you are drafting as well as the DRR’s Title. Again, go to DataStore and initiate a draft DRR, including a title. For the purposes of the data package, there is no need to populate any other fields. At this point, you do not need to activate the DRR reference and, while a DOI has been reserved for your DRR, it will not be activated until after publication so that you have plenty of time to construct the DRR.
my_meta2 <- set_drr(my_meta2, 7654321, "DRR Title")
Set the language
This is the human language (as opposed to computer language) that the data package and metadata are constructed in. Examples include English, Spanish, Navajo, etc. A full list of available languages is available from the Library of Congress. Please use the “English Name of Language” as an input. The function will then convert your input to the appropriate 3-character ISO 639-2 code.
my_meta2 <- set_language(my_meta2, "English")
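To illustrate the kind of conversion described above, here is a small lookup in base R. It covers only the three languages named in this section and is an illustration, not EMLeditor's implementation; language_code() and the table name are hypothetical.

```r
# Illustrative lookup only: three ISO 639-2 codes, not the full list.
iso_639_2 <- c(English = "eng", Spanish = "spa", Navajo = "nav")

# Hypothetical helper: return the 3-character code, or stop if the name
# is not in the small example table above.
language_code <- function(language_name) {
  code <- iso_639_2[language_name]
  if (is.na(code)) {
    stop("Language not in the example table: ", language_name)
  }
  unname(code)
}

language_code("English")
```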
Add content unit links
These are the park units where data were collected from, for instance ROMO, not ROMN. If the data package includes data from more than one park, they can all be listed. For instance, if data were collected from all park units within a network, each unit should be listed separately rather than the network. This is because the geographic coordinates corresponding to bounding boxes for each park unit listed will automatically be generated and inserted into the metadata. Individual park units will be more informative than the bounding box for the entire network.
park_units <- c("ROMO", "GRSD", "YELL")
my_meta2 <- set_content_units(my_meta2, park_units)
Add the producing unit(s)
This is the unit responsible for generating the data package. It may be a single park (ROMO) or a network (ROMN). It may be identical to the units listed in the previous step, overlapping, or entirely different.
#a single producing unit:
my_meta2 <- set_producing_units(my_meta2, "ROMN")
#for collaborative projects with multiple producing units:
my_meta2 <- set_producing_units(my_meta2, c("ROMN", "GRYN"))
Great! You’re done adding the essential NPS-specific metadata to your EML. There are only two quick steps left:
Validate your EML
OK, this first one might take a tick to run:
EML::eml_validate(my_meta2)
If eml_validate returns errors, inspect them and fix them. Feel free to contribute an issue, or email Rob Baker with questions, concerns, or suggestions.
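One way to inspect the errors is to print them one per line. The sketch below assumes that the value returned by eml_validate() is a logical that carries its messages in an "errors" attribute; report_validation() is a hypothetical helper, and a stand-in result is used so the example runs without a metadata file.

```r
# Hypothetical helper (not part of EML or EMLeditor): print any messages
# stored in the "errors" attribute of a validation result.
report_validation <- function(result) {
  errors <- attr(result, "errors")
  if (isTRUE(as.logical(result)) && length(errors) == 0) {
    message("EML is schema-valid.")
  } else {
    message("Validation problems found: ", length(errors))
    for (e in errors) message(" - ", e)
  }
  invisible(errors)
}

# Stand-in for a failed validation result (assumed structure):
fake_result <- structure(FALSE, errors = c("example error 1", "example error 2"))
report_validation(fake_result)
```

With real metadata the call would be report_validation(EML::eml_validate(my_meta2)).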
Write your edited EML back to disk
Assuming everything went smoothly and eml_validate returns ‘TRUE’, write your EML back to your disk so you can upload it with your data files to DataStore. Keep in mind that the file name should end in _metadata.xml. Also, when uploading your data package (data files and metadata) to DataStore, make sure to upload it to the correct draft reference!
EML::write_eml(my_meta2, "mymetadatafilename_metadata.xml")
Additional Functions
The Minimal Workflow section assumes that you have correctly used EMLassemblyline to generate a high-quality EML document. In the event that you find issues with your EML or wish to correct portions of it, EMLeditor includes some functions that allow you to edit common EML errors without having to re-run EMLassemblyline.
Edit the title
If your title has changed (for instance, perhaps reviewers have
suggested a title that you realize you prefer) or you find a typo in
your title, you can update your title directly in EMLeditor using
set_title().
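A minimal sketch of such a call follows. It assumes that set_title() takes the metadata object first and the new title second, like the other “set_” functions in this document; check ?set_title for the actual argument names. The title string is a placeholder, and the guard only lets the snippet run harmlessly when EMLeditor or my_meta2 is not available.

```r
# Placeholder title for illustration.
new_title <- "My Corrected Data Package Title"

# Assumes EMLeditor is installed and `my_meta2` exists from the steps above.
# Argument order is an assumption based on the other "set_" examples.
if (requireNamespace("EMLeditor", quietly = TRUE) && exists("my_meta2")) {
  my_meta2 <- EMLeditor::set_title(my_meta2, new_title)
}
```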
Edit the abstract
Because your abstract will be prominently displayed on the DataStore
landing page and will be forwarded to DataCite for DOI assignment and
data.gov (among other places) to enhance data discoverability and reuse,
it is important that your abstract not contain errors. Typographical
errors, particularly non-ASCII characters, are common problems in the
abstract of EML documents. The set_abstract() function includes a number
of routines to minimize errors introduced by word processors or non-UTF8
encoding (we are pretty sure you don’t want that “&#13;” in your
abstract) but it cannot anticipate all potential eventualities. You are
therefore encouraged to construct your abstract in Notepad or another
text editor (NOT a word processor such as Microsoft Word). This is a
relatively simple function and does not readily support multiple
paragraphs, bullet points, or the like.
#replace your abstract:
my_meta2 <- set_abstract(my_meta2, "This is my new abstract. I can use this function to replace it as many times as I like until it looks just the way I want it to.")
#check the new abstract:
get_abstract(my_meta2)
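Before replacing an abstract, it can be useful to scan the text for the kinds of characters described above. This is a sketch in base R and independent of whatever cleaning set_abstract() does internally; find_abstract_problems() is a hypothetical helper that expects a single, valid UTF-8 string.

```r
# Hypothetical helper (not part of EMLeditor): list the non-ASCII
# characters in a single string, plus any leftover numeric character
# references such as "&#13;".
find_abstract_problems <- function(text) {
  code_points <- utf8ToInt(enc2utf8(text))
  non_ascii <- unique(intToUtf8(code_points[code_points > 127L], multiple = TRUE))
  entities <- regmatches(text, gregexpr("&#[0-9]+;", text))[[1]]
  list(non_ascii = non_ascii, entities = entities)
}

# An en dash (\u2013) and a stray carriage-return reference:
find_abstract_problems("Elk counts \u2013 2019&#13; summary")
```

Both list elements are empty when the text is plain ASCII with no character references.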
Check back here for more handy functions in the future. If there’s something you’d like to see added, please let us know by posting an Issue on GitHub.
Don’t forget to validate your updated EML file and to write it back to .xml after using EMLeditor to make edits.
Scripting with EMLeditor
The interactive feedback and prompts provided by EMLeditor functions
can be turned off to enable efficient scripting. All “set_” class
functions have a parameter, force, that defaults to force = FALSE. To
turn off the feedback and prompts, set force = TRUE when calling each
function. Be careful using the functions in this way as they may - or
may not - make changes to your metadata and you will not be advised of
any change or lack of change. Inspect your final product carefully.
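A scripted run might wrap several steps in one function. The sketch below assumes the EML and EMLeditor packages are installed and reuses three calls from the minimal workflow with force = TRUE added; the function name and the file paths are placeholders. Because nothing is reported along the way, it validates before writing.

```r
# Sketch of a non-interactive run. The function name is a placeholder;
# EML and EMLeditor must be installed before the function is called.
edit_metadata_quietly <- function(path_in, path_out) {
  meta <- EML::read_eml(path_in, from = "xml")
  # force = TRUE suppresses the prompts and feedback for each step
  meta <- EMLeditor::set_cui(meta, "PUBLIC", force = TRUE)
  meta <- EMLeditor::set_int_rights(meta, "public", force = TRUE)
  meta <- EMLeditor::set_language(meta, "English", force = TRUE)
  # No feedback was shown, so check validity before writing anything
  if (!isTRUE(as.logical(EML::eml_validate(meta)))) {
    stop("Edited metadata is not schema-valid; nothing written.")
  }
  EML::write_eml(meta, path_out)
  invisible(meta)
}

# Example call (file names are placeholders):
# edit_metadata_quietly("mydata_metadata.xml", "mydata_edited_metadata.xml")
```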
Custom Publisher/Producer
EMLeditor functions are designed primarily for use by staff at the National Park Service for publication of data packages to DataStore. Consequently, all “set_” class functions silently perform two operations by default:
- They set the publisher to the National Park Service (and the location to the Fort Collins office)
- They specify the agency that created the data package as NPS and set a field “by or for NPS” to TRUE
You can prevent “set_” class functions from performing these operations
by changing the default status of the parameter NPS = TRUE to
NPS = FALSE. This will leave your publisher information untouched and
will not create an additionalMetadata item for the agency that created
the data package.
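As a sketch of what this looks like in a call, the wrapper below passes NPS = FALSE to one of the “set_” functions used earlier. It assumes EMLeditor is installed and that meta is an EML object already loaded with EML::read_eml(); the wrapper name is a placeholder.

```r
# Placeholder wrapper: set the language without touching the publisher
# or adding the "by or for NPS" information.
set_language_keep_publisher <- function(meta, language = "English") {
  EMLeditor::set_language(meta, language, NPS = FALSE)
}

# Example call, once `my_meta2` exists:
# my_meta2 <- set_language_keep_publisher(my_meta2, "Spanish")
```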
If you would like to set the publisher to something other than the
Fort Collins Office of the National Park Service or you would like to
set the agency that created the data package to something other than
NPS, use the set_publisher() function. Be sure to specify NPS = FALSE
or the function will perform the default operations (set publisher to
NPS at the Fort Collins Office and set the agency to NPS).
Warning: set_publisher should only be used in a few, likely rare, circumstances:
- If the publisher is NOT the National Park Service
- If the contact address for the publisher is NOT the central office in Fort Collins (all data packages uploaded to DataStore will be published by the Fort Collins Office of NPS)
- If the originating agency is NOT the NPS (i.e. a contractor or partner organization)
- If the data package is NOT created for or by the NPS
It’s probably a good idea to run args(set_publisher) to make sure you have all the arguments, especially those with defaults, properly specified.
Check your EML
It’s always a good idea to check your EML. Other than visually inspecting the .xml file, three good approaches are:
- Check whether your EML is schema-valid:
# Use the eml_validate function from the EML package:
EML::eml_validate(my_meta2)
- Build a mock-up of a readme file:
# Outputs readme to the screen
write_readme(my_meta2)
# Alternatively, save the readme to a text file
write_readme(my_meta2, "readme.txt")
The mock readme file is an approximation of the readme file that will automatically be generated by DataStore upon upload and included in your data package. It’s a good, human readable, way to check whether many critical elements of your EML are properly formatted. Although the actual readme on DataStore may differ slightly, if the mock readme looks good, that’s a good indication that the readme on DataStore will too. On the other hand, if there is something off in the mock readme, that’s a good indication that you may want to go back and fix the relevant portions of your EML.
Do not upload your mock readme file to DataStore!
- Run a series of NPS-specific checks on your metadata. These are the same checks that reviewers will likely run prior to publication. They are a metadata-specific subset of the functions included in the DPchecker package.
#If you have not already written your metadata back to .xml:
EML::write_eml(my_meta2, "my_metadata.xml")
#run checks on your metadata. You must tell the check_eml() function where your file is. It will default to the current working directory. There must be only one xml file in the directory.
#If your metadata is in the current working directory:
check_eml()
#to change to a new sub-directory:
check_eml(directory=here::here("my_new_directory"))
#to change to a directory higher up in the hierarchy:
check_eml(directory=here::here(".."))
#to move higher up in the hierarchy, then select a different subdirectory:
check_eml(directory=here::here("..", "a_different_sub_directory"))
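Since the directory must contain only one xml file, it may be worth confirming that before running the checks. This is a sketch in base R; find_single_xml() is a hypothetical helper and not part of EMLeditor or DPchecker. The demonstration uses a temporary directory so it does not depend on any real files.

```r
# Hypothetical helper: list the .xml files in a directory and stop
# unless there is exactly one.
find_single_xml <- function(directory = ".") {
  xml_files <- list.files(directory, pattern = "\\.xml$", ignore.case = TRUE)
  if (length(xml_files) != 1) {
    stop("Expected exactly one .xml file in ", directory,
         " but found ", length(xml_files))
  }
  xml_files
}

# Demonstration in a temporary directory:
demo_dir <- file.path(tempdir(), "check_eml_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "mydata_metadata.xml"))
find_single_xml(demo_dir)
```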