Skip to contents

Dependencies

In addition to EMLeditor, you will also need the EML package to complete these steps.

You can download and install them individually, or get everything you need at once from the NPSdataverse using:

#install the NPSdataverse:
install.packages("devtools")
devtools::install_github("nationalparkservice/NPSdataverse")
library(NPSdataverse)
#individual install:
install.packages("devtools")
devtools::install_github("nationalparkservice/EMLeditor")
install.packages("EML")
library(EML)
library(EMLeditor)

Workflow Outline

EMLeditor’s primary objective is to edit and view EML formatted files, not to generate them from scratch. A suggested workflow is:

  1. Use the EMLassemblyline::make_eml() to generate an initial EML document and save it as a .xml file (NPS naming convention is: *_metadata.xml)
  2. Use the EML::read_eml() function to load your EML file into R as an R object.
  3. Use EMLeditor functions to edit the metadata in R and evaluate whether your metadata is acceptable (don’t forget to use EML::eml_validate() to make sure you are generating valid EML).
  4. Use the EML::write_eml() function to write the R object back to XML (remember the NPS naming convention for metadata files is *_metadata.xml).

If you use EMLeditor functions to alter your metadata (e.g. any function with the prefix “set_” in the name) they will also silently add the National Park Service as a publisher (including location, ROR id, etc) to your metadata unless you set NPS=FALSE. If you leave the default setting as NPS=TRUE, EMLeditor will also assume the data package is being created “by or for the NPS” and add that information to the metadata. For more details on customizing the publisher and originating agency content, see the section for non-NPS users.

“set_” functions will also inject information about which version of EMLeditor you used into the metadata.

A Minimal Workflow

This workflow assumes your EML was generated using EMLassemblyline, but will also work with any number of other EML generators (e.g. ezEML).

Currently, many of the steps in EMLeditor are by default interactive and will give you feedback on the current fields in your metadata, ask if you want to update them, and report on the results of your edits. If you would like to turn this option off, set force = TRUE and see the section on automated scripting with EMLeditor.

Import your metadata into R

my_metadata <- EML::read_eml("mymetadata_metdata.xml", from="xml")

Add information about CUI

Add information about your Controlled Unclassified Information (CUI). Because it is important to indicate not only what type of CUI there is, but when (and why) there is not CUI, you must do this even if your data package does not contain CUI. Choose from one of five CUI codes. These are:

  • PUBLIC - Does NOT contain CUI.
  • FED ONLY - Contains CUI. Only federal employees should have access (similar to “internal only” in DataStore).
  • FEDCON - Contains CUI. Only federal employees and federal contractors should have access (also very much like current “internal only” setting in DataStore)
  • DL ONLY - Contains CUI. Should only be available to a named list of individuals (where and how to list those individuals TBD)
  • NOCON - Contains CUI. Federal, state, local, or tribal employees may have access, but contractors cannot.

The first code is NPS specific. More information about the remaining four codes can be found on the National Archives website.

my_meta2 <- set_cui(my_metadata, "PUBLIC")

Set the intellectual rights

EMLassemblyine and ezEML provide some attractive looking boilerplate for setting the intellectual rights. It looks reasonable and so is easy to just keep. However, NPS has some specific regulations about what can and cannot be in the intellectualRights tag. Use set_int_rights() to replace the text with NPS-approved text. Note: You must first add the CUI dissemination code using set_cui() as the dissemination code and license must agree. That is, you cannot give a data package with a PUBLIC dissemination code a “restricted” license (and vise versa: a restricted data package that contains CUI cannot have a public domain or CC0 license).

  1. “restricted”: If the data contains Controlled Unclassified Information (CUI), the intellectual rights must read:

“This product has been determined to contain Controlled Unclassified Information (CUI) by the National Park Service, and is intended for internal use only. It is not published under an open license. Unauthorized access, use, and distribution are prohibited.”

  1. “public”: If the data do not contain CUI, the default is the public domain. The intellectual rights must read:

“This work is in the public domain. There is no copyright or license.”

  1. “CC0”: If you need a license, for instance if you are working with a partner organization that requires a license, use CC0:

“The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.”

The set_int_rights() function will also put the name of your license in a field in EML for DataStore harvesting.

# choose from "restricted", "public" and "CC0", see above:
my_meta2 <- set_int_rights(my_meta2, "public")

Add a data package DOI

Add your data package’s Digital Object Identifier (DOI) to the metadata. The set_datastore_doi() function requires that you are logged on to the VPN. It initiates a draft data package reference on DataStore, and populates the reference with a title pulled from your metadata, “[DRAFT] : <your data package title>”. This temporary title is purely for your tracking purposes and can easily be updated later. The set_datastore_doi() function will then insert the DOI for your data package into your metadata. There are a few things to keep in mind:

  • Your DOI and the data package reference are not yet active and are not publicly accessible until after review and activation/publication.
  • Be sure to upload your data package to the correct draft reference! It is easy to create several draft references with the same draft title so check the reference ID number carefully (We are working on making this process easier and less error prone).
my_meta2 <- set_datastore_doi(my_meta2)

Add information about the DRR (optional)

If you are producing (or plan to produce) a DRR, add links to the DRR describing the data package.

Similar to when you added the data package DOI, you will need the DOI for the DRR you are drafting as well as the DRR’s Title. Again, go to DataStore and initiate a draft DRR, including a title. For the purposes of the data package, there is no need to populate any other fields. At this point, you do not need to activate the DRR reference and, while a DOI has been reserved for your DRR, it will not be activated until after publication so that you have plenty of time to construct the DRR.

my_meta2 <- set_drr(my_meta2, 7654321, "DRR Title")

Set the language

This is the human language (as opposed to computer language) that the data package and metadata are constructed in. Examples include English, Spanish, Navajo, etc. A full list of available languages is available from the Library of Congress. Please use the “English Name of Language” as an input. The function will then convert your input to the appropriate 3-character ISO 639-2 code.

my_meta2 <- set_language(my_meta2, "English")

These are the park units where data were collected from, for instance ROMO, not ROMN. If the data package includes data from more than one park, they can all be listed. For instance, if data were collected from all park units within a network, each unit should be listed separately rather than the network. This is because the geographic coordinates corresponding to bounding boxes for each park unit listed will automatically be generated and inserted into the metadata. Individual park units will be more informative than the bounding box for the entire network.

park_units <- c("ROMO", "GRSD", "YELL")
my_meta2 <- set_content_units(my_meta2, park_units)

Add the producing unit(s)

This is the unit responsible for generating the data package. It may be a single park (ROMO) or a network (ROMN). It may be identical to the units listed in the previous step, overlapping, or entirely different.

#a single proucing unit:
my_meta2 <- set_producing_units(my_meta2, "ROMN")

#for collaborative projects with multiple producing units:
my_meta2 <- set_producing_units(my_meta2, c("ROMN", "GRYN")

Great! You’re done adding the essential NPS-specific metadata to your EML. There are only two quick steps left:

Validate your EML

OK, this first one might take a tick to run:

EML::eml_validate(my_meta2)

If eml_validate returns errors, inspect them and fix them. Feel free to contribute an issue, or email Rob Baker with questions, concerns, or suggestions.

Write your edited EML back to disk

Assuming everything went smoothly and eml_validate returns ‘TRUE’, write your EML back to your disk so you can upload it with your data files to DataStore. Keep in mind that the file name should end in _metadata.xml. Also, when uploading your data package (data files and metadata) to DataStore, make sure to upload it to the correct draft reference!

EML::write_eml(my_meta2, "mymetadatafilename_metadata.xml")

Additional Functions

The Minimal Workflow section assumes that you have correctly used EMLassemblyline to generate a high-quality EML document. In the event that you find issues with your EML or wish to correct portions of it, EMLeditor includes some functions that allow you to edit common EML errors without having to re-run EMLassemblyline.

Edit the title

If your title has changed (for instance, perhaps reviewers have suggested a title that you realize you prefer) or you find a typo in your title, you can update your title directly in EMLeditor using set_title():

#give my data package a new title:

my_meta2 <- set_title(my_meta2, "This is my new Title")

#check the new title:
get_title(my_meta2)

Edit the abstract

Because your abstract will be prominently displayed on the DataStore landing page and will be forwarded to DataCite for DOI assignment and data.gov (among other places) to enhance data discoverability and reuse, it is important that your abstract not contain errors. Typographical errors, particularly non-ascii characters are common problems in the abstract of EML documents. The set_abstract() function includes a number of routines to minimize errors introduced by word processors or non-UTF8 encoding (we are pretty sure you don’t want that “&amp13;” in your abstract) but it cannot anticipate all potential eventualities. You are therefore encouraged to construct your abstract in Notepad or other text editor (NOT a word processor such as Microsoft Word). This is a relatively simple function and does not readily support multiple paragraphs, bullet points, or the like.

#replace your abstract:
my_meta2 <- set_abstract(my_meta2, "This is my new abstract. I can use this function to replace it as many times as I like until it looks just the way I want it to.")

#check the new abstract:
get_abstract(my_meta2)

Check back here for more handy functions in the future. If there’s something you’d like to see added, please let us know by posting an Issue on github.

Don’t forget to validate your updated EML file and to write it back to .xml after using EMLeditor to make edits.

Scripting with EMLeditor

The interactive feedback and prompts provided by EMLeditor functions can be turned off to enable efficient scripting. All “set_” class functions have a parameter, force that defaults to force = FALSE. To turn off the feedback and prompts, set force = TRUE when calling each function. Be careful using the functions in this way as they may - or may not - make changes to your metadata and you will not be advised of any change or lack of change. Inspect your final product carefully.

Custom Publisher/Producer

EMLeditor functions are designed primarily for use by staff at the National Park Service for publication of data packages to DataStore. Consequently, all “set_” class functions silently perform two operations by default:

  1. They set the publisher to the National Park Service (and the location to the Fort Collins office)
  2. They specify the agency that created the data package as NPS and set a field “by or for NPS” to TRUE

You can prevent set_class functions from performing these operations by changing the default status of the parameter NPS = TRUE to NPS = FALSE. This will leave your publisher information untouched and will not create an additionalMetadata item for the agency that created the data package.

If you would like to set the publisher to something other than the Fort Collins Office of the National Park Service or your would like to set the agency that created the data package to something other than NPS, use the set_publisher() function. Be sure to specify NPS = FALSE or the function will perform the default operations (set publisher to NPS at the Fort Collins Office and set the agency to NPS).

Warning: set_publisher should only be used in a few, likely rare, circumstances:

  1. If the publisher Is NOT the National Park Service
  2. If the contact address for the publisher is NOT the central office in Fort Collins (all data packages uploaded to DataStore will be published by the Fort Collins Office of NPS)
  3. If the originating agency is NOT the NPS (i.e. a contractor or partner organization)
  4. If the data package is NOT created for or by the NPS

It’s probably a good idea to run

args(set_publisher)

to make sure you have all the arguments, especially those with defaults, properly specified.

Check your EML

It’s always a good idea to check your EML. Other than visually inspecting the .xml, file three good approaches are:

  1. Check whether your EML is schema-valid:
# Use the eml_validate function from the EML package:
EML::eml_validate(my_meta2)
  1. Build a mock-up of a readme file:
# Outputs readme to the screen
write_readme(my_meta2) 

# Alternatively, save the readme to a text file
write_readme(my_meta2, "readme.txt") 

The mock readme file is an approximation of the readme file that will automatically be generated by DataStore upon upload and included in your data package. It’s a good, human readable, way to check whether many critical elements of your EML are properly formatted. Although the actual readme on DataStore may differ slightly, if the mock readme looks good, that’s a good indication that the readme on DataStore will too. On the other hand, if there is something off in the mock readme, that’s a good indication that you may want to go back and fix the relevant portions of your EML.

do not upload your mock readme file to DataStore!

  1. Run a series of NPS-specific checks on your metadata. These are the same checks that reviewers will likely run prior to publication. They are metadata-specific subset of the functions included in the DPchecker package.
#If you have not already written your metadata back to .xml:
EML::write_eml(mymeta2, "my_metadata.xml")

#run checks on your metadata. You must tell the check_eml() function where your file is. It will default to the current working directory. There must be only one xml file in the directory.

#If your metadata is in the current working directory:
check_eml()

#to change to a new sub-directory:
check_eml(directory=here::here("my_new_directory"))

#to change to a directory higher up in the hierarchy:
check_eml(directory=here::here(".."))

#to move higher up in the hierarchy, then select a different subdirectory:
check_eml(directory=here::here("..", "a_different_sub_directory"))