- `template_geographic_coverage()` now accepts numeric values for the `site.col` parameter (#121).
- EMLassemblyline now does a better job of distinguishing metadata template files from other files in the same directory.
- Now imports taxonomyCleanr >= 1.6.2. The EDIutils package was still listed for taxonomyCleanr 1.6.1 (#113).
- Now imports taxonomyCleanr >= 1.6.1 so the EDIutils package is not required (#112).
- To identify EML files created with EMLassemblyline, the application and version number are now included in the additionalMetadata element (#109).
- Update the get_eol function - The get_eol() function was refactored for efficiency and clarity in the hymetDP package. The refactored function, unit tests, and updated implementation have been ported here.
- Update provenance.txt documentation - Added some minor details to the provenance.txt documentation.
- Fix keyword resolver - Keywords were not resolving to the LTER vocabulary due to a regression that occurred during the EDIutils deprecation. The missing functions have been added back to EAL.
- Removed version requirements from dependencies. Removed unused EDIutils from list of dependencies.
- Integrated some EDIutils functions and removed the package dependency
- A resolved issue in the taxonomyCleanr dependency requires a minimum of version 1.5.4.
- Implement support for Markdown and LaTeX equations embedded in Markdown. This only works for methods.md templates. NOTE: LaTeX equations must be wrapped in "$$", otherwise they won't be parsed correctly. This addresses issue #85.
- Fix non-resolvable schemaLocation.
- Fix pandoc xml to .txt, .md, and .docx translation issue.
- A resolved issue in the taxonomyCleanr dependency requires a minimum of version 1.5.3.
- Defer to data.table's handling of quote characters during read operations.
- Remove an existing validation issues object from the global environment each time a validation suite is run, otherwise resolved issues may appear new.
- Remove validation checks where not critical to a function's operation.
- Remove title case coercion from personnel roles.
- Relax constraints on data table read operation.
- Schema validation: Recent changes to `emld` complicate validation of EML files input to some EAL functions. Validation of such inputs has been suspended until a permanent fix can be made. These changes have no impact on EAL working correctly, unless an invalid EML record is input, in which case the function will fail if there is a critical issue.
- template_table_attributes(): A minor bug in the column classification logic has been fixed.
- Updated dependency: Delimiter detection was underperforming for MacOS. A new version of `EDIutils` fixes this.
- tab field delimiters: For tab delimited data tables, an empty white space was being written to the EML rather than the expected \t. This fix addresses issue #84.
- eml2eal(): For when you want to work with EML in EAL but don't have the templates and make_eml() function call. This addresses issue #76
- Remove last column name constraint: Column names can now be composed of any character strings handled by `data.table::fread()`.
- Remove bookend quote characters mistakenly added to values: Some file writers will place quotes around a value when it contains commas. This safeguard identifies and removes these.
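The bookend quote safeguard described above can be illustrated with a minimal base R sketch. The helper name `strip_bookend_quotes()` is hypothetical and is not the package's actual implementation; it only shows the idea of removing one pair of matching quotes that wraps an entire value.

```r
# Hypothetical helper (not part of EMLassemblyline): remove one pair of
# matching quote characters that wraps an entire value.
strip_bookend_quotes <- function(x) {
  # TRUE where a value starts and ends with the same quote character
  wrapped <- grepl("^([\"']).*\\1$", x, perl = TRUE)
  # Drop the first and last character of the wrapped values only
  x[wrapped] <- substr(x[wrapped], 2, nchar(x[wrapped]) - 1)
  x
}

values <- c("\"Lake Mendota, WI\"", "Lake Monona", "'a, b'", "\"")
cleaned <- strip_bookend_quotes(values)
print(cleaned)
```

Values without a matching pair of quotes, including a lone quote character, are left unchanged.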
- Outdated dependencies: Some listed dependencies were outdated resulting in warnings/errors. This has been fixed.
- template_categorical_variables(): "" numeric categorical variables were being returned in the template as "NA". Now they are correctly returned as "".
- Bug fix: table & attrs mismatch if unequal lengths: Table B would not match attrs B in this case.
- template_taxonomic_coverage(): Resolving to ITIS began failing due to a change in title (from "ITIS" to "Integrated Taxonomic Information SystemITIS") listed in the return from `taxize::gnr_datasources()`. This issue has been fixed in the `taxonomyCleanr` dependency.
- Validate categorical variables: Validation of categorical variables has been enhanced with a check that each variable listed as "categorical" in table attributes templates is also listed in the categorical variables template.
- Improved handling of unsupported taxonomic authorities: Methods have been improved for taxa that have been manually resolved to unsupported taxonomic authorities (i.e. authorities other than "ITIS", "WORMS", "GBIF"). Values listed in the taxonomic coverage template's "authority_system" and "authority_id" fields will become annotations in the EML returned by `make_eml()`. See `template_taxonomic_coverage()` function docs for more details. NOTE: These new methods don't facilitate expansion of a taxon resolved in an unsupported system to the full classification hierarchy that is currently available when using ITIS, WORMS, or GBIF. That will require additional effort. Additionally, `template_taxonomic_coverage()` now has an `empty` argument for returning an empty template. This enhancement partially addresses issue #50.
- Semantic annotation: EML can now be annotated. This implementation supports two use cases:
  - New EML ... created by the EMLassemblyline workflow:
    - Complete all metadata templates for your dataset (as usual)
    - Run `template_annotations()` to create the annotations template
      - The annotations template (annotations.txt) reports the annotatable elements within your metadata and assigns default predicate annotations. You'll have to add object annotations from ontologies of your choosing. You can remove annotations by deleting rows and add annotations by copying a subject's row, pasting it to a new line, then modifying the object annotation fields.
      - Default annotations can be changed by the user
      - Instructions for creating annotations.txt from scratch are included in the function docs (for users gathering annotations in other ways).
      - Recurring nodes (e.g. ResponsibleParty) only require one set of annotations within annotations.txt
    - Run `make_eml()`
  - Old EML ... created in other ways:
    - Run template_annotations() for your EML file
    - Run annotate_eml() to get an annotated revision of your EML file

  Note: All annotated elements are assigned ids and their annotations are placed both immediately under the parent element (subject) and within the /eml/annotations node through id+reference pairs. This redundant approach supports variation in where EML metadata consumers prefer to harvest this information and supports annotation of EML elements requiring id+reference pairs.
- Provenance metadata template: This extends support for provenance metadata of data sources external to the EDI Data Repository. Create the template with `template_provenance()`. Fixes issue #8.
- Allow creation of partial EML (part 2): This completes implementation of issue #34 by moving all evaluation of inputs to make_eml() (and associated warning and error handling) from various locations in the code base to validate_templates(). With this implementation comes a new approach to communicating input issues to the user via template_issues, an object written to the global environment and formatted into a human readable report (message) when passed through issues().
- UTF-8 character encoding: EMLassemblyline extracts metadata from data objects and may garble this content if the character encoding is not supported. In an attempt to minimize this issue and convert metadata into the UTF-8 encoding expected by EML, the Base R function `enc2utf8()` has been implemented anywhere metadata is extracted from data objects and written to file (i.e. templating functions) and anywhere template content is added to the EML (i.e. `make_eml()`). Because this may create EML that inaccurately represents the data object it describes (e.g. categorical variables encoded in UTF-8 but the data encoded in something else), warnings are now issued when the input data object is not UTF-8 (or ASCII) encoded as estimated by `readr::guess_encoding()`. Additionally, EMLassemblyline documentation now emphasizes the importance of encoding data objects in UTF-8 first and then beginning the metadata creation process. An encoding conversion of TextType metadata (i.e. abstract, methods, additional_info) has not yet been implemented.
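As a rough illustration of the conversion described above, the following base R sketch re-encodes a Latin-1 string with `enc2utf8()`. The example value is made up, and the `validUTF8()` check is an assumption used here as a dependency-free stand-in for the `readr::guess_encoding()` estimate mentioned in the entry.

```r
# A category label as it might be read from a Latin-1 encoded data file
# (illustrative value only)
label <- "Pi\xf1on pine"
Encoding(label) <- "latin1"

# The raw bytes are not valid UTF-8, which is the situation that would
# warrant a warning to the user
is_valid_before <- validUTF8(label)

# Convert to the UTF-8 encoding expected by EML
label_utf8 <- enc2utf8(label)
is_valid_after <- validUTF8(label_utf8)

print(c(before = is_valid_before, after = is_valid_after))
print(Encoding(label_utf8))
```

The converted metadata is valid UTF-8, but the source data file itself is unchanged, which is why encoding the data in UTF-8 first is recommended.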
- import_templates(): This function has been replaced by `template_core_metadata()` and `template_table_attributes()`.
- define_catvars(): This function has been replaced by `template_categorical_variables()`.
- extract_geocoverage(): This function has been replaced by `template_geographic_coverage()`.
- affiliation argument of make_eml(): This argument has been replaced by `user.domain`.
- data.files argument of make_eml(): This argument has been replaced by `data.table`.
- data.files.description argument of make_eml(): This argument has been replaced by `data.table.description`.
- data.files.quote.character argument of make_eml(): This argument has been replaced by `data.table.quote.character`.
- data.files.url argument of make_eml(): This argument has been replaced by `data.table.url`.
- zip.dir argument of make_eml(): This argument has been replaced by `other.entity`.
- zip.dir.description argument of make_eml(): This argument has been replaced by `other.entity.description`.
- Patch for updated dependency (part 2): An updated dependency resulted in `template_table_attributes()` errors. This fix is an addendum to the prior fix (2.18.1).
- Patch for updated dependency: An updated dependency resulted in `template_table_attributes()` errors. This has been fixed.
- Allow creation of partial EML: During the draft process it is very useful to see a partial EML document, even if incomplete or invalid. Additionally, developers using EMLassemblyline as a backend (e.g. MetaShARK and Excel-to-EML) may not want the current set of input requirements for their applications. To accommodate these use cases, validation checks on inputs to `make_eml()` (often communicating best practice recommendations) have been refactored to return warnings rather than errors. Fixes issue #34.
- Coerce lat.col and lon.col inputs: The `template_geographic_coverage()` arguments `lat.col` and `lon.col` expect numeric inputs and previously errored if non-NA missing value codes were present. The values are now coerced to numeric, only complete cases are returned in the geographic coverage template, and no errors occur.
- Revert markdown parsing: Version 2.13.0 introduced better methods for parsing TextType templates (i.e. abstract, methods, and additional_info) from .docx, .txt, and .md file types; however, .md lost some formatting controls. This has been fixed.
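The coercion of `lat.col` and `lon.col` values described above can be sketched in base R as follows. This is an illustration of the idea under assumed example data, not the code used inside `template_geographic_coverage()`.

```r
# Example table whose coordinate columns contain non-NA missing value codes
sites <- data.frame(
  site = c("A", "B", "C"),
  lat  = c("43.07", "missing", "44.50"),
  lon  = c("-89.40", "-90.10", "."),
  stringsAsFactors = FALSE
)

# Coerce to numeric; values that cannot be parsed become NA
sites$lat <- suppressWarnings(as.numeric(sites$lat))
sites$lon <- suppressWarnings(as.numeric(sites$lon))

# Keep only the rows with both coordinates present
coverage <- sites[stats::complete.cases(sites[, c("lat", "lon")]), ]
print(coverage)
```

Only site A survives, because it is the only row with a parseable latitude and longitude.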
- Create EML for non-EDI repositories: EML can now be created for non-EDI repositories following a refactor of the logic around the `make_eml()` function arguments `user.id`, `user.domain`, and `package.id`. Details are listed in the function documentation.
- template reader: Fixed a bug in the metadata template reader.
- maintenance.description: The `maintenance.description` argument of `make_eml()` is no longer required; however, a missing `maintenance.description` will return a warning with the recommended best practice.
- publisher: A data publisher can now be added by listing the person (or organization) with a "publisher" role in the personnel.txt template.
- project: Missing project information (i.e. Principal Investigator and project metadata) returns a warning with the recommended practice.
- formatName: The formatName of an otherEntity is now auto-detected using the `mime` library. Undetected MIME Types are listed as "Unknown". Fixes issue #68.
- distribution: Previously, when assigning a .//physical/distribution/online/url for two or more data tables or other entities, each was required to have a corresponding URL listed under the `make_eml()` arguments `data.table.url` and `other.entity.url`. Some use cases require assignment of a URL to only one in a list of two or more. This constraint has been relaxed, so if a data object doesn't have a corresponding URL then use the values `""` or `NA` (e.g. if in `make_eml()` the argument `data.table = c("nitrogen.csv", "decomp.csv")`, and a URL only exists for the second object, then `data.table.url = c("", "/url/to/decomp.csv")`).
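To make the relaxed URL pairing concrete, here is a small base R sketch that uses the values from the entry above. The variable names and the `has_url` check are illustrative assumptions about how `""` and `NA` can be treated as "no URL"; they do not reproduce the internals of `make_eml()`.

```r
# Inputs as they would be passed to the data.table and data.table.url
# arguments (values taken from the changelog entry)
data_table     <- c("nitrogen.csv", "decomp.csv")
data_table_url <- c("", "/url/to/decomp.csv")

# The two vectors are still expected to align position by position
stopifnot(length(data_table) == length(data_table_url))

# Treat both "" and NA as "no URL for this object"
has_url <- !is.na(data_table_url) & nzchar(data_table_url)

# Only these objects would receive an online distribution URL
objects_with_url <- data_table[has_url]
print(objects_with_url)
```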
- Installation: Simplified instructions so dependencies will be installed but users will not be asked to upgrade installed packages (a point of confusion among many).
- Default false numeric attributes to character: Default user specified numeric attributes to character class when the attribute contains character values other than those specified under the missingValueCode field of the attributes.txt template. A warning alerts the user of the issue and preserves the original data by not coercing to numeric.
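The fallback from numeric to character described above could look roughly like the following base R sketch. The function `resolve_class()` and its arguments are hypothetical names introduced for illustration; the package's own logic may differ.

```r
# Hypothetical helper (not part of EMLassemblyline): decide whether an
# attribute declared as numeric can really be treated as numeric.
resolve_class <- function(values, declared_class, missing_value_code) {
  if (declared_class != "numeric") {
    return(declared_class)
  }
  # Ignore NAs and the declared missing value code
  observed <- values[!is.na(values) & values != missing_value_code]
  # Anything left that cannot be parsed as a number is a character value
  non_numeric <- is.na(suppressWarnings(as.numeric(observed)))
  if (any(non_numeric)) {
    warning("Attribute declared numeric contains character values; ",
            "defaulting to character.", call. = FALSE)
    return("character")
  }
  "numeric"
}

depth   <- c("1.2", "-9999", "3")    # only numbers and the missing code
flagged <- c("1.2", "trace", "-9999") # contains a stray character value

class_depth   <- resolve_class(depth, "numeric", "-9999")
class_flagged <- suppressWarnings(resolve_class(flagged, "numeric", "-9999"))
print(c(depth = class_depth, flagged = class_flagged))
```

The warning is suppressed here only to keep the example output quiet; in practice it is the signal that tells the user the declared class was overridden.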
- Boilerplate EMLassemblyline process: Add boilerplate function calls to /inst/templates/run_EMLassemblyline.R. This script is added to the user's workspace via `template_directories()`. The boilerplate is meant to be a reminder and save the user a little time. Fixes issue #36.
- schemaLocation: Fixed schemaLocation and namespace to be a web resolvable address.
- TextType: Conversion of abstract.docx, methods.docx, and additional_info.docx to EML has been improved.
- Missing contact or creator: Contacts and creators are required by `make_eml()` but no errors were returned when missing from personnel.txt. The logic of `validate_templates()` has been updated to fix this issue.
- Missing value codes in categorical variables: Some missing value codes defined in the table attributes template were making their way into the categorical variables template. `template_categorical_variables()` has been updated to recognize more missing value code types.
- Missing <taxonomicCoverage>: taxonomicCoverage supplied in taxonomic_coverage.txt was missing from the EML. This has been fixed.
- Missing <access> node: The access node was not being added to the EML. This has been fixed.
- Template reader: A bug in the tabular template reader has been fixed (see commit for details).
- `make_eml()` code: (For developers) The underlying code of `make_eml()` is now more concise and understandable.
- NAs in templates: Revised logic to distinguish between NAs listed in the missingValueCode field of the table attribute template when supplied to `make_eml()` via files or the input argument `x`.
- Validate personnel roles: Revised logic to interpret personnel roles.
- NAs in templates: Users often add NAs to templates where EMLassemblyline expects "". This enhancement ignores these extraneous NAs.
- Missing value codes: Recent changes broke the proper handling of "NA" missing value codes. This has been fixed.
- Schema validation: The referenced schema location was not correct. Now it is (issue #59).
- Schema validation: A schema validation error sporadically occurs under EML 2.1.1. Upgrading to EML 2.2.0 fixes this (issue #59).
- Template checks: A new suite of checks on metadata template content has been added to more effectively communicate issues and reduce errors. Fixes issue #6, issue #35, issue #37, and issue #53.
- Online distribution: The previous implementation for providing URLs by which the data can be publicly downloaded required all data objects to be co-located in the same directory, which is too restrictive. Now URLs can be explicitly listed for each data object. Fixes issue #56.
- data.url: The `data.url` argument has been deprecated in favor of `data.table.url` and `other.entity.url` but will be supported until March 11, 2021.
- Delimiter guessing: Occasionally the content of tabular templates leads `data.table::fread()` to guess a delimiter other than "\t". This issue has been fixed by explicitly stating the expected field delimiter.
- File names containing spaces caused `template_categorical_variables()` to crash: Errors occurred when input file names contained spaces. Using spaces is still a common practice among users. To accommodate this while continuing to promote best practices, the naming restriction has been relaxed and the best practices have been made a warning. The function checking file presence and naming conventions is `EDIutils::validate_file_name()`. An explicit file name specification (i.e. including extension) is now required, which precludes errors when the same file name is used among different file types in the same directory. Fixes issue #25.
- Validate units: Check that all numeric attributes have corresponding units and these units can be found in the EML standard unit dictionary or the custom_units.txt template. If not, then throw an error along with directions for fixing the issue. This check is called from make_eml(). Fixes issue #38.
- Multiple inputs to `template_taxonomic_coverage()`: If the taxa of a dataset are in more than one table, then a user would want to extract the unique taxa from all the tables and compile them into the taxonomic_coverage.txt template. Multiple inputs to the `taxa.table` and `taxa.col` arguments are now supported.
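The idea of compiling unique taxa from several tables, as described above, can be sketched in base R. The tables, column names, and taxa below are made-up examples, and the pairing of one column name per table is an assumption about how `taxa.table` and `taxa.col` line up; this is not the function's actual implementation.

```r
# Two example tables that hold taxa in differently named columns
fish   <- data.frame(taxon = c("Salmo trutta", "Perca flavescens"),
                     stringsAsFactors = FALSE)
survey <- data.frame(species = c("Chara vulgaris", "Salmo trutta"),
                     stringsAsFactors = FALSE)

tables    <- list(fish, survey)
taxa_cols <- c("taxon", "species")  # one column name per table, in order

# Pull the taxa column from each table, then keep the unique names
all_taxa    <- Map(function(tbl, col) tbl[[col]], tables, taxa_cols)
unique_taxa <- unique(unlist(all_taxa, use.names = FALSE))
print(unique_taxa)
```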
- EML schema validation: Schema validation in `make_eml()` began failing with release of the dependency library EML 2.0.2. This has been fixed.
- Support `;` delimiters: Data tables with semi-colon delimiters were not supported. This was fixed by updating `EDIutils::detect_delimiter()` (issue #6 of the EDIutils package).
- Entity Name: Content from `data.table.description` in `make_eml()` was used to fill in the entity name. However, they are not the same and entity name should be specified separately. This was fixed by adding `data.table.name` and `other.entity.name` as arguments to `make_eml()`. The fix defaults `data.table.name` to `data.table` and `other.entity.name` to `other.entity` with a warning message. Fixes issue #24.
- NULL output from `template_geographic_coverage()`: NULL was output from this function when `empty = TRUE`, which is mostly a cosmetic issue. This was fixed by implementing some simple logic. Fixes issue #32.
- Updated table readers: Some user supplied data tables could not be read by `utils::read.table()`. To fix this, `data.table::fread()`, a more autonomous and robust reader, replaced `read.table()` for reading both data and metadata templates. Fixes issue #41.
- geographic_coverage.txt fields mixed up when translated to EML: Further testing revealed the bug doesn't exist. Fixes issue #43.
- Quotes in license templates: Unescaped quotes characters in the license files were being converted to the element thereby invalidating the EML. This was fixed by adding escape characters to the quotes.
- Testing `template_taxonomic_coverage()`: Travis CI has been failing because of long responses from API calls made by `template_taxonomic_coverage()`. To expedite tests and reduce errors, the example data now contains substantially fewer taxa to be resolved against authority systems.
- Unit dictionary: The `view_unit_dictionary()` function was opening the unit dictionary in a separate non-searchable window. By removing the `utils` namespace from the function call, the unit dictionary now opens within the RStudio source pane where searching is supported.
- Missing value codes: The `EML` v2.0.0 refactor resulted in changes to how missing value codes are handled. This fix restores the original functionality where empty character strings in the missing value code and explanation fields don't result in validation errors.
- Intellectual rights character encoding: The intellectual rights licenses (CC0 and CC-BY) contained non-UTF-8 encoded quote characters that produced invalid EML. These have been removed.
- Geographic coverage sources: Only one geographic coverage input is allowed to the `make_eml()` function at a time. Valid sources are the geographic_coverage.txt template, the `geographic.coordinates` and `geographic.description` arguments of `make_eml()`, and the deprecated bounding_boxes.txt template.
- Missing value codes as categorical variables: Missing value codes were being incorrectly listed as categorical variables by `template_categorical_variables()`. This issue has been fixed.
- Taxonomic coverage: Invalid taxonomic coverage was being generated by EDI's `taxonomyCleanr::make_taxonomicCoverage()`. This issue has been fixed in that project's GitHub master branch, and the necessary adjustments have been made to `EMLassemblyline::make_eml()`.
- EML v2.0.0: `EMLassemblyline` has been refactored to run with the `EML` v2.0.0 dependency.
- Text type metadata may now be supplied in .docx and .md files: Support for creating abstract, methods, and additional information metadata has been extended from simple text files to Microsoft Word (.docx) and Markdown (.md) file formats. Formatting of these files is translated to EML via `markdown` > Pandoc > docbook.
- Create an empty geographic_coverage.txt: Sometimes the geographic coverage of a dataset cannot be extracted from a table and needs to be entered manually. Use the `template_geographic_coverage()` argument `empty = TRUE` to create an empty geographic_coverage.txt template.
- Add function examples: Add examples to function documentation.
- Change template import: Import custom_units.txt with `template_table_attributes()` instead of with `template_core_metadata()`. This is a more logical pairing.
- The argument validator should not check geographic coverage templates: This fix moves the presence/absence check for geographic coverage templates to `make_eml()`.
- v2.4.6 functions should be accessible: This fix exports `EMLassemblyline` 2.4.6 functions that should otherwise be accessible for backwards compatibility.
- File names should not require extensions: This fix restores functionality that was lost in the recent refactor.
- Website: Improved documentation with vignettes demonstrating common and advanced use cases. Implemented with `pkgdown`.
- Templating functions: Functions creating metadata templates are grouped under the prefix `template_*` to simplify user understanding.
  - `template_arguments()` Create template for all user inputs to `EMLassemblyline` (i.e. metadata template content and function arguments) to enable entirely programmatic workflows, with a focus on supporting content ingestion from upstream metadata sources.
  - `template_categorical_variables()` Create categorical variables template (previously named `define_catvars()`).
  - `template_core_metadata()` Create core metadata templates (previously part of `import_templates()`).
  - `template_directories()` Create a simple and effective directory structure for `EMLassemblyline` files and data package contents.
  - `template_geographic_coverage()` Create geographic coverage template (a refactor of `extract_geocoverage()`).
  - `template_table_attributes()` Create table attributes templates (previously part of `import_templates()`).
  - `template_taxonomic_coverage()` Create the taxonomic coverage template for resolving taxa to one or more authority systems and supporting creation of the hierarchical rank specific EML taxonomicCoverage element by `make_eml()`.
- Support for other entity data packages: Data packages composed entirely of other entities (i.e. non-tabular data) are now supported.
- Geographic coverage: All geographic coverage metadata has been moved to //dataset/coverage, where most data repositories find it for rendering to maps and other visualizations for users.
- Make EML for other data repositories: Arguments requiring EDI specific content (i.e. `user.id`, `user.domain`, `package.id`) have been relaxed to enable creation of EML for other data repositories.
- Better entity descriptions: Use arguments `data.table.description` and `other.entity.description` for //dataTable/entityName and //otherEntity/entityName, respectively. This provides a more meaningful file description than the file name itself.
Several templating functions, templates, and arguments have been deprecated. Full backwards compatibility of these functions, templates, and arguments will be supported for the next year (i.e. until May 1, 2020).
Functions:
- `import_templates()` is deprecated in favor of `template_core_metadata()` (i.e. metadata required by all data packages) and `template_table_attributes()` (i.e. metadata for data tables).
- `define_catvars()` is deprecated in favor of `template_categorical_variables()`.
- `extract_geocoverage()` is deprecated in favor of `template_geographic_coverage()`.
Templates:
- bounding_boxes.txt is deprecated in favor of geographic_coverage.txt
- geographic_coverage.txt is deprecated in favor of a new version of geographic_coverage.txt that supports both point locations and areas.
Arguments:
- `data.files` is deprecated in favor of `data.table`
- `data.files.description` is deprecated in favor of `data.table.description`
- `data.files.quote.character` is deprecated in favor of `data.table.quote.character`
- `data.files.url` is deprecated in favor of `data.url`
- `zip.dir` is deprecated in favor of `other.entity`
- `zip.dir.description` is deprecated in favor of `other.entity.description`
- `affiliation` is deprecated in favor of `user.domain`
- otherEntity url: Add EML element //otherEntity/physical/distribution/online/url via `make_eml()`. This element was missing though documentation implied its existence.