Peter Dietz edited this page Dec 13, 2016 · 10 revisions

Simple Archive Format Builder (SAFBuilder) is a tool that packages your content into a form suitable for batch import into DSpace.

INPUT: A directory containing a CSV metadata file and the content files.
OUTPUT: A Simple Archive Format package.

To get started, read the README. https://github.com/peterdietz/SAFBuilder/blob/master/README

Getting Started / Installing on: Linux or Windows

There is also a Usage Guide on Simple Archive Format Packager at DuraSpace.

Basic Usage / Instructions

Obtain the source code

git clone https://github.com/DSpace-Labs/SAFBuilder.git
cd SAFBuilder

Compile the source code

./recompile.sh

Test that it works

./safbuilder.sh

It should return the usage syntax:

USAGE: BatchProcess /path/to/directory metadatafilename.csv
Hint -- directory: Use absolute path and no trailing slashes
Hint -- metadatafilename: needs to be in the directory, as do the content files
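The usage hints ask for an absolute directory path with no trailing slash. As a convenience, a small POSIX shell helper can normalize whatever path you type. This is a sketch; `normalize_dir` is a hypothetical name and is not part of SAFBuilder.

```sh
#!/bin/sh
# Hypothetical helper (not part of SAFBuilder): print the absolute path of a
# directory without a trailing slash, as the SAFBuilder usage hints require.
normalize_dir() {
  # cd runs in a subshell, so the caller's working directory is unchanged.
  # CDPATH is cleared so cd does not echo the directory it switched to.
  # pwd prints an absolute path with no trailing slash (except for "/").
  (CDPATH= cd -- "$1" && pwd)
}
```

For example, `./safbuilder.sh "$(normalize_dir ./my_batch)" my-metadata.csv`, where `my_batch` and `my-metadata.csv` stand in for your own directory and CSV file.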

Run the SAFBuilder on the sample data

./safbuilder.sh /path/to/SAFBuilder/src/edu/osu/kb/sample_data AAA_batch-metadata.csv

This runs SAFBuilder over the sample data included with this package. You can then inspect the SimpleArchiveFormat directory it creates; that directory is suitable input for a DSpace batch import.
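To inspect the generated package before importing it, a quick check like the one below may help. It is a sketch that assumes the standard DSpace Simple Archive Format layout: one subdirectory per item, each holding a dublin_core.xml file, a contents file, and the content files listed in contents. `check_saf` is a hypothetical helper, not part of SAFBuilder or DSpace.

```sh
#!/bin/sh
# Hypothetical sketch (not part of SAFBuilder or DSpace): sanity-check a
# Simple Archive Format directory before running "dspace import".
# Returns 0 if every item looks complete, 1 otherwise (details on stderr).
check_saf() {
  saf_dir=$1
  status=0
  # The trailing slash in the glob matches item subdirectories only.
  for item in "$saf_dir"/*/; do
    [ -d "$item" ] || continue   # the glob matched nothing
    for required in dublin_core.xml contents; do
      if [ ! -f "$item$required" ]; then
        echo "missing $required in $item" >&2
        status=1
      fi
    done
    [ -f "${item}contents" ] || continue
    # Each contents line is a filename, optionally followed by
    # tab-separated options such as bundle:ORIGINAL.
    while IFS= read -r line || [ -n "$line" ]; do
      [ -n "$line" ] || continue
      file=$(printf '%s\n' "$line" | cut -f1)   # cut splits on tabs by default
      if [ ! -f "$item$file" ]; then
        echo "listed in contents but missing: $item$file" >&2
        status=1
      fi
    done < "${item}contents"
  done
  return $status
}
```

Run it against the SimpleArchiveFormat directory generated above (adjust the path to wherever it was created), e.g. `check_saf /path/to/SimpleArchiveFormat && echo OK`.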

Here is the syntax for importing SAF packages into DSpace using ItemImport (see also Basic-DSpace-Import-Process):

# Replace the e-mail, collection handle, and paths below with your own values.
sudo /dspace/bin/dspace import -a \
    -e you@example.com \
    -c 1811/49710 \
    -s /home/peterdietz/Desktop/MelanieSeedsBatch/SimpleArchiveFormat/ \
    -m /home/peterdietz/Desktop/MelanieSeedsBatch/seedsbatch1.map

Run SAFBuilder on YOUR data

Create a CSV file with the column headers 'filename', 'dc.title', 'dc.creator', 'dc.date.issued', 'dc.description.abstract'.

Add one row per item and fill in your content.

  • filename contains the path to the content file, e.g. ARC_0112.pdf or ARC/001.pdf, depending on how your files are organized.
  • dc.something is your metadata in the Dublin Core namespace. Other metadata namespaces are allowed, and you can add or change the metadata fields.
  • If a field has multiple values, such as multiple authors, separate the values with two pipe characters (||). This delimiter was chosen because it is unlikely to appear in your content.
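As a starting point, the CSV can be written from the shell with a here-document. All file names and metadata values below are hypothetical placeholders. The dc.creator values contain commas, so they are wrapped in double quotes following standard CSV quoting (spreadsheet exports do the same); this assumes SAFBuilder's CSV reader honors that quoting.

```sh
#!/bin/sh
# Sketch: write a minimal SAFBuilder metadata CSV to the path given as $1.
# Every file name and value here is a hypothetical placeholder.
# Multiple dc.creator values are joined with "||".
write_sample_csv() {
  cat > "$1" <<'EOF'
filename,dc.title,dc.creator,dc.date.issued,dc.description.abstract
ARC_0112.pdf,Annual Report 1962,"Smith, John||Doe, Jane",1962,Annual report of the archives.
ARC/001.pdf,Campus Map,"Doe, Jane",1965-04,A hand-drawn map of the campus.
EOF
}
```

Call it as `write_sample_csv /path/to/my_batch/my-metadata.csv`, keeping the CSV in the same directory as the content files.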

Advanced Usage

Specifying Bundles

Files in a column with the header "filename" go into the default content bundle. To send the files in a column to a specific bundle, use the header "filename__bundle:BUNDLENAME", where BUNDLENAME is whatever you want. This is useful when you have to upload files that are not meant for public consumption, for example custom proxy licenses as PDFs that should not go into the system license bundle. SAFBuilder automatically adds the tab separators that the DSpace contents file requires.

The DSpace import tool also accepts the following extra parameters for each file:

bundle:BUNDLENAME
permissions:PERMISSIONS
description:DESCRIPTION
primary:true

BUNDLENAME is the name of the bundle to which the bitstream should be added. If no bundle is specified, the bitstream goes into the default bundle, ORIGINAL.

PERMISSIONS is text with the following format: -[r|w] 'group name'

DESCRIPTION is the text of the file's description.

primary:true marks the bitstream as the item's primary bitstream.

In SAFBuilder, you use these by appending them to the filename column header, separated by double underscores (__).

For example, to set a description, choose the bundle the bitstream goes into, and mark it as primary, use:

filename__bundle:MySpecialBundle__primary:true__description:Something really cool
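SAFBuilder's own implementation is in Java and is not reproduced here. As a rough illustration of the mapping, the shell sketch below takes a column header and a cell value and produces a tab-separated line in the style of a DSpace contents file. `contents_line` is a hypothetical helper, and it assumes the column name is everything before the first double underscore.

```sh
#!/bin/sh
# Hypothetical sketch (not SAFBuilder's code): turn a column header such as
#   filename__bundle:MySpecialBundle__primary:true
# plus a cell value into "value<TAB>bundle:...<TAB>primary:true".
contents_line() {
  header=$1
  value=$2
  name=${header%%__*}        # column name before the first "__", e.g. filename
  options=${header#"$name"}  # remaining "__opt1__opt2..." part (may be empty)
  # Replace every "__" with a tab; only the options are touched, not the value.
  opts=$(printf '%s' "$options" | awk '{ gsub(/__/, "\t"); print }')
  printf '%s%s\n' "$value" "$opts"
}
```

With the header above and the value ARC_0112.pdf, this yields ARC_0112.pdf, bundle:MySpecialBundle, primary:true, and description:Something really cool, separated by tabs. A plain "filename" header yields just the file name.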

To restrict the bitstreams in a column to READ access for the Administrator group only (i.e. no anonymous read access):

filename__permissions:-r 'Administrator'

You can combine multiple file columns, so that some files go into the main bundle while others go into a custom bundle restricted to administrators. For example:

filename,filename__bundle:PROXY-LICENSE__permissions:-r 'Administrator'
student-thesis.pdf,University-Legal-Proxy-License-signed.pdf

Extracting files from ZIP files

One of our largest content producers sends us batches of about 500 records, and each record's content arrives pre-zipped in a ZIP file. Rather than unpacking all of these by hand, we adjusted SAFBuilder to accept content in ZIP files, unpack it, and add each file within the ZIP to the record.

Instead of a header of filename, use filegroup. The double-underscore options from above can be used with it as well, e.g. filegroup__bundle:THUMBNAIL
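A CSV using ZIP content might look like the one written below. The ZIP file names and title are hypothetical, and the second column assumes the double-underscore options work with filegroup the same way they do with filename, as described above.

```sh
#!/bin/sh
# Sketch: write a CSV (path given as $1) whose content columns point at ZIP
# files. SAFBuilder is described as unpacking each ZIP and adding every file
# inside it to the record. All names here are hypothetical placeholders.
write_filegroup_csv() {
  cat > "$1" <<'EOF'
filegroup,filegroup__bundle:THUMBNAIL,dc.title
record_001.zip,record_001_thumbs.zip,Record 001
EOF
}
```

Before building, you can preview what each archive will contribute with `unzip -l record_001.zip`.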