Running SolrMarc
Previous versions of SolrMarc took two inputs: a configuration properties file and a file of MARC records. The configuration file would:
- define specifics of how the MARC records being read should be pre-processed
- define where locally defined custom methods could be found
- define what directories should be searched for scripts and translation maps
- define how to connect to the Solr index
- define which other property file(s) contain the actual index specifications.
While it was useful to have all of these definitions in one place, maintaining multiple, slightly different configurations was cumbersome because it required several near-duplicate configuration files.
For the sake of backwards compatibility, this new version of SolrMarc can read and operate from the same configuration property files. However, you can also specify each of those settings independently via command-line arguments:
-reader_opts "file containing MARC Reader options"
-dir "directory to look in for scripts, mixins, and translation maps"
-solrURL "URL of Remote Solr to use"
-config "index specification file to use"
The "-reader_opts" option (or "-r") names a file containing just the configuration options for the MarcReader, such as:
- marc.to_utf_8 = (true|false)
- marc.permissive = (true|false)
- marc.default_encoding = (UTF8|MARC8|BESTGUESS|...)
The "-dir" option defines one or more directories (separated by commas, semicolons, or vertical bars) where configuration information can be found. The directories are searched in the order they are listed, and the first match is returned. If this option is omitted, the search list consists solely of the location of the solrmarc_core jar file. Even when the option is provided, the location of the solrmarc_core jar is appended as the last entry in the list of search directories.
This list of directories is used to search:
- for the reader_options file
- for BeanShell indexing scripts (in dir_in_list/index_script)
- for translation maps (in dir_in_list/translation_maps)
- for Java source code to compile (in dir_in_list/index_java/src)
- for jar files needed for custom indexing methods (in dir_in_list/lib_local)
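The search order described for "-dir" can be sketched in a few lines of Python. This is a minimal illustration, not SolrMarc's actual code: the function names build_search_dirs and find_first are hypothetical, and jar_dir stands in for the location of the solrmarc_core jar.

```python
import re
from pathlib import Path


def build_search_dirs(dir_option, jar_dir):
    """Split a -dir value on comma, semicolon, or vertical bar.

    jar_dir (assumed to be the solrmarc_core jar location) is always
    appended as the last entry, even when dir_option is empty or None.
    """
    dirs = [d.strip() for d in re.split(r"[,;|]", dir_option or "") if d.strip()]
    dirs.append(jar_dir)
    return dirs


def find_first(search_dirs, relative_name):
    """Return the first existing path for relative_name, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / relative_name
        if candidate.exists():
            return candidate  # first match wins; later directories are ignored
    return None
```

Under these assumptions, a translation map present in two listed directories would be taken from whichever directory appears first in the "-dir" value.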
The "-solrURL" option (or "-u") specifies the URL of the running Solr server to which records are to be sent. The SolrInputDocument records will be sent to the specified URL using the SolrJ library. If you specify the special-case string "stdout", SolrMarc will not send the records to Solr; instead it prints what would have been sent in a human-readable form. If you specify the special-case string "devnull", it performs all of the work of indexing the records and building the SolrInputDocuments, and then throws them away. The "devnull" mode can be useful for performance testing or for debugging.
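The three behaviours of "-solrURL" amount to a simple dispatch on the option value. The following Python sketch illustrates that dispatch only; route_documents and the send callback are hypothetical names, and whether SolrMarc compares the special strings case-sensitively is an assumption here.

```python
def route_documents(solr_url, documents, send):
    """Handle documents according to the -solrURL value.

    send is a hypothetical callback standing in for the SolrJ client.
    Returns a short label describing what was done.
    """
    if solr_url == "stdout":
        for doc in documents:
            print(doc)  # show what would have been sent
        return "printed"
    if solr_url == "devnull":
        for _ in documents:
            pass  # documents are fully built, then discarded
        return "discarded"
    for doc in documents:
        send(solr_url, doc)  # normal case: deliver to the Solr server
    return "sent"
```

In the "devnull" branch the documents are still iterated, which mirrors the point that all indexing work is done before the results are thrown away.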
The "-config" option (or "-c") specifies the file (or files) containing the index specifications to be used in indexing. To use multiple files, list them separated by commas. The files are processed in the order they appear in the option. If a later index specification file defines an index specification for the same field as an earlier file, the earlier specification is overridden and replaced.
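The override rule for multiple "-config" files behaves like a dictionary merge in which later files win. The Python sketch below illustrates this under two assumptions: each specification is written as a "field = spec" line, and a field repeated within a single file is treated the same way as one repeated across files. The function name merge_index_specs is hypothetical.

```python
def merge_index_specs(spec_texts):
    """Merge index specification texts in order; later entries win.

    spec_texts is a list of file contents, in the order the files
    appear in the -config option.
    """
    merged = {}
    for text in spec_texts:
        for line in text.splitlines():
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            field, spec = line.split("=", 1)
            merged[field.strip()] = spec.strip()  # replaces any earlier spec
    return merged
```

For example, if the first file defines title as 245a and the second defines title as 245ab, the merged result keeps only 245ab for that field, while fields defined in just one file are carried over unchanged.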
-solrj "directory to look in for jars required for SolrJ"
-lib_local "directory to look in for additional jars and libraries"
The "-solrj" option specifies where the jar files that comprise SolrJ can be found. The default value is "lib-solrj", which is looked for in the list of directories specified by the "-dir" option. Recent Solr distributions include solr-solrj-X.X.X.jar in the solr/dist directory with all of the other Solr jars, but place the jars on which SolrJ depends in a subdirectory named solrj-lib. You can therefore specify "path_to_solr/dist/solrj-lib" as the value for this option: if SolrMarc cannot find solr-solrj-X.X.X.jar in that directory, it will look in the parent directory specifically for that one jar file.
The "-lib_local" option specifies where the jar files needed by code within your custom indexing methods can be found. The default value is "lib_local", which is looked for in the list of directories specified by the "-dir" option. For instance, if one of your methods looks up a value in a MySQL database, the JDBC jars for communicating with MySQL would be placed in the lib_local directory. All jars in all of the expanded directory paths will be loaded by SolrMarc.
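The parent-directory fallback for the SolrJ jar can be illustrated with a short Python sketch. It is not SolrMarc's implementation: find_solrj_jar is a hypothetical name, and matching on the pattern solr-solrj-*.jar is an assumption based on the jar name given above.

```python
from pathlib import Path


def find_solrj_jar(solrj_dir):
    """Look for solr-solrj-*.jar in solrj_dir, then in its parent."""
    solrj_dir = Path(solrj_dir)
    for directory in (solrj_dir, solrj_dir.parent):
        matches = sorted(directory.glob("solr-solrj-*.jar"))
        if matches:
            return matches[0]
    return None  # jar found in neither location
```

With a layout like path_to_solr/dist/solrj-lib, the first lookup finds only the dependency jars, so the sketch falls back to path_to_solr/dist for the main SolrJ jar.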
In addition to the values that can be specified via the above command-line options, a number of special-purpose system properties can be defined on the command line using the syntax -Dproperty=value to affect the operation of SolrMarc. Many of these are intended for advanced users, debugging, or tuning, and most users will not need them.
-Dsolrmarc.method.report=true
At program termination report the amount of time each indexing
method required
-Dsolrmarc.indexer.threadcount=<int>
Number of parallel threads to use for indexing (defaults to 1)
If your custom methods or custom index scripts are not thread-safe, a
number higher than 1 may cause unexpected errors in your Solr records.
-Dsolrmarc.solrj.threadcount=<int>
Number of parallel threads to use for sending data to Solr (defaults to 4)
-DrunInEclipse=true
Solely for debugging shutdown hooks in Eclipse
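To make the thread-count defaults concrete, the following Python sketch mimics how -Dproperty=value arguments resolve to values. In practice the Java launcher handles the -D syntax itself; the function names parse_system_properties and thread_counts are hypothetical, and only the two property names and their defaults (1 and 4) come from the list above.

```python
def parse_system_properties(args):
    """Collect -Dname=value arguments into a dict of strings."""
    props = {}
    for arg in args:
        if arg.startswith("-D") and "=" in arg:
            name, value = arg[2:].split("=", 1)
            props[name] = value
    return props


def thread_counts(props):
    """Return (indexer threads, SolrJ threads), applying the defaults."""
    indexer = int(props.get("solrmarc.indexer.threadcount", 1))
    solrj = int(props.get("solrmarc.solrj.threadcount", 4))
    return indexer, solrj
```

Leaving both properties unset yields one indexing thread and four SolrJ sender threads, which is the safe choice when custom methods or scripts may not be thread-safe.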