
Work with Different Versions of Solr

haschart edited this page Sep 22, 2016 · 1 revision

Remote Access Solr Only

Different sites might be running widely disparate versions of Solr for any number of reasons. Previously, to accommodate this, SolrMarc contained code and a customized jar that attempted to work with different versions of Solr and to work around their differences. This was a complex and time-consuming task, and as a result SolrMarc often lagged significantly behind the latest version of Solr. Furthermore, although SolrMarc used to attempt to support writing directly to Solr index files, the ever-changing structure of Solr classes from version to version made this increasingly difficult. The HTTP-based remote access class seemed to be the recommended and supported way to access Solr.

This newer version of SolrMarc approaches the problem differently. When you run the program, you provide a path to a directory containing the SolrJ jar from the Solr distribution you are using (plus any additional jars that distribution recommends). SolrMarc loads those jars dynamically, attempts to load the class needed to communicate with Solr by name, and calls its methods using reflection.
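The following is a minimal sketch of the dynamic-loading idea described above, not SolrMarc's actual code. The class `DynamicJarLoader` and its methods are hypothetical; the sketch assumes the jars sit directly inside the supplied directory and that the Solr client class is reached only through a public constructor and public methods, looked up by name at run time.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: load every jar in a user-supplied directory and
 * reach client classes only by name, via reflection.
 */
class DynamicJarLoader {

    /** Collects URLs for all *.jar files directly inside the given directory. */
    static List<URL> jarUrls(File dir) throws MalformedURLException {
        File[] files = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (files == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dir);
        }
        List<URL> urls = new ArrayList<>();
        for (File f : files) {
            urls.add(f.toURI().toURL());
        }
        return urls;
    }

    /** Builds a class loader over those jars; the caller should close it when done. */
    static URLClassLoader loaderFor(File dir) throws MalformedURLException {
        List<URL> urls = jarUrls(dir);
        return new URLClassLoader(urls.toArray(new URL[0]),
                DynamicJarLoader.class.getClassLoader());
    }

    /** Loads a class by name and calls the public constructor matching paramTypes. */
    static Object newInstance(ClassLoader loader, String className,
                              Class<?>[] paramTypes, Object... args)
            throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getConstructor(paramTypes).newInstance(args);
    }

    /** Looks up a public method by name and parameter types, then invokes it. */
    static Object invoke(Object target, String methodName,
                         Class<?>[] paramTypes, Object... args)
            throws ReflectiveOperationException {
        Method m = target.getClass().getMethod(methodName, paramTypes);
        return m.invoke(target, args);
    }
}
```

A caller might pass a SolrJ client class name such as `org.apache.solr.client.solrj.impl.HttpSolrClient` to `newInstance`. Which class, and which constructor arguments, are appropriate depend on the SolrJ version in the directory, which is the reason to resolve them by name at run time instead of compiling against one version.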

Writing Chunks of Records

Another improvement in how SolrMarc interacts with Solr is that, as records are indexed, SolrMarc gathers several hundred SolrInputDocument objects and sends them to Solr in a single large chunk. Previous attempts at supporting this behavior had a serious flaw: a bad record in the batch being submitted (perhaps one with multiple instances of a non-multivalued field) would cause all subsequent records in that batch to be silently discarded.
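As an illustration of the gathering step only, here is a small buffer that collects documents and hands them off once a chunk is full. `ChunkBuffer` is a hypothetical name; the type parameter `T` stands in for `SolrInputDocument`, and the `Consumer` stands in for whatever call actually submits a batch to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: accumulate documents and send them in fixed-size chunks. */
class ChunkBuffer<T> {
    private final int chunkSize;
    private final Consumer<List<T>> sender;   // stand-in for the real "send batch to Solr" call
    private final List<T> pending = new ArrayList<>();

    ChunkBuffer(int chunkSize, Consumer<List<T>> sender) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        this.chunkSize = chunkSize;
        this.sender = sender;
    }

    /** Queues one document and sends the chunk as soon as it is full. */
    void add(T doc) {
        pending.add(doc);
        if (pending.size() >= chunkSize) {
            flush();
        }
    }

    /** Sends whatever is pending; call once more at the end of the input. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending)); // hand off a copy so clearing is safe
        pending.clear();
    }
}
```

With a chunk size of a few hundred, as the text describes, most documents travel in full chunks and only the final partial chunk is sent by the closing `flush()`.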

This issue is handled by checking the return code from Solr: if it doesn't report that all records were successful, SolrMarc splits the chunk into smaller pieces and retries each of them. If one of the smaller chunks also encounters an error, it is likewise subdivided and resent, until the failing chunk is small enough. Each record in that small failing chunk is then sent to Solr individually, so only the records that actually contain errors are left out of the index.
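The split-and-retry strategy above can be sketched as a short recursion. This is an illustration under stated assumptions, not SolrMarc's implementation: `SplitRetrySender` is a hypothetical name, `trySend` stands in for a Solr call that returns `true` only when every document in the batch was accepted, chunks are split in halves (the text does not say how they are divided), and re-sending a document that Solr already accepted is assumed to be harmless.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: resend failing chunks in halves, then one record at a time. */
class SplitRetrySender<T> {
    private final Predicate<List<T>> trySend; // stand-in: true only if the whole batch succeeded
    private final int smallChunk;             // at or below this size, fall back to one-by-one

    SplitRetrySender(Predicate<List<T>> trySend, int smallChunk) {
        if (smallChunk < 1) {
            throw new IllegalArgumentException("smallChunk must be >= 1");
        }
        this.trySend = trySend;
        this.smallChunk = smallChunk;
    }

    /** Sends the batch and returns the records rejected even when sent alone. */
    List<T> send(List<T> batch) {
        List<T> rejected = new ArrayList<>();
        sendChunk(batch, rejected);
        return rejected;
    }

    private void sendChunk(List<T> chunk, List<T> rejected) {
        if (chunk.isEmpty() || trySend.test(chunk)) {
            return; // whole chunk accepted
        }
        if (chunk.size() <= smallChunk) {
            // Small enough: send each record on its own and keep only the true failures.
            for (T doc : chunk) {
                if (!trySend.test(Collections.singletonList(doc))) {
                    rejected.add(doc);
                }
            }
            return;
        }
        // Still too large: split in half and retry each half the same way.
        int mid = chunk.size() / 2;
        sendChunk(chunk.subList(0, mid), rejected);
        sendChunk(chunk.subList(mid, chunk.size()), rejected);
    }
}
```

Because each split halves the chunk, a batch of several hundred records containing a single bad record costs only a logarithmic number of extra round trips before the bad record is isolated, while every good record still reaches Solr.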
