VIVO Scholars Discovery is a middleware project that pulls VIVO content into its own search index (Solr) and then exposes that content via a RESTful service endpoint.
Various frontend applications are available (or can be built) to display the content as read-only websites. Existing frontend applications include:
Scholars Middleware REST Service API Documentation
The Scholars Discovery project was initiated by the Scholars@TAMU project team at Texas A&M University (TAMU) Libraries. In support of the Libraries’ goal of enabling and contextualizing the discovery of scholars and their expertise across disciplines, the Scholars team at the TAMU Office of Scholarly Communications (OSC) proposed the Scholars version 2 project, which focuses on deploying (1) a new public-facing, read-only layer, (2) a faceted search engine, (3) data reuse options, and (4) search engine optimization. Digital Initiative (DI) at TAMU Libraries collaborated with the OSC to design and implement the current system architecture, including Scholars Discovery and VIVO Scholars Angular. At a later stage, the Scholars Discovery project was adopted by the VIVO community’s VIVO Scholar Task Force.
The Scholars Discovery system is first and foremost an ETL system that extracts data from VIVO's triplestore, transforms triples into flattened documents, and loads the documents into Solr. The Solr index is then exposed via a REST API and a GraphQL API as nested JSON. A secondary feature is a persistent, configurable discovery layout for rendering a UI.
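To make the "transform" step concrete, here is a minimal Python sketch of turning triples into flattened, multi-valued documents. It is an illustration only, not the project's Java implementation: the `flatten` function, the short subject and predicate names, and the sample data are all hypothetical, and real VIVO data uses full URIs and ontology predicates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def flatten(triples: Iterable[Triple]) -> Dict[str, Dict[str, object]]:
    """Group triples by subject into one flat, multi-valued document each."""
    fields: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        fields[subject][predicate].append(obj)
    # Store the subject as the document id so each document is self-describing.
    return {subject: {"id": subject, **values} for subject, values in fields.items()}


# Hypothetical sample data for illustration.
triples = [
    ("person1", "name", "Ada Lovelace"),
    ("person1", "researchArea", "Mathematics"),
    ("person1", "researchArea", "Computing"),
]
documents = flatten(triples)
print(documents["person1"])
```

Each predicate becomes one field holding a list of values, which is the shape a search index such as Solr handles well.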
Extraction from VIVO is done via configurable harvesters that make SPARQL requests to the triplestore for a collection of objects, followed by subsequent SPARQL requests for each property value of the target document. The SPARQL requests can be found in src/main/resources/templates/sparql. The transformation is granular: the triples resulting from each SPARQL request are converted into one property of a flattened document. This document is then saved into a heterogeneous Solr collection. The configuration of the Solr collection can be found in solr/config. In order to represent a flattened document as a nested JSON response, the field values are indexed with a relationship identifier convention: [value]::[id], [value]::[id]::[id], etc. During serialization, the document model is traversed, parsing each Solr field value and constructing nested JSON.
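The relationship identifier convention can be illustrated with a short Python sketch. It assumes that the ids after the value form a path in which a child's leading ids match its parent's ids. That reading, the `parse_field_value` and `nest` helpers, and the sample field values are assumptions made for illustration; the actual serializer is part of the Java code base and may differ.

```python
from typing import Dict, List


def parse_field_value(raw: str) -> Dict[str, object]:
    """Split '[value]::[id]::[id]...' into its value and its id path."""
    value, *ids = raw.split("::")
    return {"value": value, "ids": ids}


def nest(parents: List[str], children: List[str], child_key: str) -> List[dict]:
    """Attach each child to the parent whose id path prefixes the child's path."""
    parsed_children = [parse_field_value(child) for child in children]
    nested = []
    for parent in (parse_field_value(p) for p in parents):
        node = {"id": parent["ids"][-1], "label": parent["value"], child_key: []}
        for child in parsed_children:
            # A child belongs here when everything but its own id matches the parent.
            if child["ids"][:-1] == parent["ids"]:
                node[child_key].append({"id": child["ids"][-1], "label": child["value"]})
        nested.append(node)
    return nested


# Hypothetical flattened field values for illustration.
positions = ["Professor::p1", "Dean::p2"]
organizations = ["Libraries::p1::o1", "Graduate School::p2::o2"]
result = nest(positions, organizations, "organizations")
print(result)
```

Note that this sketch would mis-parse a value that itself contains `::`; a real implementation needs a rule for that case.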
Here is a list of some dependencies used:
The basic Spring Boot application configuration can be found at src/main/resources/application.yml. Here you can configure basic server and Spring settings as well as custom configuration for Scholars Discovery. Several configuration POJOs represent these settings. They can be found in src/main/java/edu/tamu/scholars/middleware/config/model and src/main/java/edu/tamu/scholars/middleware/auth/config.
Assets are hosted at /file/:id/:filename and their location is configured via middleware.assets-location.
Tested options are
Assets stored in src/main/resources/assets
middleware.assets-location: classpath:/assets
Assets stored externally
middleware.assets-location: file:/scholars/assets
Harvesting can be configured via middleware.harvesters and is represented by HarvesterConfig. For each harvester, a bean is created that specifies the type of harvester and which document types it maps to. The reference implementation is the local triplestore harvester.
Indexing can be configured via middleware.indexers and is represented by IndexerConfig. For each indexer, a bean is created that specifies the type of indexer and which document types it indexes. The reference implementation is the Solr indexer.
The application can be configured to harvest and index on startup via middleware.index.onStartup, and on a cron schedule via middleware.index.cron. Indexing is done in batches for performance; the batch size can be tuned via middleware.index.batchSize.
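As a rough illustration of what batching means here, the following Python sketch groups a stream of documents into fixed-size batches. The `batches` helper is hypothetical and is not the project's Java indexer; it only shows the trade-off behind a batch size setting. Larger batches mean fewer requests to the index but more documents held in memory at once.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Five documents with a batch size of two produce three index requests.
for batch in batches(["doc1", "doc2", "doc3", "doc4", "doc5"], 2):
    print(len(batch), batch)
```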
Solr is configured via spring.data.solr.
cd solr && docker build --tag=scholars/solr . && docker run -d -p 8983:8983 scholars/solr && cd ..
- Build and Run the application
mvn clean install
mvn spring-boot:run
- Note: Custom application configuration can be achieved by providing a location and an optional profile, such as:
mvn spring-boot:run -Dspring-boot.run.profiles=dev -Dspring-boot.run.config.location=/some/directory/
- ...where an application-dev.yml exists in the /some/directory/ directory
docker build -t scholars/discovery .
docker run -d -p 9000:9000 -e SPRING_APPLICATION_JSON="{\"spring\":{\"data\":{\"solr\":{\"host\":\"http://localhost:8983/solr\"}}},\"ui\":{\"url\":\"http://localhost:3000\"},\"vivo\":{\"base-url\":\"http://localhost:8080/vivo\"},\"middleware\":{\"allowed-origins\":[\"http://localhost:3000\"],\"index\":{\"onStartup\":false},\"export\":{\"individualBaseUri\":\"http://localhost:3000/display\"}}}" scholars/discovery
The environment variable SPRING_APPLICATION_JSON will override properties in application.yml.
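Escaping a long JSON string by hand is error-prone, so one option is to generate the value. The Python sketch below builds the docker run argument list from a dictionary of overrides. The `docker_run_command` helper is hypothetical; the property names, port, and image tag are taken from the command above, and only a subset of the overrides is shown.

```python
import json
import shlex
from typing import List


def docker_run_command(overrides: dict, image: str = "scholars/discovery") -> List[str]:
    """Build a docker run argument list that passes overrides as SPRING_APPLICATION_JSON."""
    # Compact separators keep the environment variable on one short line.
    payload = json.dumps(overrides, separators=(",", ":"))
    return [
        "docker", "run", "-d", "-p", "9000:9000",
        "-e", f"SPRING_APPLICATION_JSON={payload}",
        image,
    ]


overrides = {
    "spring": {"data": {"solr": {"host": "http://localhost:8983/solr"}}},
    "middleware": {"index": {"onStartup": False}},
}
command = docker_run_command(overrides)
# shlex.join quotes the JSON so the line can be pasted into a POSIX shell.
print(shlex.join(command))
```

Passing the list directly to `subprocess.run` avoids shell quoting altogether.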
docker-compose up
This will provide a Postgres database at localhost:5432 and Solr at localhost:8983. There should be two volume mounts, at the relative paths pgdata and solr/data.
To run the mvn spring-boot:run command with SPRING_APPLICATION_JSON defined, you can use the following approach:
SPRING_APPLICATION_JSON='{"spring.datasource.driver-class-name":"org.postgresql.Driver","spring.datasource.url":"jdbc:postgresql://localhost:5432/scholars","spring.jpa.database-platform":"org.hibernate.dialect.PostgreSQLDialect","spring.sql.init.platform":"postgres"}' mvn spring-boot:run
Save the following as config.json:
{
  "spring.datasource.driver-class-name": "org.postgresql.Driver",
  "spring.datasource.url": "jdbc:postgresql://localhost:5432/scholars",
  "spring.jpa.database-platform": "org.hibernate.dialect.PostgreSQLDialect",
  "spring.sql.init.platform": "postgres"
}
SPRING_APPLICATION_JSON=$(cat config.json) mvn spring-boot:run
For Windows Command Prompt, the syntax is slightly different:
set SPRING_APPLICATION_JSON={"spring.datasource.driver-class-name":"org.postgresql.Driver","spring.datasource.url":"jdbc:postgresql://localhost:5432/scholars","spring.jpa.database-platform":"org.hibernate.dialect.PostgreSQLDialect","spring.sql.init.platform":"postgres"} && mvn spring-boot:run
For Windows PowerShell:
$env:SPRING_APPLICATION_JSON='{"spring.datasource.driver-class-name":"org.postgresql.Driver","spring.datasource.url":"jdbc:postgresql://localhost:5432/scholars","spring.jpa.database-platform":"org.hibernate.dialect.PostgreSQLDialect","spring.sql.init.platform":"postgres"}'; mvn spring-boot:run
With the above installation instructions, the following service endpoints can be verified:
The HAL (Hypertext Application Language) explorer can be used to browse scholars-discovery resources.
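HAL responses advertise related resources under a `_links` object, which is what the explorer follows. As a small sketch, the Python function below lists the link relations in a HAL document that has already been parsed from JSON. The `link_hrefs` helper and the sample response are hypothetical: the `items` relation and its URL are made up for illustration, and the port is assumed from the docker run command above.

```python
from typing import Dict, List


def link_hrefs(hal_document: dict) -> Dict[str, List[str]]:
    """Map each link relation in a HAL document to its list of hrefs."""
    hrefs: Dict[str, List[str]] = {}
    for relation, links in hal_document.get("_links", {}).items():
        # HAL allows a relation to hold one link object or an array of them.
        if isinstance(links, dict):
            links = [links]
        hrefs[relation] = [link["href"] for link in links]
    return hrefs


# Hypothetical root response for illustration.
root = {
    "_links": {
        "self": {"href": "http://localhost:9000/"},
        "items": [
            {"href": "http://localhost:9000/items{?page,size}", "templated": True},
        ],
    }
}
relations = link_hrefs(root)
print(sorted(relations))
```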