scholars-discovery

VIVO Scholars Discovery is a middleware project that pulls VIVO content into its own search index (Solr) and then exposes that content via a RESTful service endpoint.

Various frontend applications are available (or can be built) to display the content as read-only websites. Existing frontend applications include:

VIVO Scholars Angular

API

Scholars Discovery REST Service API Documentation

Background

The Scholars Discovery project was initiated by the Scholars@TAMU project team at Texas A&M University (TAMU) Libraries. In support of the Libraries’ goal of enabling and contextualizing the discovery of scholars and their expertise across disciplines, the Scholars team at the TAMU Office of Scholarly Communications (OSC) proposed the Scholars version 2 project, which focuses on deploying (1) a new read-only public-facing layer, (2) a faceted search engine, (3) data reuse options, and (4) search engine optimization. Digital Initiative (DI) at TAMU Libraries collaborated with the OSC to design and implement the current system architecture, including Scholars Discovery and VIVO Scholars Angular. Later, the Scholars Discovery project was adopted by the VIVO community’s VIVO Scholar Task Force.

Technology

The Scholars Discovery system is first and foremost an ETL (extract, transform, load) system: it extracts data from VIVO's triplestore, transforms the triples into flattened documents, and loads those documents into Solr. The Solr index is then exposed as nested JSON through a REST API and a GraphQL API. A secondary feature is a persistent, configurable discovery layout for rendering a UI.

Extraction from VIVO is done via configurable harvesters, which make a SPARQL request to the triplestore for a collection of objects and then subsequent SPARQL requests for each property value of the target document. The SPARQL query templates can be found in src/main/resources/templates/sparql. The transformation is done granularly, converting the triples returned by each SPARQL request into a property of a flattened document. This document is then saved into a heterogeneous Solr collection, whose configuration can be found in solr/config. In order to represent a flattened document as a nested JSON response, field values are indexed with a relationship identifier convention: [value]::[id], [value]::[id]::[id], and so on. During serialization, the document model is traversed, parsing each Solr field value and constructing nested JSON.
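The following Java sketch illustrates how values stored with the [value]::[id] convention could be turned back into a nested structure. It is not the project's serializer: DelimitedValue, NestedValues, and Node are hypothetical names, and the sketch assumes that the segments after the value are relationship ids ordered from the outermost to the innermost object, which the convention above does not spell out.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A stored Solr field value split into its display value and relationship ids (hypothetical type). */
record DelimitedValue(String value, List<String> ids) {

    // Separator from the [value]::[id] convention.
    static final String DELIMITER = "::";

    static DelimitedValue parse(String raw) {
        // A limit of -1 keeps empty trailing segments instead of silently dropping them.
        String[] parts = raw.split(DELIMITER, -1);
        List<String> ids = List.of(Arrays.copyOfRange(parts, 1, parts.length));
        return new DelimitedValue(parts[0], ids);
    }
}

/** Rebuilds a nested tree from flattened values, assuming ids run from outermost to innermost. */
final class NestedValues {

    static final class Node {
        String label;
        final Map<String, Node> children = new LinkedHashMap<>();
    }

    static Map<String, Node> nest(List<String> rawValues) {
        Map<String, Node> roots = new LinkedHashMap<>();
        for (String raw : rawValues) {
            DelimitedValue parsed = DelimitedValue.parse(raw);
            if (parsed.ids().isEmpty()) {
                continue; // a value without an id carries no relationship to nest
            }
            Map<String, Node> level = roots;
            Node node = null;
            // Walk (or create) one node per id, descending a level each time.
            for (String id : parsed.ids()) {
                node = level.computeIfAbsent(id, key -> new Node());
                level = node.children;
            }
            // The value labels the innermost object named by the last id.
            node.label = parsed.value();
        }
        return roots;
    }

    private NestedValues() {}
}
```

For example, the values Professor::p1 and Biology::p1::o9 produce a p1 node labelled Professor whose child o9 is labelled Biology. A tree like this could then be handed to a JSON library such as Jackson for output.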

Here is a list of some dependencies used:

Configuration

The basic Spring Boot application configuration can be found at src/main/resources/application.yml. Here you can configure basic server and Spring settings as well as custom configuration for Scholars Discovery. Several configuration POJOs represent this configuration; they can be found in src/main/java/edu/tamu/scholars/middleware/config/model and src/main/java/edu/tamu/scholars/middleware/auth/config.

Assets

Assets are served at /file/:id/:filename from the location configured by middleware.assets-location.

Tested options are:

Assets stored in src/main/resources/assets:

   middleware.assets-location: classpath:/assets

Assets stored externally:

   middleware.assets-location: file:/scholars/assets

Harvesting

Harvesting is configured via middleware.harvesters and represented by HarvesterConfig. For each configured harvester, a bean is created that specifies the type of harvester and the document types it maps to. The reference implementation is the local triplestore harvester.
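To illustrate the idea of one harvester per configuration entry without pulling in Spring, here is a minimal Java sketch. HarvesterSpec, Harvester, TriplestoreHarvesterStub, HarvesterRegistry, and the "triplestore" type name are all hypothetical stand-ins, not the project's classes; in the application itself, Spring creates the beans from HarvesterConfig as described above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical stand-in for one entry under middleware.harvesters. */
record HarvesterSpec(String type, List<String> documentTypes) {}

/** Hypothetical harvester contract; a real one would also issue SPARQL requests. */
interface Harvester {
    List<String> documentTypes();
}

/** Placeholder for the local triplestore harvester; it performs no harvesting. */
record TriplestoreHarvesterStub(List<String> documentTypes) implements Harvester {}

final class HarvesterRegistry {

    // Supported harvester types keyed by configured name ("triplestore" is an assumed name).
    private static final Map<String, Function<HarvesterSpec, Harvester>> FACTORIES = Map.of(
            "triplestore", spec -> new TriplestoreHarvesterStub(spec.documentTypes()));

    /** Creates one harvester per configuration entry, mirroring one bean per configured harvester. */
    static List<Harvester> create(List<HarvesterSpec> specs) {
        return specs.stream()
                .map(HarvesterRegistry::instantiate)
                .collect(Collectors.toList());
    }

    private static Harvester instantiate(HarvesterSpec spec) {
        Function<HarvesterSpec, Harvester> factory = FACTORIES.get(spec.type());
        if (factory == null) {
            // Fail fast on an unknown or misspelled type instead of skipping the entry.
            throw new IllegalArgumentException("Unknown harvester type: " + spec.type());
        }
        return factory.apply(spec);
    }

    private HarvesterRegistry() {}
}
```

A lookup table from type name to factory keeps the set of supported harvester types explicit and rejects a misspelled type at startup rather than silently ignoring it.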

Indexing

Indexing is configured via middleware.indexers and represented by IndexerConfig. For each configured indexer, a bean is created that specifies the type of indexer and the document types it indexes. The reference implementation is the Solr indexer.

The application can be configured to harvest and index on startup (middleware.index.onStartup) and on a cron schedule (middleware.index.cron). Indexing is done in batches for performance, and the batch size can be tuned via middleware.index.batchSize.
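Batching can be sketched as follows, assuming documents arrive as an iterator and each batch is handed to a sink that stands in for the Solr indexer. BatchIndexer and indexInBatches are hypothetical names, and batchSize here only plays the role of middleware.index.batchSize; this is not the project's indexing code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class BatchIndexer {

    /**
     * Sends documents to the sink in batches of at most batchSize and returns how many batches
     * were sent. The sink stands in for a Solr indexing call (an assumption for illustration).
     */
    static <T> int indexInBatches(Iterator<T> documents, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (documents.hasNext()) {
            batch.add(documents.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batch = new ArrayList<>(batchSize); // start a fresh list; the sink may keep the old one
                batches++;
            }
        }
        // Flush the final, partially filled batch.
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }

    private BatchIndexer() {}
}
```

For example, five documents with a batch size of 2 are sent as three batches of sizes 2, 2, and 1. Larger batches mean fewer round trips to Solr at the cost of more memory per request.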

Solr

Solr is configured via spring.data.solr.

Development Instructions

Install Maven
Install Docker
Start Solr

   cd solr && docker build --tag=scholars/solr . && docker run -d -p 8983:8983 scholars/solr && cd ..

Build and Run the application

   mvn clean install
   mvn spring-boot:run

Note: Custom application configuration can be achieved by providing a location and an optional profile, such as:

   mvn spring-boot:run -Dspring-boot.run.profiles=dev -Dspring-boot.run.config.location=/some/directory/

...where an application-dev.yml file exists in the /some/directory/ directory.

Docker Deployment

   docker build -t scholars/discovery .

   docker run -d -p 9000:9000 -e SPRING_APPLICATION_JSON="{\"spring\":{\"data\":{\"solr\":{\"host\":\"http://localhost:8983/solr\"}}},\"ui\":{\"url\":\"http://localhost:3000\"},\"vivo\":{\"base-url\":\"http://localhost:8080/vivo\"},\"middleware\":{\"allowed-origins\":[\"http://localhost:3000\"],\"index\":{\"onStartup\":false},\"export\":{\"individualBaseUri\":\"http://localhost:3000/display\"}}}" scholars/discovery

The environment variable SPRING_APPLICATION_JSON will override properties in application.yml.

Docker Compose for Development

   docker-compose up

This provides a Postgres database at localhost:5432, pgAdmin at localhost:8080, Solr at localhost:8983/8984/8985, and ZooKeeper at localhost:2181/2182/2183. Volumes are mounted at the relative paths pgdata, pgadmin, solr/solr1, solr/solr2, solr/solr3, zoo/zoo1, zoo/zoo2, and zoo/zoo3.

The files pgadmin/pgpass and pgadmin/servers.json are required for authentication and for the initial registration of the scholars Postgres database.

To run the mvn spring-boot:run command with SPRING_APPLICATION_JSON defined in a Unix-like shell, you can set it inline:

   SPRING_APPLICATION_JSON='{"solr.client":"cloud"}' mvn spring-boot:run

Alternatively, save the following as config.json:

   {
     "solr.client": "cloud"
   }

and pass its contents to the command:

   SPRING_APPLICATION_JSON=$(cat config.json) mvn spring-boot:run

For Windows Command Prompt, the syntax is slightly different:

   set SPRING_APPLICATION_JSON={"solr.client":"cloud"} && mvn spring-boot:run

For Windows PowerShell:

   $env:SPRING_APPLICATION_JSON='{"solr.client":"cloud"}'; mvn spring-boot:run

Verify Installation

With the above installation instructions, the following service endpoints can be verified:

*Not available with mvn spring-boot:run alone. Run mvn clean install site beforehand and do not clear the target directory.

The HAL (Hypertext Application Language) Explorer can be used to browse scholars-discovery resources.
