mod-search is based on a metadata-driven approach: each resource description is specified in a JSON file, and all rules, mappings, and other settings are applied by internal mod-search services.
Elasticsearch mapping field types: field data types
The field type defines which search capabilities the corresponding field can provide. For example, the keyword field type is used for term queries and aggregations (providing facets for the record). Text fields are intended for use by full-text queries.
Property name | Description |
---|---|
name | The resource name. It is used when searching by resource to determine the index name and to create index settings and mappings |
parent | The parent resource name (currently, it is used for browsing by subjects when the additional index is added to arrange the instance subjects uniquely) |
eventBodyJavaClass | The Java class that incoming JSON can be mapped to. Currently, it's used to make processing of search fields more convenient |
languageSourcePaths | Contains a list of JSON path expressions to extract language values in ISO-639 format. If multi-language support is enabled for the resource, this path must be specified. |
searchFieldModifiers | Contains a list of field modifiers, which pre-process incoming fields for the Elasticsearch request. |
fields | List of field descriptions to extract values from the incoming resource event |
fieldTypes | List of resource descriptions that can be referenced by alias using the $ref field of PlainFieldDescription. This reduces duplication in the resource description. |
searchFields | Contains a list of generated fields for the resource events (for example, it can contain ISBN normalized values or a generated subset of field values). |
indexMappings | Object with additional index mappings for the resource (it can be helpful for the copy_to functionality of Elasticsearch) |
mappingSource | Used to include or exclude some fields from being stored in the _source object in Elasticsearch. Mainly, it's used to reduce the index size. See also: _source field |
reindexSupported | Indicates whether the resource can be reindexed |
Field type | Description |
---|---|
plain | This is the default field type, so it does not need to be specified explicitly. It can be used for any field containing a string, number, boolean, or array of plain values. |
object | This field type marks that the key contains subfields; each subfield must have its own field description. |
authority | This field type is designed to provide special options to divide a single authority record into multiple records based on the distinctType property value. |
Property name | Description |
---|---|
searchTypes | List of search types that are supported for the current field. Allowed values: facet , filter , sort |
searchAliases | List of aliases that can be used as a field name in the CQL search query. It can be used to combine several fields during the search. For example, the query keyword all title combines the following instance record fields: title, alternativeTitles.alternativeTitle, indexTitle, identifiers.value, contributors.name. Another use is to rename a field while keeping backward compatibility without requiring a reindex. |
index | Reference to the Elasticsearch mappings that are specified in index-field-types |
showInResponse | Marks the field to be returned during the search operation. mod-search adds all marked field paths to the Elasticsearch query. See also: Source filtering |
searchTermProcessor | Search term processor, which pre-processes the incoming value from the CQL query for the search request. |
mappings | Elasticsearch field mappings. It can contain a new field mapping or enrich the referenced mappings that come from index-field-types |
defaultValue | The default value for the plain field |
indexPlainValue | Specifies whether the plain keyword value should be indexed with the field. Works only for full-text fields. See also: Full-text plain fields |
sortDescription | Provides the sort description for the field. If not specified, standard rules will be applied for the sort field. See also: Sorting by fields |
Property name | Description |
---|---|
properties | Map where the key is the subfield name and the value is the field description |
Property name | Description |
---|---|
distinctType | Distinct type used to split a single entity into multiple entities, each containing only the common fields and excluding all fields marked with other distinct types |
headingType | Heading type that should be set on the resource if the field contains a value. |
authRefType | Authorized, Reference, or Auth/Ref type for divided authority record. |
Elasticsearch mappings are created from field descriptions. All fields specified in the record description are added to the index mappings and are used to prepare the Elasticsearch document.
By default, mappings are taken from index-field-types. This is a common file containing predefined mapping values that can be referenced from the index field of PlainFieldDescription. The mappings for a specific field can be enriched using the mappings field. In addition, the ResourceDescription contains an indexMappings section, which lets developers add custom mappings without specifying them in the index-field-types.json file.
For example, the resource description contains the following field description:
{
"fields": {
"f1": {
"index": "keyword",
"mappings": {
"copy_to": [ "sort_f1" ]
}
},
"f2": {
"index": "keyword"
}
},
"indexMappings": {
"sort_f1": {
"type": "keyword",
"normalizer": "keyword_lowercase"
}
}
}
Then the mappings helper will create the following mappings object:
{
"properties": {
"f1": {
"type": "keyword",
"copy_to": [ "sort_f1" ]
},
"f2": {
"type": "keyword"
},
"sort_f1": {
"type": "keyword",
"normalizer": "keyword_lowercase"
}
}
}
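The merge described above can be illustrated with a minimal Java sketch. It is not the actual mod-search helper: the class `MappingsSketch`, the nested `FieldDescription` type, the method `buildProperties`, and the single-entry stand-in for index-field-types are all hypothetical names introduced for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the actual mod-search implementation.
class MappingsSketch {

  // Stand-in for index-field-types.json: alias -> predefined mapping (assumed content).
  static final Map<String, Map<String, Object>> INDEX_FIELD_TYPES =
    Map.of("keyword", Map.of("type", "keyword"));

  // Minimal stand-in for a plain field description: an index reference plus extra mappings.
  static final class FieldDescription {
    final String index;
    final Map<String, Object> mappings;

    FieldDescription(String index, Map<String, Object> mappings) {
      this.index = index;
      this.mappings = mappings;
    }
  }

  static Map<String, Object> buildProperties(Map<String, FieldDescription> fields,
                                             Map<String, Object> indexMappings) {
    Map<String, Object> properties = new LinkedHashMap<>();
    fields.forEach((name, field) -> {
      // Start from the referenced predefined mapping, then enrich it with field-level mappings.
      Map<String, Object> mapping =
        new LinkedHashMap<>(INDEX_FIELD_TYPES.getOrDefault(field.index, Map.of()));
      mapping.putAll(field.mappings);
      properties.put(name, mapping);
    });
    // Custom mappings from indexMappings are added as they are.
    properties.putAll(indexMappings);
    return properties;
  }
}
```

Called with the f1 and f2 descriptions and the sort_f1 index mapping from the example, this sketch produces the same three entries as the properties object shown above.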
To make mod-search create its own Kafka topic, add the topic to the application.yml file under the application.kafka.topics path.
Property name | Description |
---|---|
name | Topic base name that will be concatenated with environment name and tenant name. |
numPartitions | Number of partitions the topic is split into. Can be left blank to use the default value of '-1'. |
replicationFactor | Number of replicas required for the topic. Can be left blank to use the default value of '-1'. |
application:
  kafka:
    topics:
      - name: search.instance-contributor
        numPartitions: ${KAFKA_CONTRIBUTORS_TOPIC_PARTITIONS:50}
        replicationFactor: ${KAFKA_CONTRIBUTORS_TOPIC_REPLICATION_FACTOR:}
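To illustrate how the topic base name is combined with the environment and tenant names, here is a small Java sketch. The ordering and the dot separator used here are assumptions made for the example, and `TopicNameSketch` and `fullTopicName` are hypothetical names, so check the module source for the exact format.

```java
// Hypothetical helper for illustration; not the actual mod-search implementation.
class TopicNameSketch {

  // Assumed format: <environment>.<tenant>.<base name>; ordering and separator are illustrative.
  static String fullTopicName(String environment, String tenantId, String baseName) {
    return String.join(".", environment, tenantId, baseName);
  }
}
```

Under these assumptions, an environment named folio, the tenant test_tenant, and the base name search.instance-contributor would be joined into a single dot-separated topic name.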
Currently, two field types are supported for full-text search:
- multi-language field values (see also: Language Analyzers)
- standard tokenized field values (see also: Standard Tokenizer)
Also, to support wildcard search by the whole phrase, the plain values are added to the generated document. For example, a multi-language analyzed field with indexPlainValue = true (default):
Source record:
{
"title": "Semantic web primer",
"language": "eng"
}
Result document:
{
"title": {
"eng": "Semantic web primer",
"src": "Semantic web primer"
},
"plain_title": "Semantic web primer"
}
Example of a document with a field with index = standard:
Source:
{
"contributors": [
{
"name": "A contributor name",
"primary": true
}
]
}
Result document:
{
"contributors": [
{
"name": "A contributor name",
"plain_name": "A contributor name",
"primary": true
}
]
}
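The two conversions above can be sketched in Java as follows. This is only an illustration of the document shapes shown in the examples: `PlainValueSketch`, `multiLanguageField`, and `standardField` are hypothetical names, and the real conversion in mod-search handles more cases (for example, multiple languages and nested objects).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the actual mod-search implementation.
class PlainValueSketch {

  static final String PLAIN_PREFIX = "plain_";

  // Multi-language field: the value is stored under the language code and under "src",
  // mirroring the result document in the example above.
  static Map<String, Object> multiLanguageField(String name, String value, String language,
                                                boolean indexPlainValue) {
    Map<String, Object> byLanguage = new LinkedHashMap<>();
    byLanguage.put(language, value);
    byLanguage.put("src", value);

    Map<String, Object> document = new LinkedHashMap<>();
    document.put(name, byLanguage);
    if (indexPlainValue) {
      document.put(PLAIN_PREFIX + name, value);
    }
    return document;
  }

  // Standard tokenized field: the value is kept as is, with an optional plain copy next to it.
  static Map<String, Object> standardField(String name, String value, boolean indexPlainValue) {
    Map<String, Object> document = new LinkedHashMap<>();
    document.put(name, value);
    if (indexPlainValue) {
      document.put(PLAIN_PREFIX + name, value);
    }
    return document;
  }
}
```

With indexPlainValue set to false, both methods in this sketch omit the plain_ entry and return only the analyzed field.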
All fields marked with searchType = sort must be available for sorting.
To sort by text values, the following field indices can be used:
- keyword (case-sensitive)
- keyword_lowercase (case-insensitive)
Property name | Description |
---|---|
fieldName | Custom field name. If it is not specified, the default strategy is applied: sort_${fieldName}. |
sortType | Sort field type: single or collection |
secondarySort | List of fields that must be added as secondary sorting (e.g., sorting by itemStatus and instance title fields) |
By default, if the field is only marked with searchType = sort, mod-search will generate the following sort condition:
{
"sort": [
{
"name": "sort_$field",
"order": "${value comes from cql query: asc/desc}"
}
]
}
If sortDescription contains sortType as collection, the following rules will be applied:
- if sortOrder is asc, then the mode will be equal to min. It means that for sorting by a field containing a list of values, the lowest value will be picked for sorting.
- if sortOrder is desc, then the mode will be equal to max. It means that for sorting by a field containing a list of values, the highest value will be picked for sorting.
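The sorting rules above can be summarized in a short Java sketch. It builds a map with the same simplified shape as the sort condition shown earlier (name and order, plus mode for collections); `SortSketch`, `sortFieldName`, and `sortCondition` are hypothetical names, not the actual mod-search API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the actual mod-search implementation.
class SortSketch {

  // Default naming strategy: sort_${fieldName}, unless a custom field name is given.
  static String sortFieldName(String fieldName, String customFieldName) {
    return customFieldName != null ? customFieldName : "sort_" + fieldName;
  }

  // sortType is "single" or "collection"; order is "asc" or "desc" (taken from the CQL query).
  static Map<String, Object> sortCondition(String fieldName, String customFieldName,
                                           String sortType, String order) {
    Map<String, Object> condition = new LinkedHashMap<>();
    condition.put("name", sortFieldName(fieldName, customFieldName));
    condition.put("order", order);
    if ("collection".equals(sortType)) {
      // asc picks the lowest value of the list, desc picks the highest one.
      condition.put("mode", "asc".equals(order) ? "min" : "max");
    }
    return condition;
  }
}
```

For a single-valued field the sketch emits only name and order; the mode entry appears only when the sort type is collection.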
The project mostly uses a single framework for assertions: AssertJ. A few examples:
assertThat(actualQuery).isEqualTo(matchAllQuery());
assertThat(actualCollection).isNotEmpty().containsExactly("str1", "str2");
assertThatThrownBy(() -> service.doExceptionalOperation())
.isInstanceOf(IllegalArgumentException.class)
.hasMessage("invalid parameter");
The module uses Testcontainers to run Elasticsearch, Apache Kafka and PostgreSQL in embedded mode. It is required to have Docker installed and available on the host where the tests are executed.
Navigate to the docker folder in the project and run docker-compose up.
This will build a local mod-search image and bring it up along with all necessary infrastructure:
- elasticsearch along with dashboards (kibana analogue from opensearch)
- kafka along with zookeeper
- postgres
- wiremock server for mocking external api calls (for example authorization)
Then, you should invoke
curl --location --request POST 'http://localhost:8081/_/tenant' \
--header 'Content-Type: application/json' \
--header 'x-okapi-tenant: test_tenant' \
--header 'x-okapi-url: http://api-mock:8080' \
--data-raw '{
"module_to": "mod-search-$version$",
"purge": "false"
}'
to create a tenant, which brings up the Kafka listeners and creates the indices.
You can check which tenants are enabled by wiremock in src/test/resources/mappings/user-tenants.json.
To rebuild the mod-search image you should:
- bring down existing containers by running docker-compose down
- run docker-compose build mod-search to build a new mod-search image
- run docker-compose up to bring up the infrastructure
Hosts/ports of containers to access functionality:
- http://localhost:5601/ - dashboards UI for elastic monitoring, data modification through dev console
- localhost - host, 5010 - port for remote JVM debug
- http://localhost:8081 - for calling mod-search REST api. Note that header x-okapi-url: http://api-mock:8080 should be added to requests for apis that take okapi url from headers
- localhost:29092 - for kafka interaction. If you are sending messages to kafka from a java application with spring-kafka, then this host should be added to the spring.kafka.bootstrap-servers property of application.yml
The consortium feature is detected automatically at runtime by calling the /user-tenants endpoint. When the module is enabled, the consortium feature is defined by the 'centralTenantId' tenant parameter.
Invoke the following
curl --location --request POST 'http://localhost:8081/_/tenant' \
--header 'Content-Type: application/json' \
--header 'x-okapi-tenant: consortium' \
--header 'x-okapi-url: http://api-mock:8080' \
--data-raw '{
"module_to": "mod-search-$version$",
"parameters": [
{
"key": "centralTenantId",
"value": "consortium"
}
]
}'
Then execute the following to enable member tenant
curl --location --request POST 'http://localhost:8081/_/tenant' \
--header 'Content-Type: application/json' \
--header 'x-okapi-tenant: member_tenant' \
--header 'x-okapi-url: http://api-mock:8080' \
--data-raw '{
"module_to": "mod-search-$version$",
"parameters": [
{
"key": "centralTenantId",
"value": "consortium"
}
]
}'
Note that tenantParameters such as loadReference and loadSample won't work, because the loadReferenceData method is not implemented in the SearchTenantService yet.