A highly and easily configurable Elasticsearch adapter for TYPO3.
- Elasticsearch 7
Using staticfilecache with threads enabled together with Elasticsearch->ProgressBodyMiddleware is not yet possible.
Using EXT:staticfilecache is strongly recommended, because pages are indexed on the fly whenever the middleware runs: a page served from the static cache executes no code and therefore causes no indexing overhead. Using typo3_console is also strongly recommended, as it makes executing the commands easier.
- Add two comments to your default Fluid layout to mark the indexable content: "<!--TYPO3SEARCH_begin-->" and "<!--TYPO3SEARCH_end-->". (You can add these markers multiple times if you wish.)
- Add the yaml configuration to your sites.
- Run the command "elasticsearch:create-indices" to create the needed structure inside your Elasticsearch instance.
- Run the command "elasticsearch:index-records" to index your records.
- Create a typeNum that points to the SearchController / searchAction, or create a plugin on a noindex page and send an AJAX request to it. The searchAction only returns JSON with the results.
You may implement your own plugin for search handling based on your needs. The searchAction in this extension only provides a basic search (which should already fit most needs).
Deletes existing indices and creates the indices you configured inside the yaml-files.
Indexes all records defined in your "tables" section. It also removes from the index all records newly marked as "hidden" or "deleted", as well as deleted or hidden pages.
At the moment there is no command to add fields to an index afterwards. You will have to use "elasticsearch:create-indices". This may cause data loss and reindexing.
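A minimal sketch of that rebuild, assuming the TYPO3 CLI binary lives at vendor/bin/typo3 (this path and the helper name reindex_all are assumptions, not part of the extension). Because "elasticsearch:create-indices" deletes the existing indices first, the sketch runs "elasticsearch:index-records" directly afterwards and stops if the first step fails:

```shell
# Path to the TYPO3 CLI binary. "vendor/bin/typo3" is an assumption;
# adjust it to your installation (e.g. the binary shipped by typo3_console).
TYPO3_BIN="${TYPO3_BIN:-vendor/bin/typo3}"

# Recreate the configured indices, then index all configured records.
# Returns early if creating the indices fails, so records are never
# indexed into a missing or half-created structure.
reindex_all() {
  "$TYPO3_BIN" elasticsearch:create-indices || return 1
  "$TYPO3_BIN" elasticsearch:index-records || return 1
}
```

Call `reindex_all` from your project root after changing the "mapping" section. Pages indexed by the middleware are indexed again on their next uncached request.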
This extension uses the site configuration of TYPO3 to configure indices, settings, fields and mappings to your instance. This way multiple page trees can use different indices to work with.
Pay attention to indentation and list types in YAML. If something does not work as intended, compare your file with the configuration example.
Name | Type | Description |
---|---|---|
usePageMiddleware | bool | Enable page indexing via the middleware. Pages are then indexed on every uncached page request. |
index | string | Name of the index inside elasticsearch. Default: "typo3" |
server | array | Elasticsearch server connection configuration |
analyzers | array | Define your own analyzers |
mapping | array (list) | Elasticsearch field definitions |
tables | array | Mapping of TYPO3-values to elasticsearch fields, indexer configuration |
searchFields | array | List of fields which should be searched in |
You can use the elasticsearch-php-api docs to see which host configuration options you can use.
# Example configuration
# yoursite.yaml
# TYPO3 defaults
base: 'https://www.foo.com/'
# Elasticsearch configuration
elasticsearch:
  server:
    host: elastic.foo.com
    port: 9200
  indices:
    # define a default
    german: &default
      index: german
    # copy the default to english
    english:
      << : *default
      index: english
    # create something new
    products:
      index: products
An overview of all field types and configuration options can be found in the Elasticsearch docs. This section defines the fields that are created inside the Elasticsearch index.
elasticsearch:
  # https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
  mapping:
    -
      name: title
      # see https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-params.html
      parameters:
        type: keyword
        store: true
    -
      name: teaser
      parameters:
        type: text
        index: false
        store: true
    -
      name: content
      parameters:
        type: text
        store: true
        copy_to: escaped_content
    -
      name: escaped_content
      parameters:
        type: text
        store: true
        analyzer: html_analyzer
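To check which fields actually ended up in an index after running "elasticsearch:create-indices", you can ask Elasticsearch for the mapping directly. The sketch below assumes the server from the example configuration is reachable over plain HTTP without authentication and that an index named "german" exists. The URL, the index name and the helper name es_mapping are placeholders, not something the extension provides.

```shell
# Base URL of the Elasticsearch server (assumption: plain HTTP, no auth).
ES_URL="${ES_URL:-http://elastic.foo.com:9200}"

# Print the field mapping of one index via the Elasticsearch
# "get mapping" API (GET /<index>/_mapping).
es_mapping() {
  curl --silent --show-error "$ES_URL/$1/_mapping?pretty"
}
```

Running `es_mapping german` should list the fields defined in the "mapping" section, such as title, teaser, content and escaped_content.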
Of course, each site/language combination needs to know the name of the index it should use:
languages:
  -
    title: 'German website'
    elasticsearch:
      index: de-de
You can specify and use your own analyzers as well. Here is an example of an analyzer with stripped html-chars and a lowercase filter:
elasticsearch:
  indices:
    german:
      # see https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-custom-analyzer.html
      analyzers:
        html_analyzer:
          type: custom
          tokenizer: standard
          char_filter:
            - html_strip
          filter:
            - lowercase
To map database fields to the matching Elasticsearch fields, create another mapping inside your YAML file. All keys inside "tables" are table names of your TYPO3 project (pages, tt_address, tx_myext_domain_model_name):
elasticsearch:
  indices:
    german:
      usePageMiddleware: true
      tables:
        pages:
          mapping:
            # content and url are predefined variables in php
            # elasticValue: typo3Value
            content: content
            url: url
            title: title
Pages are getting indexed on uncached page-load via middleware. This means after changing a page the changes are synchronized immediately to elasticsearch.
- url
- content
You NEED to define those two fields. The url contains the url-path, the content contains the complete rendered html-content between the two TYPO3 search-tags: "<!--TYPO3SEARCH_begin-->" and "<!--TYPO3SEARCH_end-->".
So in order to index real content you have to define these two tags inside of your page layout. If "usePageMiddleware" is set to false, the middleware does not index pages. You can also index pages via the GenericTableIndexer (see below).
Pages flagged as deleted, hidden, no_follow or no_index, or restricted by fe_group permissions, are not indexed. Pages requested with route arguments, or with any arguments at all, are not indexed either.
All hidden, deleted, noindex, nofollow and fe_group pages are removed from the index on every "elasticsearch:index-records" run. If you need to reindex a page, simply hard-reload it.
It is possible to simply index all tables you have inside of your database. Take a look at this example:
elasticsearch:
  indices:
    german:
      tables:
        tx_myextension_domain_model_table:
          indexClass: Pluswerk\Elasticsearch\Indexer\GenericTableIndexer
          uriBuilderConfig:
            extensionName: MyExtension
            pluginName: Extension
            controllerName: Extension
            actionName: detail
            # argument which name gets resolved, or the entity name - e.g. it is a event detail page, so most likely the argument is "event"
            argumentName: table
            # detail page uid
            pageUid: 123
          mapping:
            content: text
            teaser: teaser
            title: title
            # you need to provide "url" in order to automatically generate urls
            url: placeholder
If no indexClass is given, the command does not index this table (e.g. needed if you use the pages middleware).
You can also define your own indexers. They have to extend the class "\Pluswerk\Elasticsearch\Indexer\AbstractElasticIndexer" and implement the process() method. Take a look at the GenericTableIndexer itself and adjust the logic to your needs.
You can also add URLs to your entries. The field "url" is reserved for this purpose and should be used. You can provide a valid "uriBuilderConfig" as seen above to automatically generate URLs to your detail pages.
elasticsearch:
  server:
    - '%env(HOST_ELASTIC)%'
  # mapping schema for all indices
  mapping:
    -
      name: title
      parameters:
        type: keyword
        index: true
        store: true
    -
      name: type
      parameters:
        type: keyword
        index: true
        store: true
    -
      name: contentExact
      parameters:
        type: text
        store: true
        index: false
        copy_to: content
    -
      name: content
      parameters:
        type: text
        store: true
        index: true
        analyzer: html_analyzer
    -
      name: url
      parameters:
        type: keyword
        index: false
        store: true
  indices:
    german: &default
      index: german
      searchFields:
        - content
        - title
      analyzers:
        html_analyzer:
          type: custom
          tokenizer: standard
          char_filter:
            - html_strip
          filter:
            - lowercase
      usePageMiddleware: true
      tables:
        pages:
          mapping:
            # content and url are predefined variables in php
            # elasticValue: typo3Value
            content: content
            url: url
            title: title
        rss-feed-news:
          indexClass: Pluswerk\Elasticsearch\Indexer\RssFeedIndexer
          config:
            uri: <RSS FEED URI>
          mapping:
            title: title
            publicationDate: pubDate
            url: link
            contentExact: description
            type: type
            uid: uid
    english:
      << : *default
      index: english
      tables:
        pages:
          mapping:
            content: content
            url: url
            title: title
        tx_myextension_domain_model_table:
          indexClass: Pluswerk\Elasticsearch\Indexer\GenericTableIndexer
          uriBuilderConfig:
            extensionName: MyExtension
            pluginName: Extension
            controllerName: Extension
            actionName: detail
            # argument which name gets resolved, or the entity name - e.g. it is a event detail page, so most likely the argument is "event"
            argumentName: table
            # detail page uid
            pageUid: 123
          mapping:
            content: text
            teaser: teaser
            title: title
            url: url
With this configuration, a request to the search-engine could look as follows (also for assembling with javascript):
curl 'https://mytypo3website.com/?type=12345&q=mysearchterm'
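Search terms typed by users usually contain spaces or special characters, so the query string should be URL-encoded. The following sketch lets curl do the encoding. The base URL and typeNum are the placeholder values from the request above, and the helper name search is hypothetical.

```shell
# Placeholder values taken from the example request; replace with your own.
SEARCH_BASE_URL="${SEARCH_BASE_URL:-https://mytypo3website.com/}"
SEARCH_TYPE_NUM="${SEARCH_TYPE_NUM:-12345}"

# Send a search request for the term given as first argument.
# --get turns the --data-urlencode pairs into a URL-encoded query string
# instead of sending them as a POST body.
search() {
  curl --silent --get \
    --data-urlencode "type=$SEARCH_TYPE_NUM" \
    --data-urlencode "q=$1" \
    "$SEARCH_BASE_URL"
}
```

For example, `search "quick fox"` requests the same endpoint as above with the term encoded, and prints the JSON returned by the searchAction.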
elasticSearch = PAGE
elasticSearch {
  typeNum = 12345
  config {
    disableAllHeaderCode = 1
    debug = 0
  }
  10 = USER
  10 {
    userFunc = TYPO3\CMS\Extbase\Core\Bootstrap->run
    pluginName = Elasticsearch
    extensionName = Elasticsearch
    controller = Search
    vendorName = Pluswerk
  }
}
The fields uid and type are used internally and are transformed as follows:
- uid = tablename + uid
  - It is removed after being used as _id (the concatenated tablename + uid).
- type = tablename + type
  - If type is empty, type = tablename.
With the staticfilecache extension and its BoostQueue, you can enable page indexing at boost time by registering an event listener:
services:
  Pluswerk\Elasticsearch\EventListener\BuildClientEventListener:
    tags:
      - name: event.listener
        identifier: 'Pluswerk\Elasticsearch\EventListener\BuildClientEventListener'
        event: SFC\Staticfilecache\Event\BuildClientEvent
Read more about the toggler. If you do not know what it does, leave it disabled, which gives you the desired behaviour in most cases.
Explicit mappings match any token sequence on the LHS of "=>" and replace with all alternatives on the RHS. These types of mappings ignore the expand parameter in the schema.
"sea biscuit, sea biscit => seabiscuit" vs "sea biscuit, sea biscit, seabiscuit"
# Dumb session
This holds the workaround for handling multiple languages in one CLI command. The identity map has to forget the previous language's objects so that the data mapper works correctly with the new relations.
Before: Model A (en), Submodel B (en)
... both translated
Model A (de), Submodel B (en)
Now correctly
Model A (de), Submodel B (de)
You can set the PUBLIC_HOST_ELASTIC environment variable. It will then be available in Fluid as {{data.elasticsearch.url}} (if you want a different variable, change it in TypoScript).
- fast, feature-rich, full access
- perfect for public data
- perfect for use in vue
- send your search objects in the request body
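As an illustration of sending a search object in the request body, the sketch below posts a standard Elasticsearch multi_match query to the `_search` endpoint. It assumes that PUBLIC_HOST_ELASTIC holds a full base URL including the scheme, that the index is named "german", and that "content" and "title" are the fields to search (as in the "searchFields" example). The helper name es_search and the fallback URL are hypothetical.

```shell
# Assumption: PUBLIC_HOST_ELASTIC contains a full base URL such as
# "https://elastic.foo.com"; the localhost fallback is only a placeholder.
ES_URL="${PUBLIC_HOST_ELASTIC:-http://localhost:9200}"

# Search one index for a term: es_search <index> <term>
es_search() {
  index=$1
  # Escape backslashes and double quotes so the term stays valid inside
  # the JSON string (control characters such as newlines are not handled).
  term=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl --silent \
    --header 'Content-Type: application/json' \
    --data "{\"query\":{\"multi_match\":{\"query\":\"$term\",\"fields\":[\"content\",\"title\"]}}}" \
    "$ES_URL/$index/_search"
}
```

For example, `es_search german fox` returns the raw Elasticsearch response. In a browser or Vue application the same JSON body would be sent with an HTTP client of your choice.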
If you do not set the PUBLIC_HOST_ELASTIC environment variable, the URI to the search controller is built instead.
- fully bootstrapped TYPO3 elastic proxy
- you can add more complex queries with usergroups taken into account
- simply use the q parameter, e.g. "?q=fox", to search for fox