
All params should be optional for ngram tokenizer and edge ngram tokenizer #877

Closed
39charactersisnotenoughforagoodusername opened this issue Sep 5, 2024 · 1 comment · Fixed by #887
Labels
Area: Specification Related to the API spec used to generate client code Category: Bug Something isn't working

Comments

@39charactersisnotenoughforagoodusername

Java API client version

8.14.3

Java version

17

Elasticsearch Version

8.14.3

Problem description

Hello! I'm running into a MissingRequiredPropertyException when trying to create indices that use ngram/edge ngram tokenizers with the Java client.

Minimal code repro - settings are the same as docs but with token_chars omitted:

import java.io.StringReader;
import co.elastic.clients.elasticsearch.indices.IndexSettings;

String settingsJson =
        "{\"settings\": {\"analysis\": {\"analyzer\": {\"my_analyzer\": {\"tokenizer\": \"my_tokenizer\"}},"
                + "\"tokenizer\": {\"my_tokenizer\": {\"type\": \"ngram\",\"min_gram\": 3,\"max_gram\": 3}}}}}";
IndexSettings settings = IndexSettings.of(i -> i.withJson(new StringReader(settingsJson)));

throws

Exception in thread "main" co.elastic.clients.json.JsonpMappingException: Error deserializing co.elastic.clients.elasticsearch._types.analysis.TokenizerDefinition: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.tokenChars' (JSON path: settings.analysis.tokenizer.my_tokenizer) (line no=1, column no=162, offset=161)
	at co.elastic.clients.json.JsonpMappingException.from0(JsonpMappingException.java:134)
	at co.elastic.clients.json.JsonpMappingException.from(JsonpMappingException.java:121)
...
	at co.elastic.clients.elasticsearch.indices.IndexSettings.of(IndexSettings.java:308)
	at Scratch.main(Scratch.java:11)
Caused by: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.tokenChars'
	at co.elastic.clients.util.ApiTypeHelper.requireNonNull(ApiTypeHelper.java:76)
	at co.elastic.clients.util.ApiTypeHelper.unmodifiableRequired(ApiTypeHelper.java:141)
	at co.elastic.clients.elasticsearch._types.analysis.NGramTokenizer.<init>(NGramTokenizer.java:79)
A similar example throws for a missing maxGram when min_gram, max_gram, and token_chars are all omitted:
String settingsJson =
        "{\"settings\": {\"analysis\": {\"analyzer\": {\"my_analyzer\": {\"tokenizer\": \"my_tokenizer\"}},"
                + "\"tokenizer\": {\"my_tokenizer\": {\"type\": \"ngram\"}}}}}";
IndexSettings settings = IndexSettings.of(i -> i.withJson(new StringReader(settingsJson)));

throws
Exception in thread "main" co.elastic.clients.json.JsonpMappingException: Error deserializing co.elastic.clients.elasticsearch._types.analysis.TokenizerDefinition: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.maxGram' (JSON path: settings.analysis.tokenizer.my_tokenizer) (line no=1, column no=134, offset=133)
	at co.elastic.clients.json.JsonpMappingException.from0(JsonpMappingException.java:134)
	at co.elastic.clients.json.JsonpMappingException.from(JsonpMappingException.java:121)
...
	at co.elastic.clients.elasticsearch.indices.IndexSettings.of(IndexSettings.java:308)
	at Scratch.main(Scratch.java:11)
Caused by: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.maxGram'
	at co.elastic.clients.util.ApiTypeHelper.requireNonNull(ApiTypeHelper.java:76)
	at co.elastic.clients.elasticsearch._types.analysis.NGramTokenizer.<init>(NGramTokenizer.java:77)

This seems to be because the API spec used to generate the Java client marks min_gram, max_gram, and token_chars as required for the ngram/edge ngram tokenizers, even though the docs document defaults for them (which are also applied by the server code and Lucene).
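For reference, the defaults documented for the ngram tokenizer are the following (edge_ngram documents the same min_gram/max_gram defaults; an empty token_chars means all characters are kept):

```json
{
  "type": "ngram",
  "min_gram": 1,
  "max_gram": 2,
  "token_chars": []
}
```

So a bare `"type": "ngram"` tokenizer definition is valid input for the server, and the client spec should treat these properties as optional.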

I can also confirm that creating an index via curl without specifying min_gram / max_gram / token_chars works.

kubectl exec es8-data-0 -- curl -XPUT "https://localhost:9200/test-index" -H "Content-Type: application/json" -d '{"settings": {"analysis": {"analyzer": {"my_analyzer": {"tokenizer": "my_tokenizer"}},"tokenizer": {"my_tokenizer": {"type": "ngram"}}}}}'

returns

{"acknowledged":true,"shards_acknowledged":true,"index":"test-index"}

The same is true for "type": "edge_ngram" as well.
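Until the spec is fixed, one workaround (assuming the server defaults shown in the docs are acceptable) is to spell the properties out explicitly in the settings JSON, so the client's deserializer finds everything it requires:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": { "tokenizer": "my_tokenizer" }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 2,
          "token_chars": []
        }
      }
    }
  }
}
```

This produces the same index behavior as omitting the properties, since the values mirror the documented defaults.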

@l-trotta
Contributor

l-trotta commented Sep 6, 2024

Hello, thank you for the detailed report! We'll fix the specification soon and regenerate the Java client to resolve this issue.

@l-trotta l-trotta added Category: Bug Something isn't working Area: Specification Related to the API spec used to generate client code labels Sep 6, 2024