
All params should be optional for ngram tokenizer and edge ngram tokenizer #877

Closed
39charactersisnotenoughforagoodusername opened this issue Sep 5, 2024 · 1 comment · Fixed by #887
Labels
Area: Specification Related to the API spec used to generate client code Category: Bug Something isn't working

Comments

@39charactersisnotenoughforagoodusername

Java API client version

8.14.3

Java version

17

Elasticsearch Version

8.14.3

Problem description

Hello! I'm running into a MissingRequiredPropertyException when trying to create indices that use ngram/edge ngram tokenizers with the Java client.

Minimal code repro - settings are the same as docs but with token_chars omitted:

import java.io.StringReader;
import co.elastic.clients.elasticsearch.indices.IndexSettings;

String settingsJson =
        "{\"settings\": {\"analysis\": {\"analyzer\": {\"my_analyzer\": {\"tokenizer\": \"my_tokenizer\"}},"
                + "\"tokenizer\": {\"my_tokenizer\": {\"type\": \"ngram\",\"min_gram\": 3,\"max_gram\": 3}}}}}";
IndexSettings settings = IndexSettings.of(i -> i.withJson(new StringReader(settingsJson)));

throws

Exception in thread "main" co.elastic.clients.json.JsonpMappingException: Error deserializing co.elastic.clients.elasticsearch._types.analysis.TokenizerDefinition: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.tokenChars' (JSON path: settings.analysis.tokenizer.my_tokenizer) (line no=1, column no=162, offset=161)
	at co.elastic.clients.json.JsonpMappingException.from0(JsonpMappingException.java:134)
	at co.elastic.clients.json.JsonpMappingException.from(JsonpMappingException.java:121)
...
	at co.elastic.clients.elasticsearch.indices.IndexSettings.of(IndexSettings.java:308)
	at Scratch.main(Scratch.java:11)
Caused by: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.tokenChars'
	at co.elastic.clients.util.ApiTypeHelper.requireNonNull(ApiTypeHelper.java:76)
	at co.elastic.clients.util.ApiTypeHelper.unmodifiableRequired(ApiTypeHelper.java:141)
	at co.elastic.clients.elasticsearch._types.analysis.NGramTokenizer.<init>(NGramTokenizer.java:79)
A similar example throws for a missing maxGram when min_gram, max_gram, and token_chars are all omitted:
String settingsJson =
        "{\"settings\": {\"analysis\": {\"analyzer\": {\"my_analyzer\": {\"tokenizer\": \"my_tokenizer\"}},"
                + "\"tokenizer\": {\"my_tokenizer\": {\"type\": \"ngram\"}}}}}";
IndexSettings settings = IndexSettings.of(i -> i.withJson(new StringReader(settingsJson)));

throws
Exception in thread "main" co.elastic.clients.json.JsonpMappingException: Error deserializing co.elastic.clients.elasticsearch._types.analysis.TokenizerDefinition: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.maxGram' (JSON path: settings.analysis.tokenizer.my_tokenizer) (line no=1, column no=134, offset=133)
	at co.elastic.clients.json.JsonpMappingException.from0(JsonpMappingException.java:134)
	at co.elastic.clients.json.JsonpMappingException.from(JsonpMappingException.java:121)
...
	at co.elastic.clients.elasticsearch.indices.IndexSettings.of(IndexSettings.java:308)
	at Scratch.main(Scratch.java:11)
Caused by: co.elastic.clients.util.MissingRequiredPropertyException: Missing required property 'NGramTokenizer.maxGram'
	at co.elastic.clients.util.ApiTypeHelper.requireNonNull(ApiTypeHelper.java:76)
	at co.elastic.clients.elasticsearch._types.analysis.NGramTokenizer.<init>(NGramTokenizer.java:77)

This seems to be because the API spec used to generate the Java client marks min_gram, max_gram, and token_chars as required for the ngram/edge ngram tokenizers, even though the docs document defaults for them (which are also applied by the server code and Lucene).
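For reference, the defaults documented for the ngram tokenizer are the following (edge_ngram documents the same min_gram/max_gram defaults; an empty token_chars means all characters are kept):

```json
{
  "type": "ngram",
  "min_gram": 1,
  "max_gram": 2,
  "token_chars": []
}
```

So a bare `"type": "ngram"` tokenizer definition is valid input for the server, and the client spec should treat these properties as optional.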

I can also confirm that creating an index via curl without specifying min_gram / max_gram / token_chars works.

kubectl exec es8-data-0 -- curl -XPUT "https://localhost:9200/test-index" -H "Content-Type: application/json" -d '{"settings": {"analysis": {"analyzer": {"my_analyzer": {"tokenizer": "my_tokenizer"}},"tokenizer": {"my_tokenizer": {"type": "ngram"}}}}}'

returns

{"acknowledged":true,"shards_acknowledged":true,"index":"test-index"}

The same is true for "type": "edge_ngram" as well.
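Until the spec is fixed, one workaround (assuming the server defaults shown in the docs are acceptable) is to spell the properties out explicitly in the settings JSON, so the client's deserializer finds everything it requires:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": { "tokenizer": "my_tokenizer" }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 2,
          "token_chars": []
        }
      }
    }
  }
}
```

This produces the same index behavior as omitting the properties, since the values mirror the documented defaults.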

@l-trotta
Contributor

l-trotta commented Sep 6, 2024

Hello, thank you for the detailed report! We'll fix the specification soon and regenerate the Java client to resolve this issue.

@l-trotta l-trotta added Category: Bug Something isn't working Area: Specification Related to the API spec used to generate client code labels Sep 6, 2024