Explain ignore_above better #129284
Conversation
This concept is complicated. Closes elastic#128991
Pinging @elastic/es-docs (Team:Docs)
Pinging @elastic/es-search-foundations (Team:Search Foundations)
If you need to never reject documents, this should have some value `<=8191`. All documents with more characters will just skip building the index for this field.

The defaults are complicated. It's `2147483647` (effectively unbounded) in standard indices and
Consider using bullets for defaults/dynamic mapping info for readability
Co-authored-by: Liam Thompson <[email protected]>
This looked like fun so I couldn't resist. Sorry if I'm wrong! Also, hi Nik 👋 .
: Do not index any field containing a string with more characters than this value. This is important because {{es}} will reject entire documents if they contain keyword fields that exceed `32766` bytes when UTF-8 encoded.

To avoid any risk of document rejection, set this value to `8191` or less. Fields with strings exceeding this length will be excluded from indexing.
Does this work on text fields? Or only keyword fields?
Also further down you say:
> `logsdb` indices: `8191`. `keyword` fields longer than `8191` characters won't be indexed, but the documents are accepted and the values unindexed values are available from `_source.
Does the previous statement only apply to logsdb indices? Or to standard indices as well? If both, that feels important.
What about this:
> Skip indexing of a keyword value whose UTF-8–encoded size is larger than `ignore_above`. The value is still kept in `_source`, but the field won't be searchable or aggregatable.
>
> If you do not set `ignore_above`, {es} will reject entire documents if they contain one or more `keyword` fields exceeding a UTF-8–encoded size of `32766`.
>
> To avoid any risk of document rejection, set this value to `8191` or less.
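To make the behavior described above concrete, here is a minimal console sketch; the index name, field name, and the tiny `ignore_above` limit of `20` are made up for illustration:

```console
PUT ignore-above-sketch
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20
      }
    }
  }
}

PUT ignore-above-sketch/_doc/1?refresh
{
  "message": "this value is well over twenty characters long"
}

# The over-long value was skipped at index time, so an exact-match query finds nothing...
GET ignore-above-sketch/_search
{
  "query": {
    "term": {
      "message": "this value is well over twenty characters long"
    }
  }
}

# ...but the original value is still returned from _source.
GET ignore-above-sketch/_doc/1
```

The skipped field name should also show up in the document's `_ignored` metadata field, which helps when debugging why an exact match returns nothing.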
> Does this work on text fields? Or only keyword fields?

This setting is only available on keyword fields. But on text fields some tokenizers can have a `max_token_length` setting which doesn't ignore but instead splits tokens that exceed this length (so quite a bit different).
> What about this:

I think it might be a bit clearer to specify characters/bytes, like "UTF-8–encoded size of `32766` bytes" and "set this value to `8191` characters or less."
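To illustrate the `max_token_length` contrast mentioned above, here is a rough console sketch of a `standard` tokenizer configured with a small limit; the index, tokenizer, and analyzer names are invented:

```console
PUT tokenizer-sketch
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "short_tokens": {
          "type": "standard",
          "max_token_length": 5
        }
      },
      "analyzer": {
        "short_token_analyzer": {
          "tokenizer": "short_tokens"
        }
      }
    }
  }
}

# Over-long tokens are split into 5-character pieces rather than being dropped:
POST tokenizer-sketch/_analyze
{
  "analyzer": "short_token_analyzer",
  "text": "elasticsearch tokenization"
}
```

So with `max_token_length` the data stays searchable in chopped-up form, whereas `ignore_above` on a `keyword` field simply leaves the value out of the index.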
The defaults are complicated:
* Standard indices: `2147483647` (effectively unbounded). Documents containing `keyword` fields longer than `32766` bytes will be rejected.
* `logsdb` indices: `8191`. `keyword` fields longer than `8191` characters won't be indexed, but the documents are accepted and the values unindexed values are available from `_source.
What about a table for this information?
The defaults are complicated:

| Index type | Default | Effect |
| --- | --- | --- |
| Standard indices | `2147483647` (effectively unbounded) | Documents will be rejected if any keyword exceeds `32766` bytes. |
| `logsdb` indices | `8191` | Documents are never rejected. Keywords exceeding this limit are still kept in `_source`, but won't be searchable or aggregatable. |
Ahh I see this is in definition list already, so maybe a table won't work. But if you like my wording you can update accordingly.
Me like that wording :)
I think "Documents are never rejected" might be a bit too strongly worded; maybe something like:

> Documents won't be rejected if a keyword field exceeds this limit and the field will still be kept in `_source`, but it won't be searchable or aggregatable.
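A rough sketch of the two default situations being discussed; the explicit `8191` on the standard index and the `index.mode: logsdb` setting are assumptions for illustration, not required configuration:

```console
# Standard index: ignore_above is effectively unbounded by default,
# so setting it explicitly avoids rejections from over-long keywords.
PUT standard-index-sketch
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 8191
      }
    }
  }
}

# LogsDB index: keyword fields pick up the 8191-character default,
# so over-long values are skipped instead of causing rejections.
PUT logsdb-index-sketch
{
  "settings": {
    "index": {
      "mode": "logsdb"
    }
  }
}
```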
* The [dynamic mapping](docs-content://manage-data/data-store/mapping/dynamic-mapping.md) for string fields defaults to a `text` field with a [sub](/reference/elasticsearch/mapping-reference/multi-fields.md)-`keyword` field with an `ignore_above` of `256`. String fields longer than 256 characters are available for full text search but won't have a value in their `.keyword` sub-field they can not do exact matching over _search.
This part I struggle to understand. But it feels separate from the defaults above? Maybe this can be in a new paragraph. I think you're saying that...
> When ES finds a new string field without an explicit mapping, it automatically:
>
> - Maps the field to a text field so the entire value is searchable with full-text search.
> - Adds a sub keyword field with `ignore_above` set to `256` bytes. This means that values less than 256 bytes are available for exact matching over `_search`. Values longer than that are still searchable via the `text` field, but are not indexed as keywords.
I agree I am a bit confused by the very last sentence in this paragraph.
@bmorelli25 I like your suggested rewrite, but I believe it should be "`256` characters" not bytes
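For reference, the dynamic-mapping behavior discussed here is easy to see directly; a minimal sketch with invented index and field names:

```console
# Index a document with a previously unmapped string field...
PUT dynamic-mapping-sketch/_doc/1
{
  "note": "a short string"
}

# ...then look at the mapping Elasticsearch generated for it.
GET dynamic-mapping-sketch/_mapping
```

The generated mapping should show `note` as a `text` field with a `keyword` sub-field (`note.keyword`) carrying `"ignore_above": 256`.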
Looks good to me overall! Left a couple of comments
I remember when I was adjusting the docs for `ignore_above` I had to make some changes in docs-content as well, are there going to be similar PRs for these changes?
🔍 Preview links for changed docs: 🔔 The preview site may take up to 3 minutes to finish building. These links will become live once it completes.
Thanks folks! I updated the wording and used a table. I like the table!
Left a couple of nitpicks for sentences I had to reread a couple of times to understand, but overall LGTM!
@nik9000 just happened on this PR again randomly, not sure if it fell off radar :)
Co-authored-by: Larisa Motova <[email protected]>
🔍 Preview links for changed docs