end_offset of ligature character is wrong when using icu_normalizer #148

Open
hongu-ku opened this issue Oct 23, 2024 · 1 comment · May be fixed by #149

@hongu-ku

Summary

When using the icu_normalizer char filter, the end_offset of a token is incorrect when the source character is a ligature.

Environment

  • OpenSearch version: 2.15.0
  • elasticsearch-sudachi version: 3.2.3

Steps to reproduce

POST /sample

{
  "settings": {
    "index": {
      "analysis": {
        "char_filter": {
          "normalize": {
            "type": "icu_normalizer",
            "name": "nfkc",
            "mode": "compose"
          }
        },
        "filter": {
          "sudachi_split_filter": {
            "type": "sudachi_split",
            "mode": "search"
          }
        },
        "analyzer": {
          "default": {
            "type": "custom",
            "char_filter": [
              "normalize"
            ],
            "tokenizer": "sudachi_tokenizer",
            "filter": [
              "sudachi_split_filter"
            ]
          }
        }
      }
    }
  }
}

POST /sample/_analyze

{
  "analyzer": "default",
  "text": ""
}
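
For reference, NFKC composition expands the single ligature character "㍿" into the four characters "株式会社", so the normalized text is longer than the source text. The char filter's effect can be checked in isolation with a request roughly like the one below (a sketch against the index defined above; the keyword tokenizer is used only for illustration):

POST /sample/_analyze

{
  "char_filter": [
    "normalize"
  ],
  "tokenizer": "keyword",
  "text": "㍿"
}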

Expected behavior

I think tokens[0].end_offset should be 4.

{
  "tokens": [
    {
      "token": "株式会社",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "株式",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "会社",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}

Actual behavior

tokens[0].end_offset is 1, while the mode A sub-tokens (tokens[1] and tokens[2]) behave as expected.

{
  "tokens": [
    {
      "token": "株式会社",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "株式",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "会社",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    }
  ]
}
@mh-northlander
Collaborator

start_offset and end_offset are based on the source text (i.e. "㍿").
Thus in this case tokens[0].end_offset is correct, while those of tokens[1] and tokens[2] are wrong (they should be [0,1] and [1,1]).
This should be fixed.
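
Spelled out against the one-character source text "㍿", a corrected response would look roughly like the following (a sketch of the expected output after a fix; "type" fields omitted):

{
  "tokens": [
    {
      "token": "株式会社",
      "start_offset": 0,
      "end_offset": 1,
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "株式",
      "start_offset": 0,
      "end_offset": 1,
      "position": 0
    },
    {
      "token": "会社",
      "start_offset": 1,
      "end_offset": 1,
      "position": 1
    }
  ]
}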

FYI, Sudachi applies its own text normalization and may behave unexpectedly in combination with the ICU normalization.
You may also want to use the allowEmptyMorpheme Sudachi option to change the offsets of tokens[2] from [1,1] to [0,1]:

{
  "settings": {
    "analysis": {
      "tokenizer": {
        "sudachi_disallow_empty_morpheme": {
          "type": "sudachi_tokenizer",
          "additional_settings": "{\"allowEmptyMorpheme\":false}"
        }
      }
    }
  }
}
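
For completeness, the analyzer would then need to reference this tokenizer instead of the plain sudachi_tokenizer, roughly like this (a sketch extending the settings from the reproduction steps above):

{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "char_filter": [
            "normalize"
          ],
          "tokenizer": "sudachi_disallow_empty_morpheme",
          "filter": [
            "sudachi_split_filter"
          ]
        }
      }
    }
  }
}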

@mh-northlander mh-northlander linked a pull request Nov 1, 2024 that will close this issue