fix: change order of JSON Schema to search mapper transformations (#32)
* fix: change order of JSON Schema to search mapper transformations

In the JSON Schema to search mapper, the suppression flags need to be addressed first,
otherwise the jsonref.replace_refs function may remove them.

Signed-off-by: Cesar Berrospi Ramis <[email protected]>

* build: update dependencies

Updating pydantic from 2.8.2 to 2.9.2 changes the JSON Schema generated from models:
allOf lists with only one element are flattened.

Signed-off-by: Cesar Berrospi Ramis <[email protected]>

* chore: improve verbosity of JSON Schema to search mapper test

Signed-off-by: Cesar Berrospi Ramis <[email protected]>

---------

Signed-off-by: Cesar Berrospi Ramis <[email protected]>
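
The ordering problem described in the first commit can be sketched without any third-party library. This is a minimal illustration, not the project's code: `resolve_refs` is a toy stand-in that, like a reference resolver that does not merge properties, replaces the whole object holding a `$ref` and drops its sibling keys, and `suppress` is a hypothetical simplification of the mapper's suppression step that simply removes flagged properties. Only the `x-es-suppress` key and the `BoundingBoxContainer` name come from the diff below.

```python
from copy import deepcopy

SUPPRESS_KEY = "x-es-suppress"  # flag name as it appears in docs/Document.json


def resolve_refs(node, root):
    """Toy stand-in for a $ref resolver (assumption: sibling keys are dropped)."""
    if isinstance(node, dict):
        if "$ref" in node:
            # The object holding "$ref" is replaced wholesale by its target,
            # so any sibling key such as the suppression flag disappears.
            name = node["$ref"].split("/")[-1]
            return resolve_refs(deepcopy(root["$defs"][name]), root)
        return {key: resolve_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return node


def suppress(node):
    """Hypothetical suppression step: remove every property flagged with SUPPRESS_KEY."""
    if isinstance(node, dict):
        return {
            key: suppress(value)
            for key, value in node.items()
            if not (isinstance(value, dict) and value.get(SUPPRESS_KEY))
        }
    if isinstance(node, list):
        return [suppress(item) for item in node]
    return node


schema = {
    "$defs": {
        "BoundingBoxContainer": {
            "type": "object",
            "properties": {"min": {"type": "number"}},
        }
    },
    "properties": {
        # Flattened form produced after the pydantic update: "$ref" with a sibling flag.
        "bounding_box": {"$ref": "#/$defs/BoundingBoxContainer", SUPPRESS_KEY: True},
        "text": {"type": "string"},
    },
}

# Old order: resolve first, then suppress. The flag is already gone, so nothing is suppressed.
old_order = suppress(resolve_refs(schema, schema))

# New order: suppress first, then resolve. The flagged property is handled before it can be lost.
new_order = resolve_refs(suppress(schema), schema)
```

With the old order, `bounding_box` survives in `old_order["properties"]` as a fully expanded object; with the new order it is absent from `new_order["properties"]`. The previous `allOf` wrapper kept the flag one level above the `$ref`, which is presumably why the issue only surfaced once single-element lists were flattened.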
ceberam authored Sep 26, 2024
1 parent b49e93e commit a4ddd14
Showing 7 changed files with 597 additions and 498 deletions.
10 changes: 6 additions & 4 deletions docling_core/search/json_schema_to_search_mapper.py
@@ -8,7 +8,7 @@
 from copy import deepcopy
 from typing import Any, Optional, Pattern, Tuple, TypedDict

-from jsonref import JsonRef
+from jsonref import replace_refs


 class SearchIndexDefinition(TypedDict):
@@ -95,7 +95,11 @@ def get_index_definition(self, schema: dict) -> SearchIndexDefinition:
         which define the fields, their data types, and other specifications to index
         JSON documents into a Lucene index.
         """
-        mapping = JsonRef.replace_refs(schema)
+        mapping = deepcopy(schema)
+
+        mapping = self._suppress(mapping, self._suppress_key)
+
+        mapping = replace_refs(mapping)

         mapping = self._merge_unions(mapping)

@@ -105,8 +109,6 @@ def get_index_definition(self, schema: dict) -> SearchIndexDefinition:

         mapping = self._remove_keys(mapping, self._rm_keys)

-        mapping = self._suppress(mapping, self._suppress_key)
-
         mapping = self._translate_keys_re(mapping)

         mapping = self._clean(mapping)
6 changes: 1 addition & 5 deletions docs/Document.json
@@ -323,11 +323,7 @@
     "type": "string"
   },
   "bounding_box": {
-    "allOf": [
-      {
-        "$ref": "#/$defs/BoundingBoxContainer"
-      }
-    ],
+    "$ref": "#/$defs/BoundingBoxContainer",
     "x-es-suppress": true
   },
   "prov": {
2 changes: 1 addition & 1 deletion docs/Document.md
@@ -6052,7 +6052,7 @@ Must be one of:
 | **Type** | `object` |
 | **Required** | Yes |
 | **Additional properties** | [[Any type: allowed]](# "Additional Properties of any type are allowed.") |
-| **Defined in** | |
+| **Defined in** | #/$defs/BoundingBoxContainer |

 **Description:** Bounding box container.

6 changes: 1 addition & 5 deletions docs/Generic.json
@@ -58,11 +58,7 @@
     "x-es-type": "text"
   },
   "file-info": {
-    "allOf": [
-      {
-        "$ref": "#/$defs/FileInfoObject"
-      }
-    ],
+    "$ref": "#/$defs/FileInfoObject",
     "description": "Minimal identification information of the document within a collection.",
     "title": "Document information"
   }
2 changes: 1 addition & 1 deletion docs/Generic.md
@@ -75,7 +75,7 @@
 | **Type** | `object` |
 | **Required** | Yes |
 | **Additional properties** | [[Any type: allowed]](# "Additional Properties of any type are allowed.") |
-| **Defined in** | |
+| **Defined in** | #/$defs/FileInfoObject |

 **Description:** Minimal identification information of the document within a collection.

1,055 changes: 579 additions & 476 deletions poetry.lock

Large diffs are not rendered by default.

14 changes: 8 additions & 6 deletions test/test_json_schema_to_search_mapper.py
@@ -51,9 +51,10 @@ def test_json_schema_to_search_mapper_0():
     index_ref = _load(filename)

     diff = jsondiff.diff(index_ref, index_def)
-    print(json.dumps(index_def, indent=2))
-    print(diff)
-    assert index_def == index_ref
+    # print(json.dumps(index_def, indent=2))
+    assert (
+        index_def == index_ref
+    ), f"Error in search mappings of ExportedCCSDocument. Difference:\n{json.dumps(diff, indent=2)}"


 def test_json_schema_to_search_mapper_1():
@@ -99,6 +100,7 @@ def test_json_schema_to_search_mapper_1():
     index_ref = _load(filename)

     diff = jsondiff.diff(index_ref, index_def)
-    # print(json.dumps(index_def,indent=2))
-    print(diff)
-    assert index_def == index_ref
+    # print(json.dumps(index_def, indent=2))
+    assert (
+        index_def == index_ref
+    ), f"Error in search mappings of Record. Difference:\n{json.dumps(diff, indent=2)}"
