fix: improve doc item typing #105

Merged
merged 4 commits into from
Dec 13, 2024

Conversation

vagenas
Collaborator

@vagenas vagenas commented Dec 12, 2024

No description provided.

Signed-off-by: Panos Vagenas <[email protected]>
Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment

I think you forgot the FormItem in the ContentItem

Signed-off-by: Panos Vagenas <[email protected]>
    label: typing.Literal[DocItemLabel.SECTION_HEADER] = (
        DocItemLabel.SECTION_HEADER  # type: ignore[assignment]
    )
    level: LevelNumber = 1
Contributor

I assume the reason this works with serialization and deserialization, despite setting a level default, is because the label no longer overlaps with the label literals in TextItem? If yes, that's great.

Collaborator Author

Yes, label is now non-overlapping — and is actually used as the discriminator field in ContentItem further below.
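
To illustrate why non-overlapping labels matter here, below is a minimal stdlib-only sketch of label-based dispatch. It is not the docling-core implementation, which relies on Pydantic models; the simplified class fields, the label strings, and the `parse_item` helper are all assumptions made up for illustration.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-ins for the real docling-core classes.
@dataclass
class TextItem:
    text: str
    label: str = "paragraph"


@dataclass
class SectionHeaderItem:
    text: str
    label: str = "section_header"
    level: int = 1  # a default is safe because the label alone picks the class


# Each label maps to exactly one class, so the label acts as a discriminator.
ITEM_TYPES = {
    "paragraph": TextItem,
    "section_header": SectionHeaderItem,
}


def parse_item(data: dict):
    """Build the item class selected by the 'label' field of the input."""
    label = data["label"]
    try:
        cls = ITEM_TYPES[label]
    except KeyError:
        raise ValueError(f"unknown label: {label!r}") from None
    return cls(**data)


header = parse_item({"label": "section_header", "text": "Introduction"})
print(type(header).__name__, header.level)
```

If two classes accepted the same label, the mapping above could not be built unambiguously, and a payload without `level` could end up as either class.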

label: typing.Literal[DocItemLabel.KEY_VALUE_REGION] = DocItemLabel.KEY_VALUE_REGION


class FormItem(DocItem):
Contributor

I am not sure we need a FormItem at this point. We can delay adding it until we actually use it.
The layout-processing changes in docling-project/docling#530 currently just use a GroupItem for Forms and Key-Value-Regions, which act purely as groups without special semantics.

mergify bot commented Dec 12, 2024

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

🟢 Require two reviewer for test updates

Wonderful, this rule succeeded.

When test data is updated, we require two reviewers

  • #approved-reviews-by >= 2
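
For reference, the title rule quoted above can be tried locally. This is a rough sketch that assumes Python's `re` module treats the pattern the same way the merge-protection check does; the `is_conventional` helper is hypothetical and not part of any tool mentioned here.

```python
import re

# Pattern copied from the merge-protection rule quoted above.
TITLE_PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return TITLE_PATTERN.search(title) is not None


print(is_conventional("fix: improve doc item typing"))
print(is_conventional("improve doc item typing"))
```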

Signed-off-by: Panos Vagenas <[email protected]>
Contributor

@dolfim-ibm dolfim-ibm left a comment

Can we easily enforce the type also in the add_text() method?

@vagenas
Collaborator Author

vagenas commented Dec 13, 2024

Can we easily enforce the type also in the add_text() method?

Technically it will now be "enforced" when the TextItem is created within that method.

If you mean reflecting it in the typing of the label param: if we invest more effort in that area, I would rather do it in a way that reuses the type validation of the various DocItems, e.g. by having a single add_node(). I see that as requiring some redesign, though, and in general outside the scope of this PR.
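
A minimal sketch of what "enforced when the TextItem is created" could look like, using only the standard library. The label set, the dataclass-based `TextItem`, and this `add_text` signature are assumptions for illustration; the real docling-core classes are Pydantic models with a different label list.

```python
from dataclasses import dataclass

# Hypothetical label set; the real TextItem accepts a different list.
TEXT_LABELS = {"paragraph", "caption", "footnote"}


@dataclass
class TextItem:
    label: str
    text: str

    def __post_init__(self) -> None:
        # Validation lives in the item itself, not in the caller.
        if self.label not in TEXT_LABELS:
            raise ValueError(f"{self.label!r} is not a valid TextItem label")


def add_text(items: list, label: str, text: str) -> TextItem:
    """Hypothetical helper: the label parameter is loosely typed, but the
    TextItem constructor rejects labels it does not accept."""
    item = TextItem(label=label, text=text)
    items.append(item)
    return item


items: list = []
add_text(items, "paragraph", "Hello")
try:
    add_text(items, "section_header", "Intro")
except ValueError as exc:
    print("rejected:", exc)
```

The signature of `add_text` stays permissive in this sketch, which is the trade-off described above: the check happens at runtime inside the item, not in the static type of the `label` parameter.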

Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment

Awesome, let's get it merged!

@vagenas vagenas merged commit 047a196 into main Dec 13, 2024
8 checks passed
@vagenas vagenas deleted the improve-doc-item-typing branch December 13, 2024 11:51
muhark pushed a commit to muhark/docling-core that referenced this pull request Mar 19, 2025