Fix the TopSectionTitle being split in MSFT filing #63

@Elijas

Context

MSFT accuracy-test (permalink at the time of posting)

Problem

The section title comes out as two separate title elements:

        {
            "text_content": "PART I. FINANCI"
        },
        {
            "text_content": "AL INFORMATION"
        },

This happens because the MSFT filing splits its section titles into two pieces, for some reason.

Ideas about a possible solution

Maybe use line information in the solution: if two elements of the same type (and level) are on the same line, they should probably be identified as a single element.
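
A rough sketch of that idea, just to make it concrete. Everything here is an assumption for illustration: the `Element` class, its `kind`, `level`, and `line` fields, and the `merge_same_line` helper are hypothetical and are not the parser's actual classes or API. It also assumes a line index is available for each element, which may not be the case today.

```python
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical stand-in for a parsed element; not the parser's real class.
    kind: str   # element type, e.g. "TopSectionTitle"
    level: int  # nesting level of the element
    line: int   # assumed index of the rendered line the element sits on
    text: str


def merge_same_line(elements: list[Element]) -> list[Element]:
    """Merge consecutive elements that share type, level, and line."""
    merged: list[Element] = []
    for el in elements:
        prev = merged[-1] if merged else None
        if prev is not None and (prev.kind, prev.level, prev.line) == (
            el.kind,
            el.level,
            el.line,
        ):
            # Concatenate without a separator: the split can fall mid-word,
            # as in "FINANCI" + "AL".
            merged[-1] = Element(prev.kind, prev.level, prev.line, prev.text + el.text)
        else:
            merged.append(el)
    return merged


# Made-up input mirroring the two fragments from the Problem section;
# the line indices and the trailing text element are invented for the demo.
parts = [
    Element("TopSectionTitle", 0, 12, "PART I. FINANCI"),
    Element("TopSectionTitle", 0, 12, "AL INFORMATION"),
    Element("Text", 0, 13, "Some body text"),
]
print([el.text for el in merge_same_line(parts)])
```

Only neighbouring elements are merged, so two titles of the same type that sit on different lines stay separate. Whether plain concatenation is always right (versus inserting a space when the split falls between words) is an open question.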

Metadata

    Labels

    contributions-welcome (Intended for completion by you, the contributor)
    feature:elements (Parsing all the other elements correctly)

    Projects

    Status

    Medium
