DAS extraction issues with new page #1221

@lfoppiano

This issue collects further DAS (data availability statement) extraction problems that seem hard to fix with more training data. They occur in PLOS articles and are caused by a page break combined with the double-column layout. We could check whether the segmentation parser can be fixed at the feature level. In the example below, the extracted statement stops mid-sentence ("can be extracted from the").

journal.pwat.0000127.pdf

<div type="availability">
                <div
                    xmlns="http://www.tei-c.org/ns/1.0">
                    <p>The original contributions presented in the study are publicly available. The data can be found at the following: Service provider data can be accessed at: 
                        <ref type="url" target="https://database.ib-net.org/countries_results?ctry=">https:// database.ib-net.org/countries_results?ctry=</ref> 29&amp;years=2018&amp;type=report&amp;ent=country&amp;mult= true&amp;report=1&amp;table=true&amp;chart= false&amp;chartType=column&amp;lang=EN&amp;exch=1. 2018 DHS data is available on application via the public repository: 
                        <ref type="url" target="https://dhsprogram.com/methodology/survey/survey-display-542.cfm">https://dhsprogram.com/methodology/ survey/survey-display-542.cfm</ref>. Regulator data is openly available and can be extracted from the
                    </p>
                </div>
            </div>
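
To find more affected documents, one option is to scan the generated TEI for availability sections whose last paragraph does not end with terminal punctuation, as in the output above. The snippet below is only a rough triage sketch, not part of GROBID and not the proposed fix: the function name `truncated_availability` and the punctuation heuristic are assumptions made for illustration, and the heuristic will also flag statements that legitimately end without punctuation.

```python
import xml.etree.ElementTree as ET

# Characters we accept as a plausible end of a complete statement (assumption).
TERMINAL = (".", "!", "?", ")", '"')


def local(tag: str) -> str:
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def truncated_availability(tei_xml: str) -> list[str]:
    """Return the text of availability paragraphs that look cut off.

    Hypothetical helper: it checks the last <p> of every
    <div type="availability"> and flags it when the normalized text
    does not end with one of the TERMINAL characters.
    """
    root = ET.fromstring(tei_xml)
    flagged = []
    for div in root.iter():
        if local(div.tag) != "div" or div.get("type") != "availability":
            continue
        paragraphs = [el for el in div.iter() if local(el.tag) == "p"]
        if not paragraphs:
            continue
        # Join all text nodes (including <ref> content) and collapse whitespace.
        text = " ".join("".join(paragraphs[-1].itertext()).split())
        if text and not text.endswith(TERMINAL):
            flagged.append(text)
    return flagged


# Shortened version of the output reported in this issue.
sample = (
    '<div type="availability">'
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    "<p>Regulator data is openly available and can be extracted from the</p>"
    "</div></div>"
)
print(truncated_availability(sample))
```

Running this over a batch of PLOS outputs would give a candidate list to inspect by hand before touching the segmentation features.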
