temporary PR for Dev #782
Removing broken information in brackets
* implement selection of construct validation tests
* rename test report file of construct validation tests
* fix path to test reports directory
* add removing brackets that start with "(,"
* make removing broken brackets in WikipediaNifExtractor configurable
* rename abstract extractors
* fix getting short abstracts in WikipediaNifExtractor
* generate two dumps for plain and html abstract extractors
* fix minidump abstract extraction
* add Construct model
* add construct validation test for (; in abstracts
* implement handling of right and left validators
* Fix merging of multiple infoboxes under one resource, see issue #709
* Add new page to the minidump for testing
* Create SHACL test, add it to the shacl-test-groups.csv and disabled isbn test
closes #771.
Co-authored-by: Ted Thibodeau Jr <[email protected]>
TallTed
left a comment
Editorial. Minor grammar, punctuation, etc.
| > | ||
| > Therefore, you can use this tool to extract an example page and check if the error persists in the latest software version, and add the link you used for verification, e.g., http://dief.tools.dbpedia.org/server/extraction/en/extract?title=United+States | ||
| # Issue validity | ||
| > Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/ |
| > Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/ | |
| > Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into <http://dbpedia.org/sparql>. During these three months, Wikipedia changes and the DBpedia Information Extraction Framework receive patches. At <http://dief.tools.dbpedia.org/server/extraction/en/>, we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g., `Berlin` or `Joe_Biden` here: <http://dief.tools.dbpedia.org/server/extraction/en/>. |
| > Where did you find the data issue? Non-exhaustive options are: | ||
| * Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link** | ||
| * Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link** | ||
| * DIEF: you ran the software and the error occured then, please **include all necessary information such as the extractor or log**. If you had problems running the software use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose) |
| > Where did you find the data issue? Non-exhaustive options are: | |
| * Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link** | |
| * Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link** | |
| * DIEF: you ran the software and the error occured then, please **include all necessary information such as the extractor or log**. If you had problems running the software use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose) | |
| > Where did you find the data issue? Non-exhaustive options include the following: | |
| * Web/SPARQL, e.g., <http://dbpedia.org/sparql> or <http://dbpedia.org/resource/Berlin>, please **provide query or link.** | |
| * Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link.** | |
| * DIEF: you ran the software and the error occured then. Please **include all necessary information such as the extractor or log.** If you had problems running the software use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose). |
| >Example DBpedia resource URL(s) | ||
| ``` |
| >Example DBpedia resource URL(s) | |
| ``` | |
| > Example DBpedia resource URL(s) | |
| ``` |
| ### Additional context | ||
| > Add any other context about the problem here. | ||
| ``` |
| ``` | |
| ``` |
| **Documentation**: http://dev.dbpedia.org/Extraction <br/> | ||
| **Get in touch with DBpedia**: https://wiki.dbpedia.org/join/get-in-touch <br/> | ||
| **Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace](https://dbpedia-slack.herokuapp.com/) - the main point for [developement updates](https://github.com/dbpedia/extraction-framework/blob/master/.github/workflows/maven.yml) and discussions <br/> | ||
| **Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/> |
| **Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/> | |
| **Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) Slack channel within the the [DBpedia Slack workspace](https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) — the main point for development updates and discussions <br/> |
| 6. When you are done with a bugfix or feature, `rebase` your branch onto `extraction-framework/dev` (`git pull --rebase git://github.com/dbpedia/extraction-framework.git`). Resolve possible conflicts and commit. | ||
| 7. Push your branch to GitHub (`git push origin fixRestApiParams`). | ||
| 8. Send a pull request from your branch into `extraction-framework/dev` via GitHub. | ||
| * In the description, reference the associated commit (for example, _"Fixes #123 by ..."_ for issue number 123). |
| * In the description, reference the associated commit (for example, _"Fixes #123 by ..."_ for issue number 123). | |
| * In the description, reference the associated issue (for example, _"Fixes #123 by ..."_ for issue number 123). |
| * In the description, reference the associated commit (for example, _"Fixes #123 by ..."_ for issue number 123). | ||
| * Your changes will be reviewed and discussed on GitHub. | ||
| * In addition, [Travis-CI](http://about.travis-ci.org/) will test if the merged version passes the build. | ||
| * In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build. |
| * In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build. | |
| * In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test whether the merged version passes the build. |
|
|
||
| Please keep in mind: | ||
| - Try *not* to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made. | ||
| - **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch! |
| - **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch! | |
| - **Never** rebase the master onto a development branch (i.e., _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch! |
| //if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different | ||
| //entity and need to be saved as new resource. | ||
| //This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each |
| //if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different | |
| //entity and need to be saved as new resource. | |
| //This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each | |
| //if we have more than one info box and name property of infobox is different than title of page, then infobox belongs to a different | |
| //entity and needs to be saved as a new resource. | |
| //This is to avoid assigning one entity's properties to another entity, because a single page can have multiple info boxes and each is |
| # | ||
| #log-dir= see: ../core/src/main/resources/universal.properties | ||
|
|
||
| # WikiPages failed to extract in the first try can be retried with this option (especially interesting when extraction from the mediawiki api) |
| # WikiPages failed to extract in the first try can be retried with this option (especially interesting when extraction from the mediawiki api) | |
| # WikiPages failed to extract in the first try can be retried with this option (especially interesting when extracting from the mediawiki api) |
|
@coderabbitai review |
✅ Actions performed
Review triggered.
|
Walkthrough
Renames and reconfigures abstract extractors (AbstractExtractor → PlainAbstractExtractor; AbstractExtractorWikipedia → HtmlAbstractExtractor), adds configurable broken-brackets cleaning to abstracts (core and NIF), refactors construct validation to use Construct and ValidatorGroup across validators and test suites, updates configs, tests, and docs, and introduces new utilities and SHACL tests.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Config
participant E as Extractor (Html/Plain/NIF)
participant M as MediaWikiConnector
participant U as AbstractUtils
participant W as DatasetWriter
C->>E: init with removeBrokenBracketsProperty
E->>M: fetch page HTML/text
M-->>E: raw abstract text
alt removeBrokenBracketsProperty = true
E->>U: removeBrokenBracketsInAbstracts(text)
U-->>E: cleanedText
E->>W: write quads (cleanedText)
else
E->>W: write quads (raw text)
end
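The alt/else branch in the diagram above can be sketched in Scala as follows. This is a minimal illustration, not the code from the PR: `BracketCleaningSketch`, `removeBrokenBrackets`, and `prepareAbstract` are hypothetical names, and the regular expression is an assumed stand-in for `AbstractUtils.removeBrokenBracketsInAbstracts`, whose counter-based implementation is not reproduced here. It only assumes, based on the commit messages, that a broken bracket is an empty group or one starting with `(;` or `(,`.

```scala
// Hypothetical stand-in for the PR's AbstractUtils; names and regex are assumptions.
object BracketCleaningSketch {

  // Assumption: a "broken" bracket is a parenthesised group that is empty or
  // whose content starts with ';' or ',' (for example "(; German: Berlin)").
  // Nested brackets are deliberately not handled by this simple pattern.
  private val BrokenBracket = """\s*\(\s*(?:[;,][^()]*)?\)""".r

  // Remove every broken bracket group together with its leading whitespace.
  def removeBrokenBrackets(text: String): String =
    BrokenBracket.replaceAllIn(text, "")

  // Mirrors the alt/else branch: clean only when the config flag is set,
  // otherwise pass the raw abstract through unchanged.
  def prepareAbstract(raw: String, removeBrokenBracketsEnabled: Boolean): String =
    if (removeBrokenBracketsEnabled) removeBrokenBrackets(raw) else raw
}
```

Keeping the flag check in one small function means the writer step receives either the cleaned or the raw text without needing to know which path was taken.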
sequenceDiagram
autonumber
participant G as NTripleTestGenerator
participant S as NTripleTestSuite
participant V as Validator (Pattern/Typed/Vocab/NotContains)
participant C as Construct
G->>G: parse SPARQL results (validatorGroup)
G-->>S: validators with VALIDATOR_GROUP
S->>S: parse N-Triples line
S-->>S: Array(Construct(s), Construct(p, s, o), Construct(o))
loop for each Construct
S->>V: run(Construct)
alt VALIDATOR_GROUP = RIGHT
V->>V: use Construct.right
else VALIDATOR_GROUP = LEFT
V->>V: use Construct.left
else DEFAULT
V->>V: use Construct.self
end
V-->>S: Boolean
end
S-->>S: aggregate scores
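The group routing in the second diagram can be sketched as below. The field names `self`, `left`, and `right` and the group names `LEFT`, `RIGHT`, and `DEFAULT` come from the review snippets; everything else (the `ValidatorGroupSketch` wrapper, the default arguments, the sealed-trait encoding, and the `select` and `notContains` helpers) is a hypothetical reconstruction and may differ from the PR's actual classes.

```scala
// Hypothetical reconstruction for illustration; not the PR's actual model classes.
object ValidatorGroupSketch {

  // One part of an N-Triples line. For a predicate, left is assumed to hold
  // the subject and right the object, as in Construct(p, s, o).
  final case class Construct(self: String,
                             left: Option[String] = None,
                             right: Option[String] = None)

  // Assumed encoding of the validator group as a sealed trait.
  sealed trait ValidatorGroup
  object ValidatorGroup {
    case object LEFT extends ValidatorGroup
    case object RIGHT extends ValidatorGroup
    case object DEFAULT extends ValidatorGroup
  }

  // Pick the part of the triple that the validator should inspect.
  def select(c: Construct, group: ValidatorGroup): Option[String] = group match {
    case ValidatorGroup.RIGHT   => c.right
    case ValidatorGroup.LEFT    => c.left
    case ValidatorGroup.DEFAULT => Some(c.self)
  }

  // "Does not contain" check. A missing side yields false, which matches the
  // `case None => false` branches shown in the review.
  def notContains(c: Construct, group: ValidatorGroup, sequence: String): Boolean =
    select(c, group).exists(v => !v.contains(sequence))
}
```

With this shape, a validator attached to the predicate `dbo:abstract` with group `RIGHT` inspects the object literal, so a test for `(;` runs against the abstract text rather than against the predicate IRI.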
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala (1)
14-15: Trim input; fix iri typo
Trim before matching to avoid whitespace issues. Also fix the iri string typo.
-  override val iri: ValidatorIRI = "#GENERIC_LTIERAL_LANG_TAG_VALIDATOR"
+  override val iri: ValidatorIRI = "#GENERIC_LITERAL_LANG_TAG_VALIDATOR"
@@
-    pattern.matcher(nTriplePart.self).matches()
+    pattern.matcher(nTriplePart.self.trim).matches()
Also applies to: 18-21
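To see why trimming matters here: `Matcher.matches()` requires the whole input to match, so a single trailing space makes an otherwise valid value fail. The sketch below uses a hypothetical language-tag pattern and hypothetical names (`LangTagTrimSketch`, `matchesRaw`, `matchesTrimmed`); the validator's real pattern and the exact shape of its input are not shown in this review.

```scala
import java.util.regex.Pattern

// Illustration only: the pattern is an assumed, simplified language-tag check,
// not the one used by GenericLiteralLangTagValidator.
object LangTagTrimSketch {

  private val pattern: Pattern =
    Pattern.compile("@[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

  // matches() must consume the entire input, so stray whitespace causes a miss.
  def matchesRaw(value: String): Boolean =
    pattern.matcher(value).matches()

  // Trimming first makes the check robust against leading/trailing whitespace.
  def matchesTrimmed(value: String): Boolean =
    pattern.matcher(value.trim).matches()
}
```

For example, `matchesRaw("@en ")` is false while `matchesTrimmed("@en ")` is true under this assumed pattern.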
dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala (1)
284-289: Propagate validatorGroup to doesNotContains validators
`v:validatorGroup` on IRI validators is ignored here because we still call `NotContainsValidator` with the legacy `(id, iri, charSeq)` signature. That forces every validator back to `DEFAULT`, even when the TTL assigns `leftValidator`/`rightValidator`, so side-specific constraints can’t be expressed. Please thread the group through like the other constructors.
-  validatorQuerySolution.getLiteral("doesNotContains").getLexicalForm.split(delim).foreach(charSeq => {
-
-    validatorCollection.append(NotContainsValidator(currentValidatorID, validatorIRI, charSeq))
+  validatorQuerySolution.getLiteral("doesNotContains").getLexicalForm.split(delim).foreach(charSeq => {
+    val validatorGroup = getValidatorGroup(validatorQuerySolution)
+    validatorCollection.append(NotContainsValidator(currentValidatorID, validatorIRI, charSeq, validatorGroup))
     groupedValidators.append(currentValidatorID)
     currentValidatorID += 1
   })
♻️ Duplicate comments (2)
README.md (1)
7-7: Fix typo: "developement" → "development".
Line 7 contains a typo in the Slack channel description.
Apply this diff:
-**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
+**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for development updates and discussions <br/>
dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties (1)
13-13: Address the grammatical error in the comment.
As noted in a previous review, the comment has a grammatical issue. Consider applying the suggested fix.
🧹 Nitpick comments (9)
dump/src/test/resources/shaclTestsCoverageTable.md (1)
29-30: Consider fixing markdown table formatting.
The functional changes (adding `#en_abstract_validation` tags) are correct and align with the new SHACL validation rules. However, the markdown table has formatting issues: many rows have only 2 cells but the table header defines 4 columns (wikipage-uri, shacl-test, issue, comment).
To improve readability, consider either:
- Adding empty cells for missing columns: `| | |` at the end
- Or restructuring the table to match the actual data
Example for line 29:
-[http://de.dbpedia.org/resource/Arthur_Schopenhauer](...) | [http://dbpedia.org/ontology/abstract](...) #en_abstract_validation |
+[http://de.dbpedia.org/resource/Arthur_Schopenhauer](...) | [http://dbpedia.org/ontology/abstract](...) #en_abstract_validation | |
Based on static analysis hints.
Also applies to: 32-32, 43-43, 54-54, 64-64, 72-72, 83-83, 94-95, 104-104, 114-114, 125-125, 127-127, 131-134, 139-139, 148-148, 151-152, 159-159, 169-170, 172-172, 216-217, 229-229
core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala (1)
98-104: Fix code style inconsistencies.
There are several code style issues in the infobox name extraction logic.
Apply this diff:
-    var name = subjectUri;
+    var name = subjectUri
     if(allNames.size > 0) name = allNames(0).propertyNodeValueToPlainText
     //getting subject of wikipedia page
-    var splittedURI = subjectUri.split("/")
-    var pageTitle = splittedURI(splittedURI.size - 1)
+    val splitUri = subjectUri.split("/")
+    val pageTitle = splitUri(splitUri.size - 1)
Changes:
- Remove unnecessary semicolon (line 98)
- Fix typo: "splittedURI" → "splitUri"
- Use `val` instead of `var` for immutable values (lines 103-104)
core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (1)
75-81: LGTM: Conditional cleaning logic is correct.
The implementation correctly applies `AbstractUtils.removeBrokenBracketsInAbstracts` to both long and short abstracts when the feature is enabled, falling back to unmodified text otherwise.
Consider adding a comment explaining what "broken brackets" are (e.g., unclosed or mismatched bracket patterns) to improve code documentation:
+    // Clean malformed or unclosed bracket patterns (e.g., "text [[link") from abstracts when enabled
     val (cleanLongAbstract, cleanShortAbstract) = if (removeBrokenBrackets) {
core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala (1)
1-30: Consider adding edge case tests.
The current test coverage is good for common scenarios. However, consider adding tests for:
- Nested brackets: `"Text (outer (inner)) more text"`
- Unmatched brackets: `"Text (unclosed"` or `"Text closed)"`
- Multiple consecutive broken brackets: `"Text () () () more"`
- Mixed valid and broken brackets: `"Text (valid content) (; broken)"`
These edge cases would help ensure robust handling of malformed Wikipedia abstracts.
README.md (1)
100-109: Fix list indentation for consistency.
Lines 100, 101, and 108 have incorrect indentation for list items (2 spaces instead of 0). While this doesn't break functionality, consistent indentation improves readability.
Based on static analysis hints.
core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala (2)
25-25: Add spacing around the `+=` operator.
For consistency with Scala style conventions, add spaces around the `+=` operator.
Apply this diff:
-      if (bracketsWithSemicolon != 0) {
-        closeBrackets+=1
-      }
+      if (bracketsWithSemicolon != 0) {
+        closeBrackets += 1
+      }
9-44: Consider adding inline comments for complex bracket-tracking logic.
The nested bracket tracking using three counters (`bracketsWithSemicolon`, `closeBrackets`, `skipBrackets`) is non-trivial. Adding brief inline comments explaining the state transitions would improve maintainability.
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala (1)
16-33: Normalize and simplify group routing in run()
Trim the selected value and avoid nested matches; use explicit DEFAULT for future-proofing.
-  override def run(nTriplePart: Construct): Boolean = {
-    VALIDATOR_GROUP match {
-      case ValidatorGroup.RIGHT => nTriplePart.right match {
-        // TODO: maybe we need to rename "value"
-        case Some(value) => vocab.contains(value)
-        case None => false
-      }
-      case ValidatorGroup.LEFT => nTriplePart.left match {
-        case Some(value) => vocab.contains(value)
-        case None => false
-      }
-      case _ => vocab.contains(nTriplePart.self)
-    }
-  }
+  override def run(nTriplePart: Construct): Boolean = {
+    val selected: Option[String] = VALIDATOR_GROUP match {
+      case ValidatorGroup.RIGHT => nTriplePart.right
+      case ValidatorGroup.LEFT => nTriplePart.left
+      case ValidatorGroup.DEFAULT => Some(nTriplePart.self)
+    }
+    selected.exists(v => vocab.contains(v.trim))
+  }
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala (1)
10-22: Simplify group handling and avoid nested matches
Use Option flow with explicit DEFAULT; keeps semantics and reduces branching.
-  override def run(nTriplePart: Construct): Boolean = {
-    VALIDATOR_GROUP match {
-      case ValidatorGroup.RIGHT => nTriplePart.right match {
-        // TODO: maybe we need to rename "value"
-        case Some(value) => !value.contains(sequence)
-        case None => false
-      }
-      case ValidatorGroup.LEFT => nTriplePart.left match {
-        case Some(value) => !value.contains(sequence)
-        case None => false
-      }
-      case _ => !nTriplePart.self.contains(sequence)
-    }
-  }
+  override def run(nTriplePart: Construct): Boolean = {
+    val selected: Option[String] = VALIDATOR_GROUP match {
+      case ValidatorGroup.RIGHT => nTriplePart.right
+      case ValidatorGroup.LEFT => nTriplePart.left
+      case ValidatorGroup.DEFAULT => Some(nTriplePart.self)
+    }
+    selected.exists(v => !v.contains(sequence))
+  }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (89)
- dump/src/test/resources/cv-test-groups.csv is excluded by !**/*.csv
- dump/src/test/resources/minidumps/<dir>/wiki.xml.bz2 is excluded by !**/*.bz2, for each of the 87 minidump directories: af, als, an, ar, arz, ast, az, azb, ba, bar, be, bg, bn, br, bs, ca, ceb, ckb, commons, cs, cy, da, de, el, en, eo, es, et, eu, fa, fi, fr, fy, ga, gd, gl, he, hr, hu, hy, id, is, it, ja, ka, kn, ko, ku, ky, la, lb, lt, lv, mk, ml, mn, ms, nds, nl, nn, no, os, pl, pt, ro, ru, sco, sh, si, simple, sl, sq, sr, sv, ta, te, tg, th, tl, tr, tt, uk, ur, vec, vi, war, zh
- dump/src/test/resources/shacl-test-groups.csv is excluded by !**/*.csv
📒 Files selected for processing (48)
- .github/ISSUE_TEMPLATE/data.md (1 hunks)
- README.md (4 hunks)
- core/doc/HowTo-release-DBpedia.txt (1 hunks)
- core/src/main/scala/org/dbpedia/extraction/config/Config.scala (4 hunks)
- core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala (1 hunks)
- core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala (1 hunks)
- core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala (2 hunks)
- core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala (8 hunks)
- core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala (1 hunks)
- core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (4 hunks)
- core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala (1 hunks)
- core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala (1 hunks)
- core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala (2 hunks)
- core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala (1 hunks)
- dump/extraction.abstracts.properties (1 hunks)
- dump/pom.xml (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala (2 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala (2 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala (2 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/tests/TestSuiteFactory.scala (1 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala (6 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala (2 hunks)
- dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala (4 hunks)
- dump/src/test/bash/minidump-overview.md (1 hunks)
- dump/src/test/bash/uris.lst (1 hunks)
- dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl (11 hunks)
- dump/src/test/resources/ci-tests/xsd_ci-tests.ttl (1 hunks)
- dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties (1 hunks)
- dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties (1 hunks)
- dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl (1 hunks)
- dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl (1 hunks)
- dump/src/test/resources/shaclTestsCoverageTable.md (12 hunks)
- dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala (2 hunks)
- dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala (2 hunks)
- dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala (2 hunks)
- dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala (1 hunks)
- dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala (1 hunks)
- live/live.default.xml (1 hunks)
- server/server.default.properties (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
README.md
107-107: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
108-108: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
109-109: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
112-112: Emphasis style
Expected: underscore; Actual: asterisk
(MD049, emphasis-style)
112-112: Emphasis style
Expected: underscore; Actual: asterisk
(MD049, emphasis-style)
113-113: Emphasis style
Expected: underscore; Actual: asterisk
(MD049, emphasis-style)
113-113: Emphasis style
Expected: underscore; Actual: asterisk
(MD049, emphasis-style)
125-125: Bare URL used
(MD034, no-bare-urls)
dump/src/test/resources/shaclTestsCoverageTable.md
29-29: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
29-29: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
30-30: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
30-30: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
31-31: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
31-31: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
32-32: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
32-32: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
43-43: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
43-43: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
54-54: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
54-54: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
64-64: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
64-64: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
65-65: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
65-65: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
66-66: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
66-66: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
67-67: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
67-67: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
68-68: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
68-68: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
69-69: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
69-69: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
70-70: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
70-70: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
71-71: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
72-72: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
72-72: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
83-83: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
83-83: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
94-94: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
94-94: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
95-95: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
95-95: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
104-104: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
104-104: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
114-114: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
114-114: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
125-125: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
125-125: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
126-126: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
126-126: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
127-127: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
127-127: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
128-128: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
128-128: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
129-129: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
129-129: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
130-130: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
131-131: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
131-131: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
132-132: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
132-132: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
133-133: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
133-133: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
134-134: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
134-134: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
135-135: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
135-135: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
136-136: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
136-136: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
137-137: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
137-137: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
138-138: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
138-138: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
139-139: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
139-139: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
148-148: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
148-148: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
149-149: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
149-149: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
150-150: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
150-150: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
151-151: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
151-151: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
152-152: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
152-152: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
153-153: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
153-153: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
154-154: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
154-154: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
155-155: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
155-155: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
156-156: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
157-157: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
157-157: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
158-158: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
158-158: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
159-159: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
159-159: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
160-160: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
160-160: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
161-161: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
161-161: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
162-162: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
162-162: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
163-163: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
163-163: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
164-164: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
164-164: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
165-165: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
165-165: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
166-166: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
166-166: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
167-167: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
167-167: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
168-168: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
168-168: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
169-169: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
169-169: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
170-170: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
170-170: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
171-171: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
172-172: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
172-172: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
216-216: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
216-216: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
217-217: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
217-217: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
229-229: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe
(MD055, table-pipe-style)
229-229: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data
(MD056, table-column-count)
dump/src/test/bash/minidump-overview.md
48-48: Bare URL used
(MD034, no-bare-urls)
.github/ISSUE_TEMPLATE/data.md
11-11: Bare URL used
(MD034, no-bare-urls)
11-11: Bare URL used
(MD034, no-bare-urls)
11-11: Bare URL used
(MD034, no-bare-urls)
19-19: Bare URL used
(MD034, no-bare-urls)
19-19: Bare URL used
(MD034, no-bare-urls)
25-25: Blank line inside blockquote
(MD028, no-blanks-blockquote)
27-27: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (47)
dump/src/test/resources/ci-tests/xsd_ci-tests.ttl (1)
55-55: Formatting adjustments look good. Whitespace cleanup only; TTL semantics unaffected.
Also applies to: 60-60, 68-68, 74-74, 76-76, 84-84, 90-90
dump/src/test/bash/uris.lst (1)
44-44: Verify the test coverage purpose for this URL. The addition of the Marian Breland Bailey Wikipedia URL appears valid. However, please confirm whether this URL was specifically chosen to test an edge case related to the PR's changes (e.g., broken brackets in abstracts, HTML extraction issues, or other scenarios).
If this URL tests a specific functionality, consider documenting the test purpose in related test files or comments to aid future maintenance.
dump/extraction.abstracts.properties (1)
32-32: LGTM! Configuration aligns with extractor refactoring. The switch from `AbstractExtractor` to `PlainAbstractExtractor` is consistent with the broader refactoring across the codebase where extractors are being renamed and reorganized.
core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala (1)
16-25: LGTM! Class rename improves clarity. The rename from `AbstractExtractorWikipedia` to `HtmlAbstractExtractor` better describes the extractor's purpose and aligns with the parallel `PlainAbstractExtractor` naming.
dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl (1)
1-21: LGTM! Well-formed SHACL test resource. The SHACL NodeShape definition is syntactically correct and follows standard patterns for property validation testing. The structure aligns with other SHACL test resources in this PR.
dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl (1)
12-19: LGTM! Validation rule correctly detects broken brackets. The SHACL pattern constraint uses a negative lookahead regex to reject abstracts containing the broken bracket pattern `(;`. This aligns well with the new `removeBrokenBracketsInAbstracts` utility introduced in the broader PR.
core/src/main/scala/org/dbpedia/extraction/config/Config.scala (3)
280-282: LGTM! Configuration properly extended for broken brackets cleanup. The addition of `removeBrokenBracketsProperty` to `AbstractParameters` is well-structured and uses an appropriate property key (`remove-broken-brackets-plain-abstracts`) to distinguish it from the HTML variant.
297-299: LGTM! NIF parameters consistently extended. The `removeBrokenBracketsProperty` field is consistently added to `NifParameters` with an appropriate property key (`remove-broken-brackets-html-abstracts`) that distinguishes HTML abstract processing from plain text processing.
353-355: LGTM! Case class definitions properly updated. Both case class definitions are correctly updated with the new `removeBrokenBracketsProperty: Boolean` field, maintaining structural consistency across the configuration classes.
Also applies to: 374-381
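To make the two property keys concrete, here is a minimal sketch of how such boolean flags might be read from a properties file into small settings classes. `PlainAbstractSettings`, `HtmlAbstractSettings`, `BrokenBracketFlags`, and the `false` default are hypothetical names and choices made for illustration; only the two property keys come from the review above, and the real `AbstractParameters` / `NifParameters` in `Config.scala` carry more fields.

```scala
import java.util.Properties

// Hypothetical, simplified stand-ins for the parameter classes named in the review.
final case class PlainAbstractSettings(removeBrokenBracketsProperty: Boolean)
final case class HtmlAbstractSettings(removeBrokenBracketsProperty: Boolean)

object BrokenBracketFlags {
  // Property keys as quoted in the review comments.
  val PlainKey = "remove-broken-brackets-plain-abstracts"
  val HtmlKey  = "remove-broken-brackets-html-abstracts"

  // Reads a boolean flag; a missing or unparsable value falls back to `default`.
  // The default of `false` is an assumption, not taken from the PR.
  def flag(props: Properties, key: String, default: Boolean = false): Boolean =
    Option(props.getProperty(key)).map(_.trim.toLowerCase) match {
      case Some("true")  => true
      case Some("false") => false
      case _             => default
    }

  def plain(props: Properties): PlainAbstractSettings =
    PlainAbstractSettings(flag(props, PlainKey))

  def html(props: Properties): HtmlAbstractSettings =
    HtmlAbstractSettings(flag(props, HtmlKey))
}
```

Keeping one key per extractor means the plain-text and HTML pipelines can be switched independently, which appears to be the intent of the two separate property names.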
core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala (4)
35-136: LGTM! Refactoring improves code structure. The decomposition of the `extract` method with explicit condition variables (`condition1_create_correspondingproperty`, `condition2_template_exists`, etc.) and extracted helper methods significantly improves readability and maintainability.
138-159: LGTM! Helper method well-structured. The `createMissingTypes` method cleanly encapsulates the logic for computing and emitting missing type triples when extending an existing resource with new types.
161-180: LGTM! Instance creation logic well-extracted. The `createInstance` method appropriately handles type statement creation with proper dataset splitting between direct types and transitive types.
189-216: LGTM! URI generation logic clearly defined. The `generateUri` method provides a clear strategy for generating URIs: prefer name-like properties, fall back to the first property, or use the template title as a last resort.
dump/src/test/bash/minidump-overview.md (1)
48-48: LGTM! New article added to test coverage. The addition of the Marian_Breland_Bailey Wikipedia article aligns with the new SHACL test resources introduced in this PR and expands test coverage appropriately.
core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala (1)
59-62: Post-processing now references PlainAbstractExtractor correctly.
Matches the production path and keeps the test aligned with the renamed extractor.
dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala (2)
68-76: LGTM: Query expansion correctly supports new validator fields. The SELECT clause and Optional bindings for `validatorGroup` and `doesNotContains` are correctly structured to capture the new validator metadata introduced by the ValidatorGroup enum. The Optional clauses ensure backward compatibility with existing test data that may not include these fields.
97-100: Improved readability with explicit query structure. The reformatted query with explicit line breaks for each triple pattern enhances readability and maintainability without changing semantics.
core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (2)
7-7: LGTM: Import added for new AbstractUtils utility. The import supports the new broken-bracket cleaning functionality introduced in this file.
51-51: LGTM: Configuration-driven feature flag. The `removeBrokenBrackets` flag from NifParameters enables optional cleaning of malformed bracket patterns in abstracts, consistent with the AbstractUtils functionality.
dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala (1)
23-26: LGTM: ValidatorGroup enum is well-structured. The new enumeration provides a clean grouping mechanism (RIGHT, LEFT, DEFAULT) for validator behavior, consistent with the existing enumeration patterns in this package object.
dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala (1)
1-3: LGTM: Construct case class is well-designed. The immutable data structure with optional `left` and `right` fields provides a clean abstraction for representing RDF construct validation inputs. The default `None` values for optional fields follow Scala idioms.
core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala (2)
15-15: LGTM: Test class renamed to match refactored extractor. The test class name now correctly reflects the renamed `PlainAbstractExtractor` class.
50-50: LGTM: Extractor instantiation updated correctly. The instantiation now uses `PlainAbstractExtractor`, consistent with the class rename from `AbstractExtractor`.
dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties (1)
40-41: LGTM: Configuration updated for renamed extractor and new feature. The changes correctly reflect:
- The extractor rename from `.NifExtractor` to `.HtmlAbstractExtractor`
- The new `remove-broken-brackets-html-abstracts=true` flag to enable broken-bracket cleaning in abstracts
These updates align with the code changes in `WikipediaNifExtractor.scala` and the introduction of `AbstractUtils.removeBrokenBracketsInAbstracts`.
dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala (1)
20-20: Approve resource loading. extraction-configs/extraction.plain.abstracts.properties exists under dump/src/test/resources and matches the expected test configuration.
dump/pom.xml (1)
18-19: LGTM: SHACL and CV testGroup properties correctly applied
All `testGroup` references in the dump module have been replaced; both `Utils.getGroup("shaclTestGroup")` and `Utils.getGroup("cvTestGroup")` usages align with the new properties.
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala (1)
5-5: LGTM! Validator refactor to Construct is well-executed. The transition from `String` to `Construct` aligns with the broader validator refactor mentioned in the AI summary. The logic remains sound: `nTriplePart.self` correctly provides the IRI string for validation via Apache Jena's `IRIResolver.checkIRI`.
Also applies to: 13-15
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala (1)
5-5: LGTM! Signature updated for consistency. The method signature now accepts `Construct` to align with the validator refactor pattern. The no-op behavior (always returning `true`) remains appropriate for this placeholder validator.
Also applies to: 13-13
README.md (1)
79-81: LGTM! Inline code formatting for namespaces improves readability. The use of backticks for namespace paths (`org.dbpedia.extraction.*`) enhances clarity and follows Markdown best practices.
dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala (1)
33-33: LGTM! Refactor to centralized Utils is cleaner. The migration from local helper methods to `Utils.getGroup()` and `Utils.loadTestGroupsKeys()` improves code reusability and aligns with the broader test utility refactor mentioned in the AI summary. The parameters passed to these methods are appropriate for SHACL test group filtering.
Also applies to: 117-117
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala (1)
10-10: LGTM! Construct-based validation is correctly implemented. The refactor from `String` to `Construct` is consistent with the broader validator pattern. The triple construction at line 25 correctly uses `nTriplePart.self` to extract the literal value for validation. The Apache Jena literal parser logic remains sound.
Also applies to: 23-25
dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala (1)
47-54: No change needed; extract already blocks until all jobs complete. The `while (jobsRunning.size() > 0)` loop (lines 98–101) guarantees the rename calls occur only after all futures finish.
server/server.default.properties (1)
59-59: LGTM! Configuration updated to reference renamed extractor. The update from `.AbstractExtractorWikipedia` to `.HtmlAbstractExtractor` aligns with the class renaming in the codebase.
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala (1)
12-14: LGTM! Method signature refactored to use Construct. The method signature change from `String` to `Construct` aligns with the broader validator refactor. The logic correctly accesses the RDF type token via `nTriplePart.self`.
dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties (1)
1-96: LGTM! Well-structured configuration for plain abstract extraction. The configuration file properly sets up extraction parameters, MediaWiki API connection settings, and the new `remove-broken-brackets-plain-abstracts` feature. The settings align with the PlainAbstractExtractor implementation.
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala (1)
3-24: LGTM! Validator trait updated to support Construct and ValidatorGroup. The changes introduce:
- `VALIDATOR_GROUP` field with a sensible default
- Updated `run` method signature to accept `Construct` instead of `String`
- `toString` override for better debugging
These changes align with the broader validator refactor across the codebase.
core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala (4)
33-42: LGTM! Class renamed and logger updated correctly. The class rename from `AbstractExtractor` to `PlainAbstractExtractor` is consistently applied, including the logger initialization.
53-53: LGTM! Configuration flag added for broken-brackets removal. The `removeBrokenBrackets` flag is properly initialized from the configuration.
83-92: LGTM! Conditional broken-brackets removal implemented correctly. The conditional logic applies `AbstractUtils.removeBrokenBracketsInAbstracts` when the configuration flag is enabled, preserving the original text otherwise.
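For readers who have not opened `AbstractUtils`, the conditional cleanup described for lines 83-92 could look roughly like the sketch below. The object name, the method names, and the regex are assumptions made for illustration; the real `removeBrokenBracketsInAbstracts` is not shown in this review. The regex drops parenthesised groups that start with `;` or `,`, the two broken forms mentioned in the PR description.

```scala
object BrokenBracketSketch {
  // Matches optional leading whitespace plus a bracket group whose content starts
  // with ';' or ',' and contains no nested parentheses (an illustrative pattern,
  // not the one used in AbstractUtils).
  private val BrokenBracket = """\s*\([;,][^()]*\)""".r

  // Removes every broken bracket group from the text.
  def stripBrokenBrackets(text: String): String =
    BrokenBracket.replaceAllIn(text, "")

  // Applies the cleanup only when the configuration flag is enabled,
  // otherwise returns the text unchanged.
  def postProcess(text: String, enabled: Boolean): String =
    if (enabled) stripBrokenBrackets(text) else text
}
```

With the flag enabled, `BrokenBracketSketch.postProcess("Ada (, 1815) wrote", enabled = true)` yields `"Ada wrote"`, while a well-formed group such as `(French: Paris)` is left alone.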
223-255: LGTM! Companion object renamed and patterns updated. The companion object is correctly renamed to `PlainAbstractExtractor`, and the `patternsToRemove` list includes updated regex patterns.
dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala (2)
26-30: LGTM! Test initialization added in beforeAll. The `beforeAll` method properly loads the test model using `RDFDataMgr` before tests execute, ensuring test data is available.
37-51: Verify the test-group filtering logic. The filtering removes statements where:
- The subject URI is in `selectValues` (derived from group keys)
- The object matches the `testGeneratorURI`
Ensure this filtering correctly isolates the productive group tests as intended by the test name change.
You can verify the filtering behavior by examining the loaded group keys and confirming they match the intended test coverage.
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala (2)
6-11: LGTM! Constructor updated to support validator groups. The `validatorGroup` parameter with a sensible default (`ValidatorGroup.DEFAULT`) allows flexible validator instantiation while maintaining backward compatibility.
13-26: LGTM! Pattern matching logic correctly handles validator groups. The `run` method properly extracts values based on the validator group (RIGHT, LEFT, or DEFAULT) and applies the pattern. The TODO comment on Line 16 about potentially renaming "value" is noted but doesn't block the implementation.
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala (1)
1-36: Confirm no stale run(String) overrides or invocations remain
Project-wide search returned no matches for overriding or calling run with a String parameter; please manually verify no deprecated API usage remains.
dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala (1)
100-101: Great context wiring for predicates
Wrapping the predicate as `Construct(p, Some(s), Some(o))` keeps both neighbors available, so RIGHT/LEFT validator groups can reach the object and subject without extra lookups. Nice.
dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl (1)
284-292: Nice use of validatorGroup for abstract literals
Assigning `v:validatorGroup v:rightValidator` to the datatype validator ensures only the object literal is checked for the broken bracket pattern, which is exactly what the new group support is for. Looks good.
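The interplay of `Construct`, `ValidatorGroup`, and a group-aware validator can be sketched as follows. This is an illustrative re-creation, not the PR's code: the type and member names `Construct`, `self`, `left`, `right`, `ValidatorGroup`, `VALIDATOR_GROUP`, and `run` follow the review comments, while `target`, `NotContainsSketch`, `ValidatorGroupDemo`, and the rule that a missing neighbor counts as a pass are assumptions.

```scala
// Grouping of validators by which part of the triple they inspect.
object ValidatorGroup extends Enumeration {
  val RIGHT, LEFT, DEFAULT = Value
}

// The part under test plus its optional neighbors in the triple.
final case class Construct(self: String, left: Option[String] = None, right: Option[String] = None)

trait Validator {
  val VALIDATOR_GROUP: ValidatorGroup.Value = ValidatorGroup.DEFAULT
  def run(nTriplePart: Construct): Boolean

  // Hypothetical helper: picks the string the validator should look at.
  protected def target(c: Construct): Option[String] = VALIDATOR_GROUP match {
    case ValidatorGroup.RIGHT => c.right
    case ValidatorGroup.LEFT  => c.left
    case _                    => Some(c.self)
  }
}

// Passes when the selected part does not contain `needle`; a missing neighbor
// counts as a pass (an assumption made for this sketch).
final class NotContainsSketch(
    needle: String,
    override val VALIDATOR_GROUP: ValidatorGroup.Value = ValidatorGroup.DEFAULT
) extends Validator {
  def run(nTriplePart: Construct): Boolean =
    target(nTriplePart).forall(!_.contains(needle))
}

object ValidatorGroupDemo {
  val s = "http://dbpedia.org/resource/Berlin"
  val p = "http://dbpedia.org/ontology/abstract"
  private val rightCheck = new NotContainsSketch("(;", ValidatorGroup.RIGHT)

  // Wraps the predicate with both neighbors, as described for NTripleTestSuite,
  // so the RIGHT-group validator inspects the object literal.
  def objectIsClean(o: String): Boolean = rightCheck.run(Construct(p, Some(s), Some(o)))
}
```

Attaching the check to the predicate and routing it to the right-hand neighbor means the rule fires only for objects of that predicate, rather than for every literal in the dump.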
| > | ||
| > Therefore, you can use this tool to extract an example page and check if the error persists in the latest software version, and add the link you used for verification, e.g., http://dief.tools.dbpedia.org/server/extraction/en/extract?title=United+States | ||
| # Issue validity | ||
| > Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/ |
Wrap bare URLs in angle brackets or Markdown links.
Multiple bare URLs appear in the template. For better Markdown compliance and clickability, wrap them in angle brackets <URL> or use Markdown link syntax [text](URL).
Based on static analysis hints.
Apply this diff:
-> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/
+> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into <http://dbpedia.org/sparql>. During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At <http://dief.tools.dbpedia.org/server/extraction/en/> we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: <http://dief.tools.dbpedia.org/server/extraction/en/>
-* Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link**
+* Web/SPARQL, e.g. <http://dbpedia.org/sparql> or <http://dbpedia.org/resource/Berlin>, please **provide query or link**
Also applies to: 19-19
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
11-11: Bare URL used
(MD034, no-bare-urls)
11-11: Bare URL used
(MD034, no-bare-urls)
11-11: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In .github/ISSUE_TEMPLATE/data.md around lines 11 and 19, there are multiple
bare URLs that should be wrapped for proper Markdown rendering; replace each
bare URL (e.g. https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1,
http://dbpedia.org/sparql, http://dief.tools.dbpedia.org/server/extraction/en/)
with either angle-bracketed form <URL> or convert to Markdown links like
[Release Frequency &
Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1) and
similarly for the SPARQL and DIEF extractor URLs so they render and are
clickable.
| > Pick the appropriate: | ||
| # Details | ||
| > please post the details | ||
Remove blank line inside blockquote.
Line 25 contains a blank line within a blockquote, which breaks Markdown formatting. Remove the blank line or replace it with > to maintain the blockquote.
Based on static analysis hints.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
25-25: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🤖 Prompt for AI Agents
.github/ISSUE_TEMPLATE/data.md around line 25: there is a blank line inside a
blockquote which breaks Markdown rendering; remove the blank line (or replace it
with a '>' prefix) so the blockquote lines are contiguous and properly
formatted, ensuring the blockquote marker is present on the next line if you
want an empty quoted line.
| ``` | ||
| ``` | ||
| - Expected / corrected RDF outcome snippet (NTRIPLES): | ||
| > Expected / corrected RDF outcome snippet | ||
| ``` | ||
| ``` |
Add language specifiers to fenced code blocks.
Fenced code blocks should specify a language for proper syntax highlighting. For RDF/SPARQL snippets, use turtle or sparql; for generic examples, use text.
Based on static analysis hints.
Apply this diff:
> Wrong triples RDF snippet
- ```
+ ```turtle
Expected / corrected RDF outcome snippet
Example DBpedia resource URL(s)
-+text
- +
Also applies to: 35-37
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.18.1)</summary>
27-27: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
---
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
.github/ISSUE_TEMPLATE/data.md around lines 27-33 (and similarly lines 35-37):
fenced code blocks lack language specifiers; update the opening triple-backtick
lines to include appropriate languages (use turtle for RDF/SPARQL snippets and text for generic examples) so the blocks become turtle or text
respectively, keeping the closing ``` unchanged.
</details>
| //if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different | ||
| //entity and need to be saved as new resource. | ||
| //This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each | ||
| //about different entity, see dbr:Helene_Demuth |
🛠️ Refactor suggestion | 🟠 Major
Fix grammar issues in comment.
The comment has several grammatical errors that should be corrected for clarity.
Apply this diff:
- //if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different
- //entity and need to be saved as new resource.
- //This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each
- //about different entity, see dbr:Helene_Demuth
+ // If we have more than one infobox and the name property of an infobox is different from the page title, then the infobox belongs to a different
+ // entity and needs to be saved as a new resource.
+ // This is to avoid assigning one entity's properties to another entity, because a single page can have multiple infoboxes, each
+ // about a different entity. See dbr:Helene_Demuth for an example.
Note: This partially addresses a previous review comment.
🤖 Prompt for AI Agents
In core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala
around lines 80 to 83, the block comment contains multiple grammatical errors;
replace it with a clear, corrected comment such as: "If there is more than one
infobox and the infobox name differs from the page title, the infobox describes
a different entity and should be saved as a new resource. This prevents
assigning properties from one entity to another when a single page contains
multiple infoboxes about different entities (see dbr:Helene_Demuth)." Update the
file to use this corrected wording.
| private val pattern = patternString.r.pattern | ||
|
|
||
| override val METHOD_TYPE: ValidatorType.Value = ValidatorType.TYPED_LITERAL | ||
|
|
||
| override def run(nTriplePart: String): Boolean = { | ||
|
|
||
| val lexicalForm = nTriplePart.trim.split("\"").dropRight(1).drop(1).mkString("") | ||
|
|
||
| override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup | ||
|
|
||
| override def run(nTriplePart: Construct): Boolean = { | ||
| val lexicalForm = VALIDATOR_GROUP match { | ||
| case ValidatorGroup.RIGHT => nTriplePart.right match { | ||
| // TODO: 1) maybe we need to rename "value" | ||
| // 2) discuss what to do if we want to check the value that doesn't exist on | ||
| // the left or right side, at the moment we only return false in these cases | ||
| case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("") | ||
| case None => return false | ||
| } | ||
| case ValidatorGroup.LEFT => nTriplePart.left match { | ||
| case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("") | ||
| case None => return false | ||
| } | ||
| case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("") | ||
| } |
Make lexical form extraction robust (escaped quotes) and simplify run()
Current split-based parsing can fail on escaped quotes within literals. Use a regex capture and Option flow; also avoid wildcard group match.
private val pattern = patternString.r.pattern
+ // Extracts literal lexical form between the first pair of quotes, handling escaped quotes.
+ private val lexicalRegex = "^\\s*\\\"((?:\\\\.|[^\\\"\\\\])*)\\\"".r
override val METHOD_TYPE: ValidatorType.Value = ValidatorType.TYPED_LITERAL
override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup
- override def run(nTriplePart: Construct): Boolean = {
- val lexicalForm = VALIDATOR_GROUP match {
- case ValidatorGroup.RIGHT => nTriplePart.right match {
- // TODO: 1) maybe we need to rename "value"
- // 2) discuss what to do if we want to check the value that doesn't exist on
- // the left or right side, at the moment we only return false in these cases
- case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
- case None => return false
- }
- case ValidatorGroup.LEFT => nTriplePart.left match {
- case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
- case None => return false
- }
- case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("")
- }
- pattern.matcher(lexicalForm).matches()
- }
+ override def run(nTriplePart: Construct): Boolean = {
+ val selected: Option[String] = VALIDATOR_GROUP match {
+ case ValidatorGroup.RIGHT => nTriplePart.right
+ case ValidatorGroup.LEFT => nTriplePart.left
+ case ValidatorGroup.DEFAULT => Some(nTriplePart.self)
+ }
+ val lexicalFormOpt = selected.flatMap(s => lexicalRegex.findFirstMatchIn(s.trim).map(_.group(1)))
+ lexicalFormOpt.exists(s => pattern.matcher(s).matches())
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala
around lines 8 to 27, the current lexicalForm extraction uses split("\"") which
breaks on escaped quotes and uses a catch-all ValidatorGroup match; replace this
with an explicit match on ValidatorGroup.RIGHT and ValidatorGroup.LEFT (no
wildcard), pull the literal text from the appropriate Option safely, and extract
the quoted lexical form using a regex with a capture group that allows escaped
quotes (e.g. a pattern matching a leading quote, any sequence of non-unescaped
or escaped characters, then a closing quote) so you avoid splitting; propagate
Option properly (return false on None) and assign the captured group as
lexicalForm.
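To make the escaped-quote concern in the TypedLiteralValidator comment above concrete, here is a small comparison of the split-based extraction from the PR with a regex-capture variant along the lines of the suggestion. This is a sketch, not the PR's code: `LexicalFormSketch`, `viaSplit`, and `viaRegex` are hypothetical names, and the sample literal is made up for illustration.

```scala
object LexicalFormSketch {
  // Captures the text between the opening quote and the first unescaped
  // closing quote; a backslash followed by any character counts as escaped.
  private val lexicalRegex = "^\\s*\"((?:\\\\.|[^\"\\\\])*)\"".r

  // Split-based extraction as used in the PR: every quote character is a
  // separator, so escaped quotes inside the literal are silently dropped.
  def viaSplit(part: String): String =
    part.trim.split("\"").dropRight(1).drop(1).mkString("")

  // Regex-based extraction: returns None when the input has no quoted literal.
  def viaRegex(part: String): Option[String] =
    lexicalRegex.findFirstMatchIn(part).map(_.group(1))
}
```

For a literal such as `"say \"hi\" now"^^<http://www.w3.org/2001/XMLSchema#string>`, `viaSplit` loses the inner quotes, while `viaRegex` keeps the lexical form intact. Note that the captured text still contains the escape backslashes; unescaping would be a separate step if the pattern check needs the decoded value.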
| val filePath = classLoader.getResource(path).getFile | ||
| val file = scala.io.Source.fromFile(filePath) | ||
|
|
||
| val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray | ||
| val columnsNames: Array[String] = table.head | ||
|
|
||
| if (!columnsNames.contains(group)) { | ||
| Array[String]() | ||
| } | ||
| else { | ||
| val indexOfGroup = columnsNames.indexOf(group) | ||
| val groupsKeys: Array[String] = table.tail.flatMap(row => | ||
| if (row(indexOfGroup) == option) Array[String](row(0)) | ||
| else Array[String]()) | ||
| groupsKeys | ||
| } |
Fix resource handling in loadTestGroupsKeys (leak) and hardening
Use getResourceAsStream, ensure streams are closed, and guard empty/short rows.
- val filePath = classLoader.getResource(path).getFile
- val file = scala.io.Source.fromFile(filePath)
-
- val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
- val columnsNames: Array[String] = table.head
-
- if (!columnsNames.contains(group)) {
- Array[String]()
- }
- else {
- val indexOfGroup = columnsNames.indexOf(group)
- val groupsKeys: Array[String] = table.tail.flatMap(row =>
- if (row(indexOfGroup) == option) Array[String](row(0))
- else Array[String]())
- groupsKeys
- }
+ val resource = Option(classLoader.getResourceAsStream(path))
+ .getOrElse(throw new IllegalArgumentException(s"Resource not found: $path"))
+ val source = scala.io.Source.fromInputStream(resource)(scala.io.Codec.UTF8)
+ try {
+ val table: Array[Array[String]] = source.getLines().map(_.split(",")).toArray
+ if (table.isEmpty) Array.empty[String]
+ else {
+ val columnsNames: Array[String] = table.head
+ if (!columnsNames.contains(group)) Array.empty[String]
+ else {
+ val indexOfGroup = columnsNames.indexOf(group)
+ table.tail.collect {
+ case row if row.length > indexOfGroup && row(indexOfGroup) == option => row(0)
+ }.toArray
+ }
+ }
+ } finally {
+ source.close()
+ resource.close()
+ }
🤖 Prompt for AI Agents
In dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala around lines 15
to 30, replace the use of classLoader.getResource(...).getFile and
scala.io.Source.fromFile (which leaks) with
classLoader.getResourceAsStream(path) and wrap the InputStream in a safe closing
construct (try/finally or scala.util.Using) to ensure the stream is closed;
guard against getResourceAsStream returning null and return an empty Array when
resource is missing; parse lines using Source.fromInputStream, skip empty lines,
defensively check row length before indexing (row.length > indexOfGroup &&
row.nonEmpty) and handle empty table or missing header by returning an empty
Array[String] rather than throwing.
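The row-guarding part of the loadTestGroupsKeys comment above can be looked at separately from the stream handling. The sketch below parses already-read lines, so it can be exercised without a classpath resource. `TestGroupKeysSketch` and `keysFor` are hypothetical names, and the simple comma split is carried over from the PR on the assumption that the CSV contains no quoted fields.

```scala
object TestGroupKeysSketch {
  // Returns the first-column key of every data row whose `group` column equals
  // `option`. Blank lines are skipped and rows too short to contain the group
  // column are ignored. An empty table or an unknown group yields an empty array.
  def keysFor(lines: Iterator[String], group: String, option: String): Array[String] = {
    val table: Array[Array[String]] =
      lines.map(_.trim).filter(_.nonEmpty).map(_.split(",")).toArray
    table.headOption match {
      case None => Array.empty[String]
      case Some(header) =>
        val indexOfGroup = header.indexOf(group)
        if (indexOfGroup < 0) Array.empty[String]
        else table.tail.collect {
          case row if row.length > indexOfGroup && row(indexOfGroup) == option => row(0)
        }
    }
  }
}
```

A caller would pass `source.getLines()` inside the try/finally (or `scala.util.Using` on Scala 2.13 and later) that owns the stream, which keeps the closing logic and the parsing logic independently testable.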
| val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties")) | ||
| val properties = new Properties() | ||
| resourceInputStream match { | ||
| case Some(inputStream) => properties.load(inputStream) | ||
| case None => return TestConfig.defaultTestGroup | ||
| } | ||
| val groupOption = Option(properties.getProperty(testName)) | ||
| groupOption match { | ||
| case Some(group) => group | ||
| case None => TestConfig.defaultTestGroup | ||
| } | ||
| } |
Close properties InputStream to avoid leak
Ensure the stream is closed; simplify defaulting.
- val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
- val properties = new Properties()
- resourceInputStream match {
- case Some(inputStream) => properties.load(inputStream)
- case None => return TestConfig.defaultTestGroup
- }
- val groupOption = Option(properties.getProperty(testName))
- groupOption match {
- case Some(group) => group
- case None => TestConfig.defaultTestGroup
- }
+ val isOpt = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
+ isOpt match {
+ case Some(is) =>
+ val properties = new Properties()
+ try properties.load(is) finally is.close()
+ Option(properties.getProperty(testName)).getOrElse(TestConfig.defaultTestGroup)
+ case None =>
+ TestConfig.defaultTestGroup
+ }
🤖 Prompt for AI Agents
In dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala around lines 34
to 45, the InputStream from getResourceAsStream is not closed and the properties
defaulting is verbose; fix by: check for null resource stream, load properties
inside a try-finally (or use scala.util.Using) that closes the stream after
loading, then simplify the return by using
Option(properties.getProperty(testName)).getOrElse(TestConfig.defaultTestGroup)
so the stream is always closed and the defaulting is concise.
| val minidumpDir = new File("./target/minidumptest/base") | ||
| minidumpDir.listFiles().foreach(f => { | ||
| val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2") | ||
| if (longAbstractsFile.exists()) { | ||
| longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2")) | ||
| } | ||
| val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2") | ||
| if (shortAbstractsFile.exists()) { | ||
| shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2")) | ||
| } | ||
| }) | ||
| } |
Guard listFiles() and check rename results
Prevent NPE when base dir is absent and surface rename failures.
- val minidumpDir = new File("./target/minidumptest/base")
- minidumpDir.listFiles().foreach(f => {
+ val minidumpDir = new File("./target/minidumptest/base")
+ Option(minidumpDir.listFiles()).getOrElse(Array.empty[File]).foreach { f =>
val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
if (longAbstractsFile.exists()) {
- longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
+ val ok = longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
+ if (!ok) System.err.println(s"WARN: Could not rename: ${longAbstractsFile.getPath}")
}
val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
if (shortAbstractsFile.exists()) {
- shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
+ val ok = shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
+ if (!ok) System.err.println(s"WARN: Could not rename: ${shortAbstractsFile.getPath}")
}
- })
+ }
🤖 Prompt for AI Agents
In dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala around lines 48
to 59, guard against a null result from minidumpDir.listFiles() to prevent a
NullPointerException when the base directory is missing, and check the boolean
return value of File.renameTo(...) to detect and log or fail on rename failures;
specifically, only iterate when listFiles() != null (or use Option/empty list),
and after each renameTo call verify its result and handle failure (e.g., throw
an exception or call fail/log with the source and target paths).
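If failing loudly is preferred over logging a warning, one alternative to checking the Boolean from `File.renameTo` is `java.nio.file.Files.move`, which throws an `IOException` when the move cannot be performed. The sketch below swaps in that technique together with a null-safe directory listing. `RenameSketch`, `children`, and `rename` are hypothetical helper names, not part of the PR.

```scala
import java.io.File
import java.nio.file.{Files, StandardCopyOption}

object RenameSketch {
  // Null-safe listing: File.listFiles() returns null when the path is not a
  // directory or cannot be read, so wrap it before iterating.
  def children(dir: File): Array[File] =
    Option(dir.listFiles()).getOrElse(Array.empty[File])

  // Renames `from` to `to`, replacing an existing target.
  // Files.move throws an IOException on failure instead of returning false.
  def rename(from: File, to: File): Unit = {
    Files.move(from.toPath, to.toPath, StandardCopyOption.REPLACE_EXISTING)
    ()
  }
}
```

In a test utility, an exception that stops the run is arguably easier to notice than a warning on stderr. Whether to replace an existing target is a judgment call that depends on how the minidump directory is reused between runs.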
| - Get help with the [Maven build](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html) or another form of [installation](https://maven.apache.org/install.html). | ||
| - [Download](https://dumps.wikimedia.org/) some data to work with. | ||
| - How to run [from Scala/Java](https://docs.scala-lang.org/tutorials/scala-with-maven.html) or [from a JAR](https://docs.oracle.com/javase/tutorial/deployment/jar/run.html). | ||
| - Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org. |
Wrap bare URL in angle brackets or Markdown link.
Line 125 contains a bare URL (https://forum.dbpedia.org) that should be wrapped for better Markdown compliance.
Based on static analysis hints.
Apply this diff:
-- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org.
+- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on <https://forum.dbpedia.org>.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
125-125: Bare URL used
(MD034, no-bare-urls)
🤖 Prompt for AI Agents
In README.md around line 125, there's a bare URL (https://forum.dbpedia.org)
that needs to be wrapped for proper Markdown formatting; replace the bare URL
with either a Markdown link text like [DBpedia forum](https://forum.dbpedia.org)
or wrap it in angle brackets <https://forum.dbpedia.org> so the link is
rendered/clickable and compliant with Markdown linting.


This is a temporary pull request to check how well older commits from the dev branch can be merged into the current master.
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests