@JJ-Author commented Sep 18, 2025

This is a temporary pull request to check how well older commits from the dev branch can be merged into the current master.

Summary by CodeRabbit

  • New Features

    • Improved abstract extraction with new Plain and HTML extractors and an option to auto-clean broken brackets in abstracts.
    • Validator groups (LEFT/RIGHT/DEFAULT) for finer-grained construct validation.
  • Improvements

    • Default configurations updated to use the HTML abstracts extractor; new plain abstracts config available.
    • Enhanced CI/validation with additional triggers, generators, and coverage for abstracts and IRIs.
  • Documentation

    • Updated README links/formatting.
    • Overhauled issue template for clearer reporting.
  • Tests

    • Added SHACL rules and instances; expanded CI test coverage.
    • New utilities for test grouping and dataset file handling.

jlareck and others added 21 commits May 24, 2021 00:03
Removing broken information in brackets
* implement selection of construct validation tests

* rename test report file of construct validation tests

* fix path to test reports directory

* add removing brackets that start with "(,"

* make removing broken brackets in WikipediaNifExtractor configurable

* rename abstract extractors

* fix getting short abstracts in WikipediaNifExtractor

* generate two dumps for plain and html abstract extractors

* fix minidump abstract extraction
* add Construct model

* add construct validation test for (; in abstracts

* implement handling of right and left validators
* Fix merging of multiple infoboxes under one resource, see issue #709
* Add new page to the minidump for testing
* Create SHACL test, add it to the shacl-test-groups.csv and disabled isbn test
closes #771.

---------

Co-authored-by: Ted Thibodeau Jr <[email protected]>
@sonarqubecloud
Quality Gate failed

Failed conditions
5.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@TallTed left a comment

Editorial. Minor grammar, punctuation, etc.

>
> Therefore, you can use this tool to extract an example page and check if the error persists in the latest software version, and add the link you used for verification, e.g., http://dief.tools.dbpedia.org/server/extraction/en/extract?title=United+States
# Issue validity
> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/

Suggested change
-> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/
+> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into <http://dbpedia.org/sparql>. During these three months, Wikipedia changes and the DBpedia Information Extraction Framework receive patches. At <http://dief.tools.dbpedia.org/server/extraction/en/>, we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g., `Berlin` or `Joe_Biden` here: <http://dief.tools.dbpedia.org/server/extraction/en/>.

Comment on lines +18 to +21
> Where did you find the data issue? Non-exhaustive options are:
* Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link**
* Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link**
* DIEF: you ran the software and the error occured then, please **include all necessary information such as the extractor or log**. If you had problems running the software use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose)

Suggested change
-> Where did you find the data issue? Non-exhaustive options are:
-* Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link**
-* Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link**
-* DIEF: you ran the software and the error occured then, please **include all necessary information such as the extractor or log**. If you had problems running the software use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose)
+> Where did you find the data issue? Non-exhaustive options include the following:
+* Web/SPARQL, e.g., <http://dbpedia.org/sparql> or <http://dbpedia.org/resource/Berlin>, please **provide query or link.**
+* Dumps: dumps are managed by the Databus. Please **provide artifact & version or download link.**
+* DIEF: you ran the software and the error occured then. Please **include all necessary information such as the extractor or log.** If you had problems running the software use [another issue template](https://github.com/dbpedia/extraction-framework/issues/new/choose).

Comment on lines +34 to +35
>Example DBpedia resource URL(s)
```

Suggested change
->Example DBpedia resource URL(s)
-```
+> Example DBpedia resource URL(s)
+```

### Additional context
> Add any other context about the problem here.
```

Suggested change
```
```

**Documentation**: http://dev.dbpedia.org/Extraction <br/>
**Get in touch with DBpedia**: https://wiki.dbpedia.org/join/get-in-touch <br/>
-**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace](https://dbpedia-slack.herokuapp.com/) - the main point for [developement updates](https://github.com/dbpedia/extraction-framework/blob/master/.github/workflows/maven.yml) and discussions <br/>
+**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>

Suggested change
-**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
+**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) Slack channel within the the [DBpedia Slack workspace](https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) the main point for development updates and discussions <br/>

6. When you are done with a bugfix or feature, `rebase` your branch onto `extraction-framework/dev` (`git pull --rebase git://github.com/dbpedia/extraction-framework.git`). Resolve possible conflicts and commit.
7. Push your branch to GitHub (`git push origin fixRestApiParams`).
8. Send a pull request from your branch into `extraction-framework/dev` via GitHub.
* In the description, reference the associated commit (for example, _"Fixes #123 by ..."_ for issue number 123).

Suggested change
-* In the description, reference the associated commit (for example, _"Fixes #123 by ..."_ for issue number 123).
+* In the description, reference the associated issue (for example, _"Fixes #123 by ..."_ for issue number 123).

* In the description, reference the associated commit (for example, _"Fixes #123 by ..."_ for issue number 123).
* Your changes will be reviewed and discussed on GitHub.
-* In addition, [Travis-CI](http://about.travis-ci.org/) will test if the merged version passes the build.
+* In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.

Suggested change
-* In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test if the merged version passes the build.
+* In addition, [Travis-CI](https://www.travis-ci.com/about-us/) will test whether the merged version passes the build.


Please keep in mind:
- Try *not* to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
- **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch!

Suggested change
-- **Never** rebase the master onto a development branch (i.e. _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch!
+- **Never** rebase the master onto a development branch (i.e., _never_ call `rebase` from `extraction-framework/master`). Only rebase your branch onto the dev branch, *if and only if* nobody already pulled from the development branch!

Comment on lines +80 to +82
//if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different
//entity and need to be saved as new resource.
//This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each

Suggested change
-//if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different
-//entity and need to be saved as new resource.
-//This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each
+//if we have more than one info box and name property of infobox is different than title of page, then infobox belongs to a different
+//entity and needs to be saved as a new resource.
+//This is to avoid assigning one entity's properties to another entity, because a single page can have multiple info boxes and each is

#
#log-dir= see: ../core/src/main/resources/universal.properties

# WikiPages failed to extract in the first try can be retried with this option (especially interesting when extraction from the mediawiki api)

Suggested change
-# WikiPages failed to extract in the first try can be retried with this option (especially interesting when extraction from the mediawiki api)
+# WikiPages failed to extract in the first try can be retried with this option (especially interesting when extracting from the mediawiki api)

@JJ-Author

@coderabbitai review

coderabbitai bot commented Oct 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot commented Oct 8, 2025

Walkthrough

Renames and reconfigures abstract extractors (AbstractExtractor → PlainAbstractExtractor; AbstractExtractorWikipedia → HtmlAbstractExtractor), adds configurable broken-brackets cleaning to abstracts (core and NIF), refactors construct validation to use Construct and ValidatorGroup across validators and test suites, updates configs, tests, and docs, and introduces new utilities and SHACL tests.

Changes

Cohort / File(s) Summary
Issue template & docs
.github/ISSUE_TEMPLATE/data.md, README.md, core/doc/HowTo-release-DBpedia.txt
Reword/reorder issue template; update links/formatting; doc step references PlainAbstractExtractor.
Abstract extraction (core code)
core/src/main/scala/org/dbpedia/extraction/config/Config.scala, core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala, core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala, core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala, core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala, core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala, core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala
Adds removeBrokenBracketsProperty config; renames AbstractExtractor→PlainAbstractExtractor and AbstractExtractorWikipedia→HtmlAbstractExtractor; introduces AbstractUtils.removeBrokenBracketsInAbstracts; applies optional cleaning in Plain/NIF flows; logger/category updates; comment updates.
Abstract extraction (tests)
core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala, core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala, core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala
Update references to PlainAbstractExtractor; add tests for AbstractUtils; adjust post-processing calls.
Configs: extraction/server/live
dump/extraction.abstracts.properties, dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties, dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties, live/live.default.xml, server/server.default.properties
Switch extractor classes to Plain/Html variants; add remove-broken-brackets flags; add new plain abstracts config; update live/server default extractors.
Validation model & enums
dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala, dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala
Add Construct case class; add ValidatorGroup enum (RIGHT, LEFT, DEFAULT).
Validators API refactor
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala, .../validators/NotContainsValidator.scala, .../validators/PatternValidator.scala, .../validators/TypedLiteralValidator.scala, .../validators/VocabValidator.scala, .../validators/generic/GenericIRIValidator.scala, .../validators/generic/GenericLiteralLangTagValidator.scala, .../validators/generic/GenericLiteralValidator.scala, .../validators/generic/GenericRdfLangStringValidator.scala, .../validators/generic/GenericValidator.scala
Change run signatures to accept Construct; add optional validatorGroup param and VALIDATOR_GROUP; branch run logic on group (left/right/self); import updates.
Validation generation & suites
dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala, dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala, dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala, dump/src/main/scala/org/dbpedia/validation/construct/tests/TestSuiteFactory.scala
Add validatorGroup parsing from SPARQL; pass group to validators; adjust queries to expose group fields; operate on Construct in suite; trivial EOF newline in factory.
Test harness & utilities
dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala, dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala, dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala, dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala, dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala, dump/pom.xml
Add Utils (group loading, group value getter, dataset file renamer); introduce plainAbstractConfig and classLoader exposure; two-phase abstract extraction in tests; move group handling to Utils in SHACL tests; split testGroup into shaclTestGroup/cvTestGroup; pre-test RDF filtering; add beforeAll setup.
Test resources: SHACL/CI
dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl, dump/src/test/resources/ci-tests/xsd_ci-tests.ttl, dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl, dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl, dump/src/test/resources/shaclTestsCoverageTable.md, dump/src/test/bash/minidump-overview.md, dump/src/test/bash/uris.lst
Add/expand triggers, generators, validators (including abstract checks), vocab refs; formatting cleanups; add SHACL shapes/resources; update coverage table; add article/URI entries.
Mappings logic refactor
core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala
Refactors extract flow into explicit conditions and helpers (instance URI generation, type creation); adds infobox-aware handling; annotations maintained.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Config
  participant E as Extractor (Html/Plain/NIF)
  participant M as MediaWikiConnector
  participant U as AbstractUtils
  participant W as DatasetWriter

  C->>E: init with removeBrokenBracketsProperty
  E->>M: fetch page HTML/text
  M-->>E: raw abstract text
  alt removeBrokenBracketsProperty = true
    E->>U: removeBrokenBracketsInAbstracts(text)
    U-->>E: cleanedText
    E->>W: write quads (cleanedText)
  else
    E->>W: write quads (raw text)
  end
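
The alt/else branch in the diagram above can be sketched in Scala roughly as follows. This is a minimal sketch under stated assumptions, not the project's code: `CleaningFlowSketch`, `prepareAbstract`, `cleaningEnabled`, and the regex-based `removeBrokenBrackets` stand-in are hypothetical names, and the real `AbstractUtils.removeBrokenBracketsInAbstracts` may behave differently. The stand-in assumes that a "broken" bracket is a non-nested parenthesised group that is empty or starts with `;` or `,`.

```scala
object CleaningFlowSketch {
  // Hypothetical stand-in: matches "(...)" groups without nested parentheses
  // whose content is empty or begins with ';' or ',', plus preceding whitespace.
  private val BrokenGroup = """\s*\(\s*(?:[;,][^()]*)?\)""".r

  def removeBrokenBrackets(text: String): String =
    BrokenGroup.replaceAllIn(text, "")

  // Mirrors the alt/else branch: clean only when the flag is set,
  // otherwise pass the raw text through unchanged.
  def prepareAbstract(raw: String, cleaningEnabled: Boolean): String =
    if (cleaningEnabled) removeBrokenBrackets(raw) else raw
}
```

Keeping the flag check in one place means the writer step receives a single string either way, which matches the two `write quads` arrows in the diagram.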
sequenceDiagram
  autonumber
  participant G as NTripleTestGenerator
  participant S as NTripleTestSuite
  participant V as Validator (Pattern/Typed/Vocab/NotContains)
  participant C as Construct

  G->>G: parse SPARQL results (validatorGroup)
  G-->>S: validators with VALIDATOR_GROUP
  S->>S: parse N-Triples line
  S-->>S: Array(Construct(s), Construct(p, s, o), Construct(o))
  loop for each Construct
    S->>V: run(Construct)
    alt VALIDATOR_GROUP = RIGHT
      V->>V: use Construct.right
    else VALIDATOR_GROUP = LEFT
      V->>V: use Construct.left
    else DEFAULT
      V->>V: use Construct.self
    end
    V-->>S: Boolean
  end
  S-->>S: aggregate scores
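
The "parse N-Triples line" step in the diagram above might look like the following sketch. It assumes, based only on the `Construct(p, s, o)` notation in the diagram, that a predicate construct carries the subject as `left` and the object as `right`. `TripleConstructSketch`, the field names, and the naive whitespace split are all hypothetical, and a real N-Triples parser handles more cases.

```scala
object TripleConstructSketch {
  // Hypothetical model: self is the inspected term, left/right are its neighbours.
  final case class Construct(self: String,
                             left: Option[String] = None,
                             right: Option[String] = None)

  // Naive split: subject and predicate contain no whitespace; the object is
  // everything that remains once the terminating "." is removed.
  def constructs(line: String): Option[Array[Construct]] = {
    val trimmed = line.trim
    if (!trimmed.endsWith(".")) None
    else {
      val body = trimmed.dropRight(1).trim
      body.split("\\s+", 3) match {
        case Array(s, p, o) =>
          Some(Array(Construct(s), Construct(p, Some(s), Some(o)), Construct(o)))
        case _ => None
      }
    }
  }
}
```

Only the predicate construct has neighbours in this sketch, so a LEFT or RIGHT validator applied to a subject or object construct would find nothing to inspect.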

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

A rabbit taps keys with abstract delight,
Snips broken brackets, sets validators right.
Left ear for LEFT, right ear for RIGHT,
Plain or HTML, we clean up the byte.
With Constructs in paws and SHACL in sight,
We hop through configs—tests green by night. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title Check | ❓ Inconclusive | The pull request title “temporary PR for Dev” is too vague and does not describe the actual changes or intent of the merge, failing to summarize the main content of the changeset. It uses non-descriptive language and provides no guidance about what the PR accomplishes. | Please rename the PR to a concise, descriptive title that highlights the primary change, for example “Merge dev branch updates into master: add removeBrokenBrackets extractor flag and extend Construct validators” or similar wording that clearly conveys the core modifications. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala (1)

14-15: Trim input; fix iri typo

Trim before matching to avoid whitespace issues. Also fix the iri string typo.

-  override val iri: ValidatorIRI = "#GENERIC_LTIERAL_LANG_TAG_VALIDATOR"
+  override val iri: ValidatorIRI = "#GENERIC_LITERAL_LANG_TAG_VALIDATOR"
@@
-    pattern.matcher(nTriplePart.self).matches()
+    pattern.matcher(nTriplePart.self.trim).matches()

Also applies to: 18-21
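
To illustrate why trimming matters here: `java.util.regex.Matcher.matches()` succeeds only if the whole input matches, so a single trailing space makes an otherwise valid literal fail. The sketch below uses a hypothetical pattern for language-tagged literals; the validator's actual pattern is not shown in this comment, and `LangTagTrimSketch` is an illustrative name.

```scala
import java.util.regex.Pattern

object LangTagTrimSketch {
  // Hypothetical pattern for a language-tagged literal such as "text"@en.
  private val pattern: Pattern =
    Pattern.compile("^\".*\"@[a-zA-Z]+(-[a-zA-Z0-9]+)*$")

  // matches() requires the entire input to match, so stray whitespace fails.
  def matchesRaw(s: String): Boolean = pattern.matcher(s).matches()

  // Trimming first makes the check robust to leading/trailing whitespace.
  def matchesTrimmed(s: String): Boolean = pattern.matcher(s.trim).matches()
}
```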

dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala (1)

284-289: Propagate validatorGroup to doesNotContains validators

v:validatorGroup on IRI validators is ignored here because we still call NotContainsValidator with the legacy (id, iri, charSeq) signature. That forces every validator back to DEFAULT, even when the TTL assigns leftValidator/rightValidator, so side-specific constraints can’t be expressed. Please thread the group through like the other constructors.

-          validatorQuerySolution.getLiteral("doesNotContains").getLexicalForm.split(delim).foreach(charSeq => {
-
-            validatorCollection.append(NotContainsValidator(currentValidatorID, validatorIRI, charSeq))
+          validatorQuerySolution.getLiteral("doesNotContains").getLexicalForm.split(delim).foreach(charSeq => {
+            val validatorGroup = getValidatorGroup(validatorQuerySolution)
+            validatorCollection.append(NotContainsValidator(currentValidatorID, validatorIRI, charSeq, validatorGroup))
             groupedValidators.append(currentValidatorID)
             currentValidatorID += 1
           })
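
The failure mode described above follows from how default parameters behave: a call site that omits the group still compiles and silently falls back to the default. The sketch below assumes a constructor shape with an optional trailing group parameter. The case class, the enumeration, and the `legacy`/`threaded` helpers are hypothetical stand-ins, not the PR's definitions.

```scala
object DefaultGroupSketch {
  object ValidatorGroup extends Enumeration {
    val RIGHT, LEFT, DEFAULT = Value
  }

  // Hypothetical constructor shape: the group is optional and defaults to DEFAULT.
  final case class NotContainsValidator(id: Int,
                                        iri: String,
                                        sequence: String,
                                        group: ValidatorGroup.Value = ValidatorGroup.DEFAULT)

  // Legacy call site: compiles, but ignores the configured group.
  def legacy(id: Int, iri: String, seq: String,
             configured: ValidatorGroup.Value): NotContainsValidator =
    NotContainsValidator(id, iri, seq)

  // Threaded call site: the configured group reaches the validator.
  def threaded(id: Int, iri: String, seq: String,
               configured: ValidatorGroup.Value): NotContainsValidator =
    NotContainsValidator(id, iri, seq, configured)
}
```

Because the compiler cannot flag the legacy call, a test that asserts on the resulting group is one way to catch this kind of regression.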
♻️ Duplicate comments (2)
README.md (1)

7-7: Fix typo: "developement" → "development".

Line 7 contains a typo in the Slack channel description.

Apply this diff:

-**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for developement updates and discussions <br/>
+**Slack**: join the [**#dev-team**](https://dbpedia.slack.com/archives/C0L9MJFU7) slack channel within the the [DBpedia Slack workspace]( https://join.slack.com/t/dbpedia/shared_invite/zt-nffbn1ra-dRoi8oeWBlolJb_lKifEqA) - the main point for development updates and discussions <br/>
dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties (1)

13-13: Address the grammatical error in the comment.

As noted in a previous review, the comment has a grammatical issue. Consider applying the suggested fix.

🧹 Nitpick comments (9)
dump/src/test/resources/shaclTestsCoverageTable.md (1)

29-30: Consider fixing markdown table formatting.

The functional changes (adding #en_abstract_validation tags) are correct and align with the new SHACL validation rules. However, the markdown table has formatting issues: many rows have only 2 cells but the table header defines 4 columns (wikipage-uri, shacl-test, issue, comment).

To improve readability, consider either:

  1. Adding empty cells for missing columns: | | | at the end
  2. Or restructuring the table to match the actual data

Example for line 29:

-[http://de.dbpedia.org/resource/Arthur_Schopenhauer](...) | [http://dbpedia.org/ontology/abstract](...) #en_abstract_validation | 
+[http://de.dbpedia.org/resource/Arthur_Schopenhauer](...) | [http://dbpedia.org/ontology/abstract](...) #en_abstract_validation | | 

Based on static analysis hints.

Also applies to: 32-32, 43-43, 54-54, 64-64, 72-72, 83-83, 94-95, 104-104, 114-114, 125-125, 127-127, 131-134, 139-139, 148-148, 151-152, 159-159, 169-170, 172-172, 216-217, 229-229

core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala (1)

98-104: Fix code style inconsistencies.

There are several code style issues in the infobox name extraction logic.

Apply this diff:

-          var name = subjectUri;
+          var name = subjectUri
           if(allNames.size > 0)
             name = allNames(0).propertyNodeValueToPlainText
 
           //getting subject  of wikipedia page
-          var splittedURI = subjectUri.split("/")
-          var pageTitle = splittedURI(splittedURI.size - 1)
+          val splitUri = subjectUri.split("/")
+          val pageTitle = splitUri(splitUri.size - 1)

Changes:

  1. Remove unnecessary semicolon (line 98)
  2. Fix typo: "splittedURI" → "splitUri"
  3. Use val instead of var for immutable values (lines 103-104)
core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (1)

75-81: LGTM: Conditional cleaning logic is correct.

The implementation correctly applies AbstractUtils.removeBrokenBracketsInAbstracts to both long and short abstracts when the feature is enabled, falling back to unmodified text otherwise.

Consider adding a comment explaining what "broken brackets" are (e.g., unclosed or mismatched bracket patterns) to improve code documentation:

+  // Clean malformed or unclosed bracket patterns (e.g., "text [[link") from abstracts when enabled
   val (cleanLongAbstract, cleanShortAbstract) = if (removeBrokenBrackets) {
core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala (1)

1-30: Consider adding edge case tests.

The current test coverage is good for common scenarios. However, consider adding tests for:

  • Nested brackets: "Text (outer (inner)) more text"
  • Unmatched brackets: "Text (unclosed" or "Text closed)"
  • Multiple consecutive broken brackets: "Text () () () more"
  • Mixed valid and broken brackets: "Text (valid content) (; broken)"

These edge cases would help ensure robust handling of malformed Wikipedia abstracts.

README.md (1)

100-109: Fix list indentation for consistency.

Lines 100, 101, and 108 have incorrect indentation for list items (2 spaces instead of 0). While this doesn't break functionality, consistent indentation improves readability.

Based on static analysis hints.

core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala (2)

25-25: Add spacing around the += operator.

For consistency with Scala style conventions, add spaces around the += operator.

Apply this diff:

-        if (bracketsWithSemicolon != 0) {
-          closeBrackets+=1
-        }
+        if (bracketsWithSemicolon != 0) {
+          closeBrackets += 1
+        }

9-44: Consider adding inline comments for complex bracket-tracking logic.

The nested bracket tracking using three counters (bracketsWithSemicolon, closeBrackets, skipBrackets) is non-trivial. Adding brief inline comments explaining the state transitions would improve maintainability.
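
As an illustration of the kind of inline commenting meant here, the following sketch tracks nesting with a single depth counter. It is not the implementation in `AbstractUtils.scala` and does not reproduce its three counters. It assumes that a group counts as broken when its first non-space character is `;`, `,`, or `)`, and `BracketScanSketch`, `removeBroken`, and `startsBroken` are hypothetical names.

```scala
object BracketScanSketch {
  def removeBroken(text: String): String = {
    val out = new StringBuilder
    var i = 0
    var skipDepth = 0 // > 0 while inside a group that is being dropped
    while (i < text.length) {
      val ch = text.charAt(i)
      if (skipDepth > 0) {
        // Inside a dropped group: track nesting only, emit nothing.
        if (ch == '(') skipDepth += 1
        else if (ch == ')') skipDepth -= 1
      } else if (ch == '(' && startsBroken(text, i + 1)) {
        // Opening bracket of a broken group: start dropping and remove
        // the single space that usually precedes the bracket.
        skipDepth = 1
        if (out.nonEmpty && out.last == ' ') out.setLength(out.length - 1)
      } else {
        out.append(ch)
      }
      i += 1
    }
    out.toString
  }

  // A group is treated as broken when its first non-space character
  // is ';', ',' or the closing bracket itself (an empty group).
  private def startsBroken(text: String, from: Int): Boolean = {
    var j = from
    while (j < text.length && text.charAt(j) == ' ') j += 1
    j < text.length && (text.charAt(j) == ';' || text.charAt(j) == ',' || text.charAt(j) == ')')
  }
}
```

One limitation of this sketch is that an unclosed broken group drops the rest of the text, so unmatched brackets would need an explicit decision in a real implementation.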

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala (1)

16-33: Normalize and simplify group routing in run()

Trim the selected value and avoid nested matches; use explicit DEFAULT for future-proofing.

-  override def run(nTriplePart: Construct): Boolean = {
-    VALIDATOR_GROUP match {
-      case ValidatorGroup.RIGHT => nTriplePart.right match {
-        // TODO: maybe we need to rename "value"
-        case Some(value) => vocab.contains(value)
-        case None => false
-      }
-      case ValidatorGroup.LEFT => nTriplePart.left match {
-        case Some(value) => vocab.contains(value)
-        case None => false
-      }
-      case _ => vocab.contains(nTriplePart.self)
-    }
-  }
+  override def run(nTriplePart: Construct): Boolean = {
+    val selected: Option[String] = VALIDATOR_GROUP match {
+      case ValidatorGroup.RIGHT   => nTriplePart.right
+      case ValidatorGroup.LEFT    => nTriplePart.left
+      case ValidatorGroup.DEFAULT => Some(nTriplePart.self)
+    }
+    selected.exists(v => vocab.contains(v.trim))
+  }
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala (1)

10-22: Simplify group handling and avoid nested matches

Use Option flow with explicit DEFAULT; keeps semantics and reduces branching.

-  override def run(nTriplePart: Construct): Boolean = {
-    VALIDATOR_GROUP match {
-      case ValidatorGroup.RIGHT => nTriplePart.right match {
-        // TODO: maybe we need to rename "value"
-        case Some(value) => !value.contains(sequence)
-        case None => false
-      }
-      case ValidatorGroup.LEFT => nTriplePart.left match {
-        case Some(value) => !value.contains(sequence)
-        case None => false
-      }
-      case _ => !nTriplePart.self.contains(sequence)
-    }
-  }
+  override def run(nTriplePart: Construct): Boolean = {
+    val selected: Option[String] = VALIDATOR_GROUP match {
+      case ValidatorGroup.RIGHT   => nTriplePart.right
+      case ValidatorGroup.LEFT    => nTriplePart.left
+      case ValidatorGroup.DEFAULT => Some(nTriplePart.self)
+    }
+    selected.exists(v => !v.contains(sequence))
+  }
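
The claim that the flattened form keeps the semantics can be checked with a small side-by-side sketch. The model below is a hypothetical stand-in for the PR's `Construct` and `ValidatorGroup`, and `nested`, `flat`, and `agree` are illustrative names. The key point is that `Option.exists` returns `false` for `None`, which matches the `case None => false` branches of the nested version.

```scala
object RoutingEquivalenceSketch {
  // Hypothetical stand-ins for the PR's Construct and ValidatorGroup.
  final case class Construct(self: String,
                             left: Option[String] = None,
                             right: Option[String] = None)

  object ValidatorGroup extends Enumeration {
    val RIGHT, LEFT, DEFAULT = Value
  }

  // Shape of the current code: one nested match per side.
  def nested(c: Construct, sequence: String, group: ValidatorGroup.Value): Boolean =
    group match {
      case ValidatorGroup.RIGHT => c.right match {
        case Some(value) => !value.contains(sequence)
        case None        => false
      }
      case ValidatorGroup.LEFT => c.left match {
        case Some(value) => !value.contains(sequence)
        case None        => false
      }
      case _ => !c.self.contains(sequence)
    }

  // Shape of the suggestion: select the side first, then test it once.
  def flat(c: Construct, sequence: String, group: ValidatorGroup.Value): Boolean = {
    val selected: Option[String] = group match {
      case ValidatorGroup.RIGHT   => c.right
      case ValidatorGroup.LEFT    => c.left
      case ValidatorGroup.DEFAULT => Some(c.self)
    }
    selected.exists(v => !v.contains(sequence))
  }

  // True when both shapes agree for every group on the given construct.
  def agree(c: Construct, sequence: String): Boolean =
    ValidatorGroup.values.forall(g => nested(c, sequence, g) == flat(c, sequence, g))
}
```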
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb0e463 and b88ab6b.

⛔ Files ignored due to path filters (89)
  • dump/src/test/resources/cv-test-groups.csv is excluded by !**/*.csv
  • dump/src/test/resources/minidumps/af/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/als/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/an/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ar/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/arz/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ast/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/az/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/azb/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ba/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/bar/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/be/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/bg/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/bn/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/br/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/bs/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ca/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ceb/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ckb/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/commons/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/cs/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/cy/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/da/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/de/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/el/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/en/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/eo/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/es/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/et/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/eu/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/fa/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/fi/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/fr/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/fy/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ga/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/gd/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/gl/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/he/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/hr/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/hu/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/hy/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/id/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/is/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/it/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ja/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ka/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/kn/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ko/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ku/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ky/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/la/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/lb/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/lt/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/lv/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/mk/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ml/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/mn/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ms/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/nds/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/nl/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/nn/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/no/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/os/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/pl/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/pt/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ro/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ru/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/sco/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/sh/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/si/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/simple/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/sl/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/sq/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/sr/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/sv/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ta/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/te/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/tg/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/th/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/tl/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/tr/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/tt/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/uk/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/ur/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/vec/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/vi/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/war/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/minidumps/zh/wiki.xml.bz2 is excluded by !**/*.bz2
  • dump/src/test/resources/shacl-test-groups.csv is excluded by !**/*.csv
📒 Files selected for processing (48)
  • .github/ISSUE_TEMPLATE/data.md (1 hunks)
  • README.md (4 hunks)
  • core/doc/HowTo-release-DBpedia.txt (1 hunks)
  • core/src/main/scala/org/dbpedia/extraction/config/Config.scala (4 hunks)
  • core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala (1 hunks)
  • core/src/main/scala/org/dbpedia/extraction/mappings/MissingAbstractsExtractor.scala (1 hunks)
  • core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala (2 hunks)
  • core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala (8 hunks)
  • core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala (1 hunks)
  • core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (4 hunks)
  • core/src/main/scala/org/dbpedia/extraction/util/abstracts/AbstractUtils.scala (1 hunks)
  • core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala (1 hunks)
  • core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala (2 hunks)
  • core/src/test/scala/org/dbpedia/extraction/util/AbstractUtilsTest.scala (1 hunks)
  • dump/extraction.abstracts.properties (1 hunks)
  • dump/pom.xml (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/NotContainsValidator.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala (2 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralLangTagValidator.scala (2 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala (2 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/tests/TestSuiteFactory.scala (1 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/NTripleTestGenerator.scala (6 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala (2 hunks)
  • dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala (4 hunks)
  • dump/src/test/bash/minidump-overview.md (1 hunks)
  • dump/src/test/bash/uris.lst (1 hunks)
  • dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl (11 hunks)
  • dump/src/test/resources/ci-tests/xsd_ci-tests.ttl (1 hunks)
  • dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties (1 hunks)
  • dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties (1 hunks)
  • dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl (1 hunks)
  • dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl (1 hunks)
  • dump/src/test/resources/shaclTestsCoverageTable.md (12 hunks)
  • dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala (2 hunks)
  • dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala (2 hunks)
  • dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala (2 hunks)
  • dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala (1 hunks)
  • dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala (1 hunks)
  • live/live.default.xml (1 hunks)
  • server/server.default.properties (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
README.md

107-107: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


108-108: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


109-109: Unordered list indentation
Expected: 0; Actual: 2

(MD007, ul-indent)


112-112: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


112-112: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


113-113: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


113-113: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)


125-125: Bare URL used

(MD034, no-bare-urls)

dump/src/test/resources/shaclTestsCoverageTable.md

29-29: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


29-29: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


30-30: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


30-30: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


31-31: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


31-31: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


32-32: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


32-32: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


43-43: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


43-43: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


54-54: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


54-54: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


64-64: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


64-64: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


65-65: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


65-65: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


66-66: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


66-66: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


67-67: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


67-67: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


68-68: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


68-68: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


69-69: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


69-69: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


70-70: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


70-70: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


71-71: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


72-72: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


72-72: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


83-83: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


83-83: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


94-94: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


94-94: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


95-95: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


95-95: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


104-104: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


104-104: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


114-114: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


114-114: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


125-125: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


125-125: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


126-126: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


126-126: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


127-127: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


127-127: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


128-128: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


128-128: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


129-129: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


129-129: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


130-130: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


131-131: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


131-131: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


132-132: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


132-132: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


133-133: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


133-133: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


134-134: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


134-134: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


135-135: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


135-135: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


136-136: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


136-136: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


137-137: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


137-137: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


138-138: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


138-138: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


139-139: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


139-139: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


148-148: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


148-148: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


149-149: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


149-149: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


150-150: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


150-150: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


151-151: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


151-151: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


152-152: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


152-152: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


153-153: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


153-153: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


154-154: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


154-154: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


155-155: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


155-155: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


156-156: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


157-157: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


157-157: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


158-158: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


158-158: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


159-159: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


159-159: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


160-160: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


160-160: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


161-161: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


161-161: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


162-162: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


162-162: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


163-163: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


163-163: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


164-164: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


164-164: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


165-165: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


165-165: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


166-166: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


166-166: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


167-167: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


167-167: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


168-168: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


168-168: Table column count
Expected: 4; Actual: 1; Too few cells, row will be missing data

(MD056, table-column-count)


169-169: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


169-169: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


170-170: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


170-170: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


171-171: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


172-172: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


172-172: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


216-216: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


216-216: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


217-217: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


217-217: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)


229-229: Table pipe style
Expected: no_leading_or_trailing; Actual: trailing_only; Unexpected trailing pipe

(MD055, table-pipe-style)


229-229: Table column count
Expected: 4; Actual: 2; Too few cells, row will be missing data

(MD056, table-column-count)

dump/src/test/bash/minidump-overview.md

48-48: Bare URL used

(MD034, no-bare-urls)

.github/ISSUE_TEMPLATE/data.md

11-11: Bare URL used

(MD034, no-bare-urls)


11-11: Bare URL used

(MD034, no-bare-urls)


11-11: Bare URL used

(MD034, no-bare-urls)


19-19: Bare URL used

(MD034, no-bare-urls)


19-19: Bare URL used

(MD034, no-bare-urls)


25-25: Blank line inside blockquote

(MD028, no-blanks-blockquote)


27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


35-35: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (47)
dump/src/test/resources/ci-tests/xsd_ci-tests.ttl (1)

55-55: Formatting adjustments look good.

Whitespace cleanup only; TTL semantics unaffected.

Also applies to: 60-60, 68-68, 74-74, 76-76, 84-84, 90-90

dump/src/test/bash/uris.lst (1)

44-44: Verify the test coverage purpose for this URL.

The addition of the Marian Breland Bailey Wikipedia URL appears valid. However, please confirm whether this URL was specifically chosen to test an edge case related to the PR's changes (e.g., broken brackets in abstracts, HTML extraction issues, or other scenarios).

If this URL tests a specific functionality, consider documenting the test purpose in related test files or comments to aid future maintenance.

dump/extraction.abstracts.properties (1)

32-32: LGTM! Configuration aligns with extractor refactoring.

The switch from AbstractExtractor to PlainAbstractExtractor is consistent with the broader refactoring across the codebase where extractors are being renamed and reorganized.

core/src/main/scala/org/dbpedia/extraction/mappings/HtmlAbstractExtractor.scala (1)

16-25: LGTM! Class rename improves clarity.

The rename from AbstractExtractorWikipedia to HtmlAbstractExtractor better describes the extractor's purpose and aligns with the parallel PlainAbstractExtractor naming.

dump/src/test/resources/shacl-tests/instances/Marian_Breland_Bailey.ttl (1)

1-21: LGTM! Well-formed SHACL test resource.

The SHACL NodeShape definition is syntactically correct and follows standard patterns for property validation testing. The structure aligns with other SHACL test resources in this PR.

dump/src/test/resources/shacl-tests/properties/dbp_abstract.ttl (1)

12-19: LGTM! Validation rule correctly detects broken brackets.

The SHACL pattern constraint uses a negative lookahead regex to reject abstracts containing the broken bracket pattern (;. This aligns well with the new removeBrokenBracketsInAbstracts utility introduced in the broader PR.
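The negative-lookahead idea can be sketched in Scala. The exact pattern in dbp_abstract.ttl is not reproduced in this review, so the regex below is an assumed equivalent, and `AbstractPatternSketch` and `isClean` are hypothetical names.

```scala
import java.util.regex.Pattern

object AbstractPatternSketch {
  // Assumed pattern: the lookahead at the start fails if "(;" occurs anywhere in the text.
  // DOTALL lets '.' cross line breaks so multi-line abstracts are covered as well.
  private val noBrokenBracket: Pattern =
    Pattern.compile("^(?!.*\\(;).*$", Pattern.DOTALL)

  // True when the abstract contains no "(;" sequence.
  def isClean(abstractText: String): Boolean =
    noBrokenBracket.matcher(abstractText).matches()
}
```

For example, `AbstractPatternSketch.isClean("Example Person (; born 1920) was a psychologist.")` is rejected, while the same sentence without the stray `;` is accepted.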

core/src/main/scala/org/dbpedia/extraction/config/Config.scala (3)

280-282: LGTM! Configuration properly extended for broken brackets cleanup.

The addition of removeBrokenBracketsProperty to AbstractParameters is well-structured and uses an appropriate property key (remove-broken-brackets-plain-abstracts) to distinguish it from the HTML variant.


297-299: LGTM! NIF parameters consistently extended.

The removeBrokenBracketsProperty field is consistently added to NifParameters with an appropriate property key (remove-broken-brackets-html-abstracts) that distinguishes HTML abstract processing from plain text processing.


353-355: LGTM! Case class definitions properly updated.

Both case class definitions are correctly updated with the new removeBrokenBracketsProperty: Boolean field, maintaining structural consistency across the configuration classes.

Also applies to: 374-381
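As a rough illustration of how the two property keys named above could be turned into Boolean flags, here is a minimal sketch using `java.util.Properties`. It does not reproduce the logic in Config.scala; the object name, the `flag` helper, and the default of `false` for a missing key are all assumptions.

```scala
import java.util.Properties

object BrokenBracketConfigSketch {
  // Property keys as named in the review comments.
  val PlainKey = "remove-broken-brackets-plain-abstracts"
  val HtmlKey  = "remove-broken-brackets-html-abstracts"

  // Hypothetical helper: a missing or non-"true" value is treated as false (assumed default).
  def flag(props: Properties, key: String): Boolean =
    Option(props.getProperty(key)).exists(_.trim.equalsIgnoreCase("true"))
}
```

Keeping separate keys for the plain and HTML extractors lets one variant be cleaned without affecting the other.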

core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala (4)

35-136: LGTM! Refactoring improves code structure.

The decomposition of the extract method with explicit condition variables (condition1_create_correspondingproperty, condition2_template_exists, etc.) and extracted helper methods significantly improves readability and maintainability.


138-159: LGTM! Helper method well-structured.

The createMissingTypes method cleanly encapsulates the logic for computing and emitting missing type triples when extending an existing resource with new types.


161-180: LGTM! Instance creation logic well-extracted.

The createInstance method appropriately handles type statement creation with proper dataset splitting between direct types and transitive types.


189-216: LGTM! URI generation logic clearly defined.

The generateUri method provides a clear strategy for generating URIs: prefer name-like properties, fall back to the first property, or use the template title as a last resort.
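The fallback order described for generateUri can be sketched as a chain of `Option`s. This is not the method from TemplateMapping.scala: the signature, the set of "name-like" keys, and the choice to use the first property's value (rather than its key) are assumptions made for illustration.

```scala
object GenerateUriSketch {
  // Assumed set of name-like property keys; the real list may differ.
  private val nameLikeKeys = Set("name", "title")

  // `properties` is a hypothetical ordered list of (key, value) pairs from a template.
  def pickUriSuffix(properties: Seq[(String, String)], templateTitle: String): String = {
    val nonEmpty = properties.filter { case (_, v) => v.trim.nonEmpty }
    nonEmpty
      // 1. Prefer the value of a name-like property.
      .collectFirst { case (k, v) if nameLikeKeys.contains(k.toLowerCase) => v.trim }
      // 2. Otherwise fall back to the first non-empty property value.
      .orElse(nonEmpty.headOption.map(_._2.trim))
      // 3. As a last resort use the template title.
      .getOrElse(templateTitle)
  }
}
```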

dump/src/test/bash/minidump-overview.md (1)

48-48: LGTM! New article added to test coverage.

The addition of the Marian_Breland_Bailey Wikipedia article aligns with the new SHACL test resources introduced in this PR and expands test coverage appropriately.

core/src/test/scala/org/dbpedia/extraction/mappings/NifExtractorTest.scala (1)

59-62: Post-processing now references PlainAbstractExtractor correctly.

This matches the production path and keeps the test aligned with the renamed extractor.

dump/src/main/scala/org/dbpedia/validation/construct/tests/generators/Queries.scala (2)

68-76: LGTM: Query expansion correctly supports new validator fields.

The SELECT clause and Optional bindings for validatorGroup and doesNotContains are correctly structured to capture the new validator metadata introduced by the ValidatorGroup enum. The Optional clauses ensure backward compatibility with existing test data that may not include these fields.
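On the consuming side, an optional binding maps naturally to an `Option`. The sketch below shows one way an unbound validatorGroup could fall back to a default group. The string encoding of the group and the fallback to `DEFAULT` are assumptions, and `ValidatorGroupParsingSketch` is a hypothetical name, not code from the PR.

```scala
import java.util.Locale

object ValidatorGroupParsingSketch {
  // Simplified stand-in for the enum introduced in the PR.
  object ValidatorGroup extends Enumeration {
    val RIGHT, LEFT, DEFAULT = Value
  }

  // `binding` models an OPTIONAL SPARQL variable: None when the variable is unbound.
  def parseGroup(binding: Option[String]): ValidatorGroup.Value =
    binding.map(_.trim.toUpperCase(Locale.ROOT)) match {
      case Some("RIGHT") => ValidatorGroup.RIGHT
      case Some("LEFT")  => ValidatorGroup.LEFT
      case _             => ValidatorGroup.DEFAULT // assumed fallback for older test data
    }
}
```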


97-100: Improved readability with explicit query structure.

The reformatted query with explicit line breaks for each triple pattern enhances readability and maintainability without changing semantics.

core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (2)

7-7: LGTM: Import added for new AbstractUtils utility.

The import supports the new broken-bracket cleaning functionality introduced in this file.


51-51: LGTM: Configuration-driven feature flag.

The removeBrokenBrackets flag from NifParameters enables optional cleaning of malformed bracket patterns in abstracts, consistent with the AbstractUtils functionality.

dump/src/main/scala/org/dbpedia/validation/construct/model/package.scala (1)

23-26: LGTM: ValidatorGroup enum is well-structured.

The new enumeration provides a clean grouping mechanism (RIGHT, LEFT, DEFAULT) for validator behavior, consistent with the existing enumeration patterns in this package object.

dump/src/main/scala/org/dbpedia/validation/construct/model/Construct.scala (1)

1-3: LGTM: Construct case class is well-designed.

The immutable data structure with optional left and right fields provides a clean abstraction for representing RDF construct validation inputs. The default None values for optional fields follow Scala idioms.
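
For orientation, the shape of Construct and ValidatorGroup as implied by this review (`self`, optional `left` and `right`, and the RIGHT/LEFT/DEFAULT groups) might look like the sketch below. The definitions are reconstructed from the review text and may differ from the PR; the `select` helper is hypothetical.

```scala
object ConstructSketch {
  // Reconstructed from the review; the actual enumeration in the PR may differ.
  object ValidatorGroup extends Enumeration {
    val RIGHT, LEFT, DEFAULT = Value
  }

  // `self` is the part under test; `left` and `right` are its optional neighbours.
  final case class Construct(self: String,
                             left: Option[String] = None,
                             right: Option[String] = None)

  // Hypothetical helper: returns the part of the construct a validator should inspect.
  def select(c: Construct, group: ValidatorGroup.Value): Option[String] = group match {
    case ValidatorGroup.RIGHT => c.right
    case ValidatorGroup.LEFT  => c.left
    case _                    => Some(c.self)
  }
}
```

Returning an `Option` leaves it to the caller to decide what a missing neighbour means, for example failing the validation.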

core/src/test/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractorTest.scala (2)

15-15: LGTM: Test class renamed to match refactored extractor.

The test class name now correctly reflects the renamed PlainAbstractExtractor class.


50-50: LGTM: Extractor instantiation updated correctly.

The instantiation now uses PlainAbstractExtractor, consistent with the class rename from AbstractExtractor.

dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties (1)

40-41: LGTM: Configuration updated for renamed extractor and new feature.

The changes correctly reflect:

  1. The extractor rename from .NifExtractor to .HtmlAbstractExtractor
  2. The new remove-broken-brackets-html-abstracts=true flag to enable broken-bracket cleaning in abstracts

These updates align with the code changes in WikipediaNifExtractor.scala and the introduction of AbstractUtils.removeBrokenBracketsInAbstracts.

dump/src/test/scala/org/dbpedia/extraction/dump/TestConfig.scala (1)

20-20: Approve resource loading. extraction-configs/extraction.plain.abstracts.properties exists under dump/src/test/resources and matches the expected test configuration.

dump/pom.xml (1)

18-19: LGTM: SHACL and CV testGroup properties correctly applied
All testGroup references in the dump module have been replaced; both Utils.getGroup("shaclTestGroup") and Utils.getGroup("cvTestGroup") usages align with the new properties.

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericIRIValidator.scala (1)

5-5: LGTM! Validator refactor to Construct is well-executed.

The transition from String to Construct aligns with the broader validator refactor mentioned in the AI summary. The logic remains sound—nTriplePart.self correctly provides the IRI string for validation via Apache Jena's IRIResolver.checkIRI.

Also applies to: 13-15

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericValidator.scala (1)

5-5: LGTM! Signature updated for consistency.

The method signature now accepts Construct to align with the validator refactor pattern. The no-op behavior (always returning true) remains appropriate for this placeholder validator.

Also applies to: 13-13

README.md (1)

79-81: LGTM! Inline code formatting for namespaces improves readability.

The use of backticks for namespace paths (org.dbpedia.extraction.*) enhances clarity and follows Markdown best practices.

dump/src/test/scala/org/dbpedia/extraction/dump/ShaclTest.scala (1)

33-33: LGTM! Refactor to centralized Utils is cleaner.

The migration from local helper methods to Utils.getGroup() and Utils.loadTestGroupsKeys() improves code reusability and aligns with the broader test utility refactor mentioned in the AI summary. The parameters passed to these methods are appropriate for SHACL test group filtering.

Also applies to: 117-117

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericLiteralValidator.scala (1)

10-10: LGTM! Construct-based validation is correctly implemented.

The refactor from String to Construct is consistent with the broader validator pattern. The triple construction at line 25 correctly uses nTriplePart.self to extract the literal value for validation. The Apache Jena literal parser logic remains sound.

Also applies to: 23-25

dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTest.scala (1)

47-54: No change needed; extract already blocks until all jobs complete. The while (jobsRunning.size() > 0) loop (lines 98–101) guarantees the rename calls occur only after all futures finish.

server/server.default.properties (1)

59-59: LGTM! Configuration updated to reference renamed extractor.

The update from .AbstractExtractorWikipedia to .HtmlAbstractExtractor aligns with the class renaming in the codebase.

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/generic/GenericRdfLangStringValidator.scala (1)

12-14: LGTM! Method signature refactored to use Construct.

The method signature change from String to Construct aligns with the broader validator refactor. The logic correctly accesses the RDF type token via nTriplePart.self.

dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties (1)

1-96: LGTM! Well-structured configuration for plain abstract extraction.

The configuration file properly sets up extraction parameters, MediaWiki API connection settings, and the new remove-broken-brackets-plain-abstracts feature. The settings align with the PlainAbstractExtractor implementation.

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/Validator.scala (1)

3-24: LGTM! Validator trait updated to support Construct and ValidatorGroup.

The changes introduce:

  • VALIDATOR_GROUP field with a sensible default
  • Updated run method signature to accept Construct instead of String
  • toString override for better debugging

These changes align with the broader validator refactor across the codebase.

core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala (4)

33-42: LGTM! Class renamed and logger updated correctly.

The class rename from AbstractExtractor to PlainAbstractExtractor is consistently applied, including the logger initialization.


53-53: LGTM! Configuration flag added for broken-brackets removal.

The removeBrokenBrackets flag is properly initialized from the configuration.


83-92: LGTM! Conditional broken-brackets removal implemented correctly.

The conditional logic applies AbstractUtils.removeBrokenBracketsInAbstracts when the configuration flag is enabled, preserving the original text otherwise.
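
A rough sketch of such a flag-guarded cleanup is shown below. The regular expression is an assumption based on the commit messages in this PR (brackets starting with "(," or "(;"); it is not the implementation of AbstractUtils.removeBrokenBracketsInAbstracts, and `clean` is a hypothetical name.

```scala
object BrokenBracketsSketch {
  // Assumed pattern: a parenthesised group whose content starts with ',' or ';'
  // and contains no nested parentheses, including any whitespace before it.
  private val brokenBracket = """\s*\(\s*[,;][^()]*\)""".r

  // Applies the cleanup only when the configuration flag is set.
  def clean(text: String, removeBrokenBrackets: Boolean): String =
    if (removeBrokenBrackets) brokenBracket.replaceAllIn(text, "") else text
}
```

Keeping the flag check at the call site means the original text passes through untouched when the feature is disabled.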


223-255: LGTM! Companion object renamed and patterns updated.

The companion object is correctly renamed to PlainAbstractExtractor, and the patternsToRemove list includes updated regex patterns.

dump/src/test/scala/org/dbpedia/extraction/dump/ConstructValidationTest.scala (2)

26-30: LGTM! Test initialization added in beforeAll.

The beforeAll method properly loads the test model using RDFDataMgr before tests execute, ensuring test data is available.


37-51: Verify the test-group filtering logic.

The filtering removes statements where:

  • The subject URI is in selectValues (derived from group keys)
  • The object matches the testGeneratorURI

Ensure this filtering correctly isolates the productive group tests as intended by the test name change.

You can verify the filtering behavior by examining the loaded group keys and confirming they match the intended test coverage.

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/PatternValidator.scala (2)

6-11: LGTM! Constructor updated to support validator groups.

The validatorGroup parameter with a sensible default (ValidatorGroup.DEFAULT) allows flexible validator instantiation while maintaining backward compatibility.


13-26: LGTM! Pattern matching logic correctly handles validator groups.

The run method properly extracts values based on the validator group (RIGHT, LEFT, or DEFAULT) and applies the pattern. The TODO comment on line 16 about potentially renaming "value" is noted but doesn't block the implementation.

dump/src/main/scala/org/dbpedia/validation/construct/model/validators/VocabValidator.scala (1)

1-36: Confirm no stale run(String) overrides or invocations remain
Project-wide search returned no matches for overriding or calling run with a String parameter; please manually verify no deprecated API usage remains.

dump/src/main/scala/org/dbpedia/validation/construct/tests/suites/NTripleTestSuite.scala (1)

100-101: Great context wiring for predicates

Wrapping the predicate as Construct(p, Some(s), Some(o)) keeps both neighbors available, so RIGHT/LEFT validator groups can reach the object and subject without extra lookups. Nice.

dump/src/test/resources/ci-tests/dbpedia-specific-ci-tests.ttl (1)

284-292: Nice use of validatorGroup for abstract literals

Assigning v:validatorGroup v:rightValidator to the datatype validator ensures only the object literal is checked for the broken bracket pattern—exactly what the new group support is for. Looks good.

>
> Therefore, you can use this tool to extract an example page and check if the error persists in the latest software version, and add the link you used for verification, e.g., http://dief.tools.dbpedia.org/server/extraction/en/extract?title=United+States
# Issue validity
> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/ we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/

⚠️ Potential issue | 🟡 Minor

Wrap bare URLs in angle brackets or Markdown links.

Multiple bare URLs appear in the template. For better Markdown compliance and clickability, wrap them in angle brackets <URL> or use Markdown link syntax [text](URL).

Based on static analysis hints.

Apply this diff:

-> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into http://dbpedia.org/sparql . During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At http://dief.tools.dbpedia.org/server/extraction/en/  we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: http://dief.tools.dbpedia.org/server/extraction/en/
+> Some explanation: DBpedia Snapshot is produced every three months, see [Release Frequency & Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1), which is loaded into <http://dbpedia.org/sparql>. During these three months, Wikipedia changes and also the DBpedia Information Extraction Framework receives patches. At <http://dief.tools.dbpedia.org/server/extraction/en/> we host a daily updated extraction web service that can extract one Wikipedia page at a time. To check whether your issue is still valid, please enter the article name, e.g. `Berlin` or `Joe_Biden` here: <http://dief.tools.dbpedia.org/server/extraction/en/>
-* Web/SPARQL, e.g. http://dbpedia.org/sparql or http://dbpedia.org/resource/Berlin, please **provide query or link**
+* Web/SPARQL, e.g. <http://dbpedia.org/sparql> or <http://dbpedia.org/resource/Berlin>, please **provide query or link**

Also applies to: 19-19

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

11-11: Bare URL used

(MD034, no-bare-urls)


11-11: Bare URL used

(MD034, no-bare-urls)


11-11: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In .github/ISSUE_TEMPLATE/data.md around lines 11 and 19, there are multiple
bare URLs that should be wrapped for proper Markdown rendering; replace each
bare URL (e.g. https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1,
http://dbpedia.org/sparql, http://dief.tools.dbpedia.org/server/extraction/en/)
with either angle-bracketed form <URL> or convert to Markdown links like
[Release Frequency &
Schedule](https://www.dbpedia.org/blog/snapshot-2021-06-release/#anchor1) and
similarly for the SPARQL and DIEF extractor URLs so they render and are
clickable.

> Pick the appropriate:
# Details
> please post the details

⚠️ Potential issue | 🟡 Minor

Remove blank line inside blockquote.

Line 25 contains a blank line within a blockquote, which breaks Markdown formatting. Remove the blank line or replace it with > to maintain the blockquote.

Based on static analysis hints.

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

25-25: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🤖 Prompt for AI Agents
.github/ISSUE_TEMPLATE/data.md around line 25: there is a blank line inside a
blockquote which breaks Markdown rendering; remove the blank line (or replace it
with a '>' prefix) so the blockquote lines are contiguous and properly
formatted, ensuring the blockquote marker is present on the next line if you
want an empty quoted line.

Comment on lines 27 to 33
```
```
- Expected / corrected RDF outcome snippet (NTRIPLES):
> Expected / corrected RDF outcome snippet
```
```

⚠️ Potential issue | 🟡 Minor

Add language specifiers to fenced code blocks.

Fenced code blocks should specify a language for proper syntax highlighting. For RDF/SPARQL snippets, use turtle or sparql; for generic examples, use text.

Based on static analysis hints.

Apply this diff:

 > Wrong triples RDF snippet
-  ```
+  ```turtle
 > Expected / corrected RDF outcome snippet
-  ```
+  ```turtle
 > Example DBpedia resource URL(s)
-  ```
+  ```text

Also applies to: 35-37

🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

27-27: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


31-31: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
.github/ISSUE_TEMPLATE/data.md around lines 27-33 (and similarly lines 35-37):
fenced code blocks lack language specifiers; update the opening triple-backtick
lines to include an appropriate language (turtle for RDF/SPARQL snippets, text
for generic examples), keeping the closing fences unchanged.

Comment on lines +80 to +83
//if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different
//entity and need to be saved as new resource.
//This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each
//about different entity, see dbr:Helene_Demuth

🛠️ Refactor suggestion | 🟠 Major

Fix grammar issues in comment.

The comment has several grammatical errors that should be corrected for clarity.

Apply this diff:

-        //if we have more than one info boxes and name property of infobox is different than title of page than infobox belong to different
-        //entity and need to be saved as new resource.
-        //This is to avoid assigning of one entity properties to another entity because a single page can have multiple info boxes and each
-        //about different entity, see dbr:Helene_Demuth
+        // If we have more than one infobox and the name property of an infobox is different from the page title, then the infobox belongs to a different
+        // entity and needs to be saved as a new resource.
+        // This is to avoid assigning one entity's properties to another entity, because a single page can have multiple infoboxes, each
+        // about a different entity. See dbr:Helene_Demuth for an example.

Note: This partially addresses a previous review comment.

🤖 Prompt for AI Agents
In core/src/main/scala/org/dbpedia/extraction/mappings/TemplateMapping.scala
around lines 80 to 83, the block comment contains multiple grammatical errors;
replace it with a clear, corrected comment such as: "If there is more than one
infobox and the infobox name differs from the page title, the infobox describes
a different entity and should be saved as a new resource. This prevents
assigning properties from one entity to another when a single page contains
multiple infoboxes about different entities (see dbr:Helene_Demuth)." Update the
file to use this corrected wording.

Comment on lines 8 to +27
private val pattern = patternString.r.pattern

override val METHOD_TYPE: ValidatorType.Value = ValidatorType.TYPED_LITERAL

override def run(nTriplePart: String): Boolean = {

val lexicalForm = nTriplePart.trim.split("\"").dropRight(1).drop(1).mkString("")

override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup

override def run(nTriplePart: Construct): Boolean = {
val lexicalForm = VALIDATOR_GROUP match {
case ValidatorGroup.RIGHT => nTriplePart.right match {
// TODO: 1) maybe we need to rename "value"
// 2) discuss what to do if we want to check the value that doesn't exist on
// the left or right side, at the moment we only return false in these cases
case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
case None => return false
}
case ValidatorGroup.LEFT => nTriplePart.left match {
case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
case None => return false
}
case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("")
}

⚠️ Potential issue | 🟠 Major

Make lexical form extraction robust (escaped quotes) and simplify run()

Current split-based parsing can fail on escaped quotes within literals. Use a regex capture and Option flow; also avoid wildcard group match.

   private val pattern = patternString.r.pattern
+  // Extracts literal lexical form between the first pair of quotes, handling escaped quotes.
+  private val lexicalRegex = "^\\s*\\\"((?:\\\\.|[^\\\"\\\\])*)\\\"".r
 
   override val METHOD_TYPE: ValidatorType.Value = ValidatorType.TYPED_LITERAL
   override val VALIDATOR_GROUP: ValidatorGroup.Value = validatorGroup
 
-  override def run(nTriplePart: Construct): Boolean = {
-    val lexicalForm = VALIDATOR_GROUP match {
-      case ValidatorGroup.RIGHT => nTriplePart.right match {
-        // TODO: 1) maybe we need to rename "value"
-        //       2) discuss what to do if we want to check the value that doesn't exist on
-        //          the left or right side, at the moment we only return false in these cases
-        case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
-        case None => return false
-      }
-      case ValidatorGroup.LEFT => nTriplePart.left match {
-        case Some(value) => value.trim.split("\"").dropRight(1).drop(1).mkString("")
-        case None => return false
-      }
-      case _ => nTriplePart.self.trim.split("\"").dropRight(1).drop(1).mkString("")
-    }
-    pattern.matcher(lexicalForm).matches()
-  }
+  override def run(nTriplePart: Construct): Boolean = {
+    val selected: Option[String] = VALIDATOR_GROUP match {
+      case ValidatorGroup.RIGHT   => nTriplePart.right
+      case ValidatorGroup.LEFT    => nTriplePart.left
+      case ValidatorGroup.DEFAULT => Some(nTriplePart.self)
+    }
+    val lexicalFormOpt = selected.flatMap(s => lexicalRegex.findFirstMatchIn(s.trim).map(_.group(1)))
+    lexicalFormOpt.exists(s => pattern.matcher(s).matches())
+  }
🤖 Prompt for AI Agents
In
dump/src/main/scala/org/dbpedia/validation/construct/model/validators/TypedLiteralValidator.scala
around lines 8 to 27, the current lexicalForm extraction uses split("\""), which
breaks on escaped quotes, and relies on a catch-all ValidatorGroup match; replace
this with an explicit match on ValidatorGroup.RIGHT, ValidatorGroup.LEFT, and
ValidatorGroup.DEFAULT (no wildcard), take the literal text from the
corresponding Option, and extract the quoted lexical form using a regex with a
capture group that allows escaped quotes (a leading quote, then any sequence of
escaped characters or characters other than a quote or backslash, then a closing
quote) instead of splitting; propagate the Option (return false on None) and use
the captured group as lexicalForm.
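
To make the difference concrete, the sketch below puts the split-based extraction from the current code next to the regex proposed in the diff above. The object and method names are hypothetical and only serve the comparison.

```scala
object LexicalFormSketch {
  // Regex taken from the suggested diff: opening quote, then escaped characters
  // or anything that is neither a quote nor a backslash, then the closing quote.
  private val lexicalRegex = "^\\s*\\\"((?:\\\\.|[^\\\"\\\\])*)\\\"".r

  // Current approach: split on every quote character, escaped or not.
  def viaSplit(part: String): String =
    part.trim.split("\"").dropRight(1).drop(1).mkString("")

  // Proposed approach: capture the lexical form, keeping escape sequences intact.
  def viaRegex(part: String): Option[String] =
    lexicalRegex.findFirstMatchIn(part.trim).map(_.group(1))
}
```

For a literal without escapes, such as `"abc"@en`, both variants yield the same lexical form. For a literal containing `\"`, the split variant drops the inner quote characters, while the regex variant keeps the escape sequences and returns `None` when no quoted literal is present.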

Comment on lines +15 to +30
val filePath = classLoader.getResource(path).getFile
val file = scala.io.Source.fromFile(filePath)

val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
val columnsNames: Array[String] = table.head

if (!columnsNames.contains(group)) {
Array[String]()
}
else {
val indexOfGroup = columnsNames.indexOf(group)
val groupsKeys: Array[String] = table.tail.flatMap(row =>
if (row(indexOfGroup) == option) Array[String](row(0))
else Array[String]())
groupsKeys
}

⚠️ Potential issue | 🔴 Critical

Fix resource leak in loadTestGroupsKeys and harden parsing

Use getResourceAsStream, ensure streams are closed, and guard empty/short rows.

-    val filePath = classLoader.getResource(path).getFile
-    val file = scala.io.Source.fromFile(filePath)
-
-    val table: Array[Array[String]] = file.getLines().map(_.split(",")).toArray
-    val columnsNames: Array[String] = table.head
-
-    if (!columnsNames.contains(group)) {
-      Array[String]()
-    }
-    else {
-      val indexOfGroup = columnsNames.indexOf(group)
-      val groupsKeys: Array[String] = table.tail.flatMap(row =>
-        if (row(indexOfGroup) == option) Array[String](row(0))
-        else Array[String]())
-      groupsKeys
-    }
+    val resource = Option(classLoader.getResourceAsStream(path))
+      .getOrElse(throw new IllegalArgumentException(s"Resource not found: $path"))
+    val source = scala.io.Source.fromInputStream(resource)(scala.io.Codec.UTF8)
+    try {
+      val table: Array[Array[String]] = source.getLines().map(_.split(",")).toArray
+      if (table.isEmpty) Array.empty[String]
+      else {
+        val columnsNames: Array[String] = table.head
+        if (!columnsNames.contains(group)) Array.empty[String]
+        else {
+          val indexOfGroup = columnsNames.indexOf(group)
+          table.tail.collect {
+            case row if row.length > indexOfGroup && row(indexOfGroup) == option => row(0)
+          }.toArray
+        }
+      }
+    } finally {
+      source.close()
+      resource.close()
+    }
🤖 Prompt for AI Agents
In dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala around lines 15
to 30, replace the use of classLoader.getResource(...).getFile and
scala.io.Source.fromFile (which leaks) with
classLoader.getResourceAsStream(path) and wrap the InputStream in a safe closing
construct (try/finally or scala.util.Using) to ensure the stream is closed;
guard against getResourceAsStream returning null and return an empty Array when
resource is missing; parse lines using Source.fromInputStream, skip empty lines,
defensively check row length before indexing (row.length > indexOfGroup &&
row.nonEmpty) and handle empty table or missing header by returning an empty
Array[String] rather than throwing.
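
The scala.util.Using variant mentioned in the prompt could look roughly like this. It is a sketch under two assumptions: the project builds with Scala 2.13 or later (Using is not available in earlier versions), and the caller has already obtained the InputStream. `keysFor` is a hypothetical name, not the method in Utils.scala.

```scala
import java.io.InputStream
import scala.io.{Codec, Source}
import scala.util.Using

object GroupKeysSketch {
  // Returns the first-column keys of all rows whose `group` column equals `option`.
  // Using.resource closes the Source (and with it the stream) even if parsing throws.
  def keysFor(in: InputStream, group: String, option: String): Array[String] =
    Using.resource(Source.fromInputStream(in)(Codec.UTF8)) { source =>
      val rows = source.getLines().filter(_.trim.nonEmpty).map(_.split(",")).toArray
      rows.headOption match {
        case Some(header) if header.contains(group) =>
          val idx = header.indexOf(group)
          // Skip rows that are too short to have a value in the group column.
          rows.tail.collect { case row if row.length > idx && row(idx) == option => row(0) }
        case _ =>
          Array.empty[String] // empty file or unknown group column
      }
    }
}
```

Taking the stream as a parameter keeps the parsing testable with in-memory data; how a missing resource is handled stays with the caller.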

Comment on lines +34 to +45
val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
val properties = new Properties()
resourceInputStream match {
case Some(inputStream) => properties.load(inputStream)
case None => return TestConfig.defaultTestGroup
}
val groupOption = Option(properties.getProperty(testName))
groupOption match {
case Some(group) => group
case None => TestConfig.defaultTestGroup
}
}

⚠️ Potential issue | 🔴 Critical

Close properties InputStream to avoid leak

Ensure the stream is closed; simplify defaulting.

-    val resourceInputStream = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
-    val properties = new Properties()
-    resourceInputStream match {
-      case Some(inputStream) => properties.load(inputStream)
-      case None => return TestConfig.defaultTestGroup
-    }
-    val groupOption = Option(properties.getProperty(testName))
-    groupOption match {
-      case Some(group) => group
-      case None => TestConfig.defaultTestGroup
-    }
+    val isOpt = Option(getClass.getClassLoader.getResourceAsStream("properties-from-pom.properties"))
+    isOpt match {
+      case Some(is) =>
+        val properties = new Properties()
+        try properties.load(is) finally is.close()
+        Option(properties.getProperty(testName)).getOrElse(TestConfig.defaultTestGroup)
+      case None =>
+        TestConfig.defaultTestGroup
+    }
🤖 Prompt for AI Agents
In dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala around lines 34
to 45, the InputStream from getResourceAsStream is not closed and the properties
defaulting is verbose; fix by: check for null resource stream, load properties
inside a try-finally (or use scala.util.Using) that closes the stream after
loading, then simplify the return by using
Option(properties.getProperty(testName)).getOrElse(TestConfig.defaultTestGroup)
so the stream is always closed and the defaulting is concise.
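
The same idea can be written so that it is easy to exercise with in-memory streams. In this sketch `groupFor` is a hypothetical helper and the `default` parameter stands in for TestConfig.defaultTestGroup; neither is taken from the PR.

```scala
import java.io.InputStream
import java.util.Properties

object TestGroupSketch {
  // Looks up `testName` in the given properties stream, falling back to `default`
  // when the stream or the key is missing. The stream is closed after loading.
  def groupFor(stream: Option[InputStream], testName: String, default: String): String =
    stream.fold(default) { in =>
      val props = new Properties()
      try props.load(in) finally in.close()
      Option(props.getProperty(testName)).getOrElse(default)
    }
}
```

A caller would pass `Option(getClass.getClassLoader.getResourceAsStream(...))`, so a missing resource and a missing key both lead to the default group.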

Comment on lines +48 to +59
val minidumpDir = new File("./target/minidumptest/base")
minidumpDir.listFiles().foreach(f => {
val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
if (longAbstractsFile.exists()) {
longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
}
val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
if (shortAbstractsFile.exists()) {
shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
}
})
}

⚠️ Potential issue | 🟠 Major

Guard listFiles() and check rename results

Prevent NPE when base dir is absent and surface rename failures.

-    val minidumpDir = new File("./target/minidumptest/base")
-    minidumpDir.listFiles().foreach(f => {
+    val minidumpDir = new File("./target/minidumptest/base")
+    Option(minidumpDir.listFiles()).getOrElse(Array.empty[File]).foreach { f =>
       val longAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts.ttl.bz2")
       if (longAbstractsFile.exists()) {
-        longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
+        val ok = longAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-long-abstracts-$datasetName.ttl.bz2"))
+        if (!ok) System.err.println(s"WARN: Could not rename: ${longAbstractsFile.getPath}")
       }
       val shortAbstractsFile = new File( s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts.ttl.bz2")
       if (shortAbstractsFile.exists()) {
-        shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
+        val ok = shortAbstractsFile.renameTo(new File(s"./target/minidumptest/base/${f.getName}/$date/${f.getName}-$date-short-abstracts-$datasetName.ttl.bz2"))
+        if (!ok) System.err.println(s"WARN: Could not rename: ${shortAbstractsFile.getPath}")
       }
-    })
+    }
🤖 Prompt for AI Agents
In dump/src/test/scala/org/dbpedia/extraction/dump/Utils.scala around lines 48
to 59, guard against a null result from minidumpDir.listFiles() to prevent a
NullPointerException when the base directory is missing, and check the boolean
return value of File.renameTo(...) to detect and log or fail on rename failures;
specifically, only iterate when listFiles() != null (or use Option/empty list),
and after each renameTo call verify its result and handle failure (e.g., throw
an exception or call fail/log with the source and target paths).
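
If failing loudly is preferred over logging a warning, one option is `java.nio.file.Files.move`, which throws an `IOException` describing the cause instead of returning `false`. The following is a minimal sketch under that assumption; `RenameSketch`, `languageDirs`, and `renameWithDataset` are hypothetical names for illustration and do not exist in `Utils.scala`.

```scala
import java.io.File
import java.nio.file.Files

object RenameSketch {
  // listFiles() returns null when the directory is missing or unreadable,
  // so wrap it in Option and fall back to an empty sequence.
  def languageDirs(base: File): Seq[File] =
    Option(base.listFiles()).map(_.toSeq).getOrElse(Seq.empty)

  // Hypothetical helper: inserts "-<datasetName>" before the ".ttl.bz2" suffix.
  // Returns None when the source file does not exist.
  def renameWithDataset(source: File, datasetName: String): Option[File] =
    if (!source.exists()) None
    else {
      val suffix = ".ttl.bz2"
      val targetName = source.getName.stripSuffix(suffix) + s"-$datasetName$suffix"
      val target = new File(source.getParentFile, targetName)
      // Unlike File.renameTo, Files.move throws on failure, so a failed
      // rename cannot go unnoticed.
      Files.move(source.toPath, target.toPath)
      Some(target)
    }
}
```

With helpers like these, the loop body would reduce to one `renameWithDataset` call each for the long-abstracts and short-abstracts files of every directory returned by `languageDirs`.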

- Get help with the [Maven build](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html) or another form of [installation](https://maven.apache.org/install.html).
- [Download](https://dumps.wikimedia.org/) some data to work with.
- How to run [from Scala/Java](https://docs.scala-lang.org/tutorials/scala-with-maven.html) or [from a JAR](https://docs.oracle.com/javase/tutorial/deployment/jar/run.html).
- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org.

⚠️ Potential issue | 🟡 Minor

Wrap bare URL in angle brackets or Markdown link.

Line 125 contains a bare URL (https://forum.dbpedia.org) that should be wrapped for better Markdown compliance.

Based on static analysis hints.

Apply this diff:

-- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on https://forum.dbpedia.org.
+- Having different troubles? Check the [troubleshooting page](https://maven.apache.org/users/getting-help.html) or post on <https://forum.dbpedia.org>.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)

125-125: Bare URL used

(MD034, no-bare-urls)

🤖 Prompt for AI Agents
In README.md around line 125, there's a bare URL (https://forum.dbpedia.org)
that needs to be wrapped for proper Markdown formatting; replace the bare URL
with either a Markdown link text like [DBpedia forum](https://forum.dbpedia.org)
or wrap it in angle brackets <https://forum.dbpedia.org> so the link is
rendered/clickable and compliant with Markdown linting.

8 participants