Dbpedia History #745
base: history-extraction
Conversation
Kudos, SonarCloud Quality Gate passed!
TallTed left a comment
Minor...
Please retry analysis of this Pull-Request directly on SonarCloud.
@coderabbitai full review

✅ Actions performed: Full review triggered.
Caution: Review failed. Failed to post review comments.

Walkthrough

Adds a new History module for revision-based extraction (config, sources, extractors, Spark pipeline, recorder/monitor), introduces a REST-based NIF extraction path with new MediaWiki connectors, updates configurations (including the MediaWikiConnection signature and defaults), extends datasets (history), adds dump test tooling/scripts, and enriches multiple mapping files.
Sequence Diagram(s)

```mermaid
sequenceDiagram
autonumber
participant Runner as NifExtractor
participant Conf as Config.mediawikiConnection
participant REST as MediaWikiConnectorRest
participant MWC as MediawikiConnectorConfigured
participant ExtrR as WikipediaNifExtractorRest
participant Extr as WikipediaNifExtractor
alt apiType == "rest"
Runner->>REST: retrievePage(title, params)
REST-->>Runner: html opt or error
Runner->>ExtrR: extract(html)
ExtrR-->>Runner: NIF sections/triples
else non-REST
Runner->>MWC: retrievePage(title, params)
MWC-->>Runner: html opt or error
Runner->>Extr: extract(html)
Extr-->>Runner: NIF sections/triples
end
```

```mermaid
sequenceDiagram
autonumber
participant Main as Extraction2
participant Cfg as Config2
participant Loader as ConfigLoader2
participant Job as ExtractionJob2
participant Src as Source2/XMLSource2
participant Ext as CompositeParseExtractor2
participant Dest as WriterDestination2
participant Rec as ExtractionRecorder2
Main->>Cfg: load(args)
Main->>Loader: new(config)
Loader-->>Main: jobs (by language)
loop per job
Main->>Job: run()
Job->>Src: foreach(page)
Src-->>Job: WikiPageWithRevisions
Job->>Ext: extract(page, subjectUri)
Ext-->>Job: quads
Job->>Dest: write(quads)
Job->>Rec: record page/quad status
end
Main-->>Main: wait for all jobs
```

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~180+ minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 53
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
mappings/Mapping_hu.xml (1)
265-266: Fix incorrect ontology targets for church founding metadata.
`alapítva` now writes to `dbo:publicationDate` and `alapító` to `dbo:bodyDiscovered`, which will pollute those properties with church founding data. Please map them to the standard founding predicates instead.

```diff
- {{PropertyMapping | templateProperty = alapítva | ontologyProperty = publicationDate }}
- {{PropertyMapping | templateProperty = alapító | ontologyProperty = bodyDiscovered }}
+ {{PropertyMapping | templateProperty = alapítva | ontologyProperty = foundingDate }}
+ {{PropertyMapping | templateProperty = alapító | ontologyProperty = founder }}
```
🧹 Nitpick comments (39)
dump/src/test/bash/createSampleRandomFromPageIDdataset.sh (1)
21-22: Quote parameters passed to external commands.
`$file`, `$n`, and the constructed output name are expanded unquoted. Any whitespace or glob characters in the file path or generated list name will break the pipeline or trigger unintended globbing. Quote those substitutions before passing them to `grep`, `shuf`, and the final redirection target.

```diff
-grep -v "resource\/\w*\:" $file > temp.txt
-shuf -n $n temp.txt | ... > uri_sample_random_${lang}_${n}.lst
+grep -v "resource/\w*:" "$file" > temp.txt
+shuf -n "$n" temp.txt | ... > "uri_sample_random_${lang}_${n}.lst"
```

pom.xml (1)
64-65: History module is commented out; clarify the activation plan

If the history module should build in CI, uncomment it or gate it behind a Maven profile (e.g., `-P history`). Otherwise, document why it is intentionally disabled, to avoid confusion.
history/ReadMe.md (7)
3-3: Tighten intro sentence

Minor copy edit and trailing space.

```diff
-DBpedia History enables the history of a Wikipedia chapter to be extracted into an RDF format
+DBpedia History enables the history of a Wikipedia chapter to be extracted into RDF.
```
10-15: Avoid bare URLs; format citations cleanly (MD034)

Replace separate bare URLs with Markdown links.

```diff
-Fabien Gandon, Raphael Boyer, Olivier Corby, Alexandre Monnin. Wikipedia editing history in DBpedia: extracting and publishing the encyclopedia editing activity as linked data. IEEE/WIC/ACM International Joint Conference on Web Intelligence (WI' 16), Oct 2016, Omaha, United States. <hal-01359575>
-https://hal.inria.fr/hal-01359575
-
-Fabien Gandon, Raphael Boyer, Olivier Corby, Alexandre Monnin. Materializing the editing history of Wikipedia as linked data in DBpedia. ISWC 2016 - 15th International Semantic Web Conference, Oct 2016, Kobe, Japan. <http://iswc2016.semanticweb.org/>. <hal-01359583>
-https://hal.inria.fr/hal-01359583
+– [Wikipedia editing history in DBpedia: extracting and publishing the encyclopedia editing activity as linked data (WI'16)](https://hal.inria.fr/hal-01359575) — Fabien Gandon, Raphael Boyer, Olivier Corby, Alexandre Monnin.
+
+– [Materializing the editing history of Wikipedia as linked data in DBpedia (ISWC 2016)](https://hal.inria.fr/hal-01359583) — Fabien Gandon, Raphael Boyer, Olivier Corby, Alexandre Monnin.
```
36-43: Fix typos, grammar, and list indentation (MD007)

Tighten phrasing and remove extra indent spaces before bullets.

```diff
-  * Extract all revisions of every Wikipedia page
-  * Use the foaf, xsd, rdf, prov, dc, sioc ontologies
-  * Describre each revisions of each page, the content / date / size / importance of that revision, the author of this one and the delta with the last version of the page updated by this one
-  * the id of the user are based depending on what is available : ip / nickname or the wikipedia id
+* Extract all revisions of every Wikipedia page
+* Use the FOAF, XSD, RDF, PROV, DC, and SIOC ontologies
+* Describe each revision of each page: content, date, size, importance, author, and the delta vs. the previous revision
+* The user identifier is based on what is available: IP address, username, or Wikipedia user ID
```
44-50: Clarify wording in HistoryStats bullets (grammar, MD007)

Improve list grammar and remove extra indent spaces.

```diff
-  * Extract statistics about revision activity for every page of Wikipedia :
-    * number of revision per year / months
-    * avg size of revision per year / month
-    * number of unique contribution
-  * This exctraction add some computation and could be not necessary
-  * Use dc, rdf, rdfs ontologies
+* Extract statistics about revision activity for every Wikipedia page:
+  * number of revisions per year/month
+  * average revision size per year/month
+  * number of unique contributors
+* This extraction adds additional computation and may be optional
+* Uses DC, RDF, and RDFS ontologies
```
53-64: Minor heading/punctuation fixes

Standardize headings and code formatting.

```diff
-## How to run it ?
+## How to run it?
@@
-* and run ```../run download download.properties```
+* and run: `../run download download.properties`
@@
-* and run ```../run run extraction.properties```
+* and run: `../run run extraction.properties`
```
71-86: Specify language for fenced code block (MD040)

These are N-Triples; add a language hint (e.g., `ntriples`) after the opening fence for better rendering.

88-182: Specify language for fenced code block (MD040)

Same for the stats example.

dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTestAbstract.scala (2)

97-101: Remove unused debug variable/print

`parallelProcesses` isn't used; drop it and the println.

```diff
- val parallelProcesses = 1
- println(parallelProcesses)
+ // parallelism is determined by job config; keep output minimal in tests
```
140-141: Don't swallow exceptions silently

Log or record the exception to aid debugging.

```diff
- } catch {
-   case e: Exception => None
- }
+ } catch {
+   case e: Exception =>
+     println(s"[warn] Failed to list or aggregate failed pages for ${lang.wikiCode}: ${e.getMessage}")
+ }
```

history/src/main/scala/org/dbpedia/extraction/dump/extract/SparkExtractionJob2.scala (3)
49-50: Remove stray debug print

Leftover println adds noise in Spark driver logs.

```diff
- println("SPARKKKK")
```
70-73: Use consistent composite extractor implementation

The file uses CompositeParseExtractor here and CompositeParseExtractor2 later; unify to the intended variant.

```diff
- val extractor = CompositeParseExtractor.load(extractors, context)
+ val extractor = CompositeParseExtractor2.load(extractors, context)
```
211-218: Avoid external bash for concat if possible

Relying on an external script can break in cluster environments. Consider merging via Hadoop/Spark APIs (e.g., coalesce + save) or Java NIO, with retries and logging.
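To make that suggestion concrete, here is a rough sketch of a merge done through the Hadoop FileSystem API instead of an external script. It is an illustration only: the method name, the part-file naming convention, and the overwrite behaviour are assumptions, not the module's actual code.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.io.IOUtils

// Concatenate all part-* files under srcDir into a single dstFile,
// without shelling out to bash.
def mergePartFiles(srcDir: String, dstFile: String, conf: Configuration = new Configuration()): Unit = {
  val fs = FileSystem.get(conf)
  val parts = fs.listStatus(new Path(srcDir))
    .filter(_.getPath.getName.startsWith("part-"))
    .sortBy(_.getPath.getName)
  val out = fs.create(new Path(dstFile), true) // overwrite an existing target
  try {
    parts.foreach { status =>
      val in = fs.open(status.getPath)
      try IOUtils.copyBytes(in, out, conf, false) // false: keep the output stream open
      finally in.close()
    }
  } finally out.close()
}
```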
core/src/main/resources/datasetdefinitions.json (1)
629-642: Update "history" dataset names and descriptions

```diff
 "history_dataset": {
-  "name": "History Links",
+  "name": "History Data",
   "traits":"LinkedData, Published",
-  "desc": "All data related to history",
+  "desc": "Per-revision history data extracted from Wikipedia",
   "defaultgraph": "dataset"
 },
 "history_stats": {
   "name": "History Stats",
   "traits":"LinkedData, Published",
-  "desc": "Statistics related to edition statistics",
+  "desc": "Aggregated editing statistics per page (e.g., revisions per year/month, average sizes, unique contributors)",
   "defaultgraph": "dataset"
 }
```

The Scala constants `HistoryData` and `HistoryStats` already map to these keys.

history/download.properties (2)
4-4: Remove commented debug/test URL.

Line 4 contains a commented-out URL that appears to be for testing/debugging. This should be removed from production configuration files.
```diff
-#https://fr.wikipedia.org/w/index.php?title=Sp%C3%A9cial:Exporter&action=submit&history=1&pages=H%C3%B4tes_de_passage%0ARaymond_Beaudet
```
15-15: Document or parameterize the hardcoded dump date.

The dump date is hardcoded to `20221001`, which will become outdated. The comment on line 14 states "if empty: the most recent dump-date is used."

Consider either:
- Leaving it empty to automatically use the most recent dump
- Adding documentation explaining why this specific date is required
- Making it configurable via environment variable or build parameter
history/src/main/scala/org/dbpedia/extraction/mappings/CompositeExtractor2.scala (1)
7-7: Address or clarify the TODO comment.

The TODO comment "generic type may not be optimal" lacks context. Either:
- Elaborate on specific concerns with the generic type approach
- Remove the TODO if the current design is acceptable
- File an issue to track the concern if it's deferred work
If the generic type is working as intended, consider replacing with a clarifying comment:
```diff
- * TODO: generic type may not be optimal.
+ * Generic composite extractor that aggregates multiple extractors of the same input type.
+ * Type parameter N allows composition of extractors for different node types (WikiPageWithRevisions, RevisionNode, etc.).
```

history/src/main/scala/org/dbpedia/extraction/mappings/CompositeWikiPageWithRevisionExtractor.scala (1)
5-7: Align extends/with indentation.

Scala convention keeps `extends`/`with` indented under the class header for readability. Please indent these two lines to match the surrounding code.
34-34: Trailing space in languages value

Trim to avoid parsing surprises in loaders that don't auto-trim.

```diff
-languages=fr 
+languages=fr
```

history/src/main/scala/org/dbpedia/extraction/mappings/RedirectExtractor2.scala (1)
28-30: Remove unused namespaces val

Declared but never used. Drop to reduce noise.

```diff
- private val namespaces = if (language == Language.Commons) ExtractorUtils.commonsNamespacesContainingMetadata
-   else Set(Namespace.Main, Namespace.Template, Namespace.Category)
+ // Intentionally no namespace set here; see extract() filter.
```

core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractorRest.scala (2)
93-96: Link section relationships (top/sub and prev/next) like base extractor

Parity with WikipediaNifExtractor improves consumers relying on navigation.

```diff
- tocMap.append(section)
+ tocMap.append(section)
+ section.top match {
+   case Some(s) => s.sub = Option(section)
+   case None => ()
+ }
+ section.prev match {
+   case Some(s) => s.next = Option(section)
+   case None => ()
+ }
```
45-47: Readability: avoid shadowing parameter name in lambda

Rename the lambda variable to prevent confusion with the method parameter currentNodes.

```diff
- subnodes = subnodes.dropWhile(currentNodes => !currentNodes.nodeName().matches("h\\d") && !currentNodes.nodeName().matches("section"))
+ subnodes = subnodes.dropWhile(n => !n.nodeName().matches("h\\d") && !n.nodeName().matches("section"))
```

history/src/main/scala/org/dbpedia/extraction/wikiparser/RevisionNode.scala (3)
49-57: Avoid reparsing timestamp twice.
`getYearMonth` parses the ISO timestamp twice. Cache the `ZonedDateTime` locally (or reuse the instance from `getYear`) to avoid the duplicate parse and the associated exception risk when the timestamp is malformed.
60-64: Tighten boolean parsing.
`parseBoolean` treats any non-empty string as `true`, so `"True"` or `"yes"` behave differently than `"false"`, and typos silently coerce to `true`. Consider normalizing with `str.equalsIgnoreCase("true")` (and trimming) so unexpected inputs yield `false` (or throw).
66-73: Return `Option` instead of sentinel `-1`.
Returning `-1` for missing numeric fields leaks sentinel values into downstream logic. Prefer `Option[Int]`/`Option[Long]` or throw on malformed input to keep the API type-safe.
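To illustrate the three RevisionNode points above in one place, a minimal sketch (the object and method names here are illustrative, not the actual RevisionNode members):

```scala
import java.time.ZonedDateTime
import scala.util.Try

object RevisionFieldParsing {

  // Parse the ISO timestamp once and derive both year and year-month from the same instance.
  def yearAndYearMonth(timestamp: String): Option[(Int, String)] =
    Try(ZonedDateTime.parse(timestamp)).toOption
      .map(zdt => (zdt.getYear, f"${zdt.getYear}-${zdt.getMonthValue}%02d"))

  // Only a trimmed, case-insensitive "true" counts as true; anything else is false.
  def parseBoolean(str: String): Boolean =
    str != null && str.trim.equalsIgnoreCase("true")

  // Missing or malformed numbers become None instead of the sentinel -1.
  def parseLongOpt(str: String): Option[Long] =
    Option(str).map(_.trim).filter(_.nonEmpty).flatMap(s => Try(s.toLong).toOption)
}
```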
history/src/main/scala/org/dbpedia/extraction/mappings/ExtractionMonitor2.scala (2)

135-141: Return numeric successful count, not AtomicLong

Summaries generally expect values, not mutables.
Apply:
```diff
- summary.put("SUCCESSFUL", s)
+ summary.put("SUCCESSFUL", Long.box(s.get()))
```
183-185: Replace printStackTrace with proper logging

Avoid printing to stdout in libraries.
Use a logger (java.util.logging or slf4j) with context and stack trace:
```scala
// logger.log(Level.WARNING, s"DATAID compare failed for $dataIDUrl", ex)
```
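For reference, a minimal sketch of that logging pattern with java.util.logging; the class and method names are illustrative, not the actual ExtractionMonitor2 code:

```scala
import java.util.logging.{Level, Logger}

class DataIdComparer {
  private val logger = Logger.getLogger(classOf[DataIdComparer].getName)

  def compare(dataIDUrl: String): Unit = {
    try {
      // ... fetch and compare the DataID document ...
    } catch {
      case ex: Exception =>
        // Keeps the message, context, and stack trace in the log instead of stdout.
        logger.log(Level.WARNING, s"DATAID compare failed for $dataIDUrl", ex)
    }
  }
}
```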
history/src/main/scala/org/dbpedia/extraction/config/Config2.scala (1)

260-281: mediawikiConnection: handle unknown mwc-type explicitly

Unknown values cause a MatchError that is wrapped and rethrown. Prefer explicit validation for clearer error messages.
Example:
```diff
- apiUrl = this.getProperty("mwc-type").trim match {
+ apiUrl = this.getProperty("mwc-type", "").trim match {
    case "rest" => this.getProperty("mwc-apiRestUrl", "").trim
    case "mwc" => this.getProperty("mwc-apiMWCUrl", "").trim
    case "local" => this.getProperty("mwc-apiLocalUrl", "").trim
+   case other => throw new IllegalArgumentException(s"Unsupported mwc-type='$other'. Expected one of: rest|mwc|local.")
  },
```

history/src/main/scala/org/dbpedia/extraction/wikiparser/WikiPageWithRevisions.scala (2)
50-57: Typo: isRetryy -> isRetry

Minor naming nit for consistency.
Apply:
```diff
- private var isRetryy = false
+ private var isRetry = false
...
- this.isRetryy = !this.isRetryy
+ this.isRetry = !this.isRetry
...
- def isRetry: Boolean = this.isRetryy
+ def isRetry: Boolean = this.isRetry
```
96-98: Avoid mapValues (lazy view) and return strict maps

mapValues returns a view in newer Scala; compute eagerly.
Apply:
```diff
- def getRevPerYear: Map[String,Int] = {
-   this.revisions.groupBy(_.getYear).mapValues(_.size)
- }
+ def getRevPerYear: Map[String,Int] =
+   this.revisions.groupBy(_.getYear).map { case (k, v) => k -> v.size }.toMap
...
- def getRevPerYearMonth: Map[String, Int] = {
-   this.revisions.groupBy(_.getYearMonth).mapValues(_.size)
- }
+ def getRevPerYearMonth: Map[String, Int] =
+   this.revisions.groupBy(_.getYearMonth).map { case (k, v) => k -> v.size }.toMap
...
- def getRevPerYearMonthAvgSize: Map[String, Int] = {
-   this.revisions.groupBy(_.getYearMonth).map {
-     case (candidate, group) =>
-       candidate -> group.map { _.text_size }.sum / group.size
-   }
- }
+ def getRevPerYearMonthAvgSize: Map[String, Int] =
+   this.revisions.groupBy(_.getYearMonth).map { case (k, g) => k -> (g.map(_.text_size).sum / g.size) }.toMap
```

Also applies to: 107-109, 111-116
core/src/main/scala/org/dbpedia/extraction/util/MediawikiConnectorConfigured.scala (2)
60-67: Parameter construction/escaping is fragile; prefer proper URL-encoding and clearer templating.

Manual replacements on title and ad-hoc replace/format on apiParameterString are error-prone. Use URLEncoder.encode for values and build parameters with a small helper to avoid mismatched %s/%d templates and accidental deletions.
- Build a Map[String,String] of parameters (uselang, page or pageid, maxlag).
- URL-encode only values with UTF-8.
- Join with '&' to a body string.
I can provide a helper if you want it in this PR; a rough sketch follows below.

Also applies to: 50-53
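As an illustration of such a helper (a sketch only; the parameter names below are examples, not the connector's actual fields):

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object ApiParams {
  // Build an application/x-www-form-urlencoded parameter string,
  // URL-encoding only the values.
  def build(params: Map[String, String]): String =
    params.map { case (k, v) =>
      s"$k=${URLEncoder.encode(v, StandardCharsets.UTF_8.name())}"
    }.mkString("&")
}

// Example with illustrative MediaWiki-style parameters:
// ApiParams.build(Map("uselang" -> "fr", "page" -> "Hôtes de passage", "maxlag" -> "5"))
// => "uselang=fr&page=H%C3%B4tes+de+passage&maxlag=5"
```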
26-28: Remove unused vars and null Try; simplify.
- availableProcessors, gzipok are unused.
- parsedAnswer should not be initialized to null.
Apply:
```diff
- private val osBean = java.lang.management.ManagementFactory.getOperatingSystemMXBean
- private val availableProcessors = osBean.getAvailableProcessors
+ // remove unused osBean/availableProcessors
- var gzipok = true
- var parsedAnswer: Try[String] = null
+ var parsedAnswer: Try[String] = Failure(new IllegalStateException("uninitialized"))
```

Also drop gzipok and guard gzip by Content-Encoding as suggested above.
Also applies to: 41-43, 107-115
history/src/main/scala/org/dbpedia/extraction/util/ExtractionRecorder2.scala (4)
220-223: Don't increment "successful pages" for quads; use a dedicated triples counter.

recordQuad invokes increaseAndGetSuccessfulPages, skewing page stats. Use the existing successfulTripleCount keyed by quad.dataset.
Apply:
```diff
- def recordQuad(quad: Quad, severity: RecordSeverity.Value, lang:Language): Unit = synchronized {
-   if(increaseAndGetSuccessfulPages(lang) % reportInterval == 0)
+ def recordQuad(quad: Quad, severity: RecordSeverity.Value, lang: Language): Unit = synchronized {
+   val triples = increaseAndGetSuccessfulTriples(quad.dataset)
+   if (triples % reportInterval == 0)
      printLabeledLine("processed {page} quads; {mspp} per quad; {fail} failed quads", severity, lang)
  }
```

Optionally adapt placeholders to reflect triple count and ms/triple.
323-339: Avoid finalize; provide explicit close and implement AutoCloseable.

finalize is deprecated and unreliable. Expose a close() (or closeWriter()) and let callers use try/finally or the Loan pattern.
Suggested change:
```diff
- override def finalize(): Unit ={
-   if(writerOpen){
-     logWriter.close()
-     writerOpen = false
-   }
-   ...
-   super.finalize()
- }
+ def close(): Unit = synchronized {
+   if (writerOpen) {
+     try logWriter.close() finally writerOpen = false
+   }
+ }
```

Then call recorder.close() from the orchestration layer when finishing.
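For illustration, one way the orchestration layer could guarantee that close, assuming the suggested close() is added (the type parameter and helper name below are assumptions, not existing code):

```scala
// Loan pattern: the recorder is always closed, even if the extraction job throws.
def withRecorder[A](recorder: ExtractionRecorder2[_])(work: ExtractionRecorder2[_] => A): A =
  try work(recorder)
  finally recorder.close()
```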
468-481: Set Content-Type when posting Slack JSON; optionally add timeouts.

Slack expects application/json; a missing header may fail depending on the gateway.
Apply:
```diff
- val resp = Http(url).postData(new String(baos.toByteArray, Charset.defaultCharset())).asString
+ val resp = Http(url)
+   .header("Content-Type", "application/json; charset=UTF-8")
+   .timeout(connTimeoutMs = 5000, readTimeoutMs = 10000)
+   .postData(new String(baos.toByteArray, Charset.forName("UTF-8")))
+   .asString
```
126-151: Minor: remove unused val and simplify branch.

val msg is unused; just execute side effects.
Apply:
```diff
- case _ =>
-   val msg = Option(record.errorMsg) match{
-     case Some(m) => printLabeledLine(m, record.severity, record.language)
-     case None =>
-       if(record.error != null) failedRecord(null, record.page, record.error, record.language)
-       else recordGenericPage(record.language, record.page.toString)
-   }
+ case _ =>
+   Option(record.errorMsg) match {
+     case Some(m) => printLabeledLine(m, record.severity, record.language)
+     case None =>
+       if (record.error != null) failedRecord(null, record.page, record.error, record.language)
+       else recordGenericPage(record.language, record.page.toString)
+   }
```

core/src/main/scala/org/dbpedia/extraction/util/MediaWikiConnectorAbstract.scala (2)
8-11: Remove unused imports.

OutputStreamWriter, URL, and ChronoUnit are unused in this abstract class.
Apply:
```diff
-import java.io.{InputStream, OutputStreamWriter}
-import java.net.{HttpURLConnection, URL}
-import java.time.temporal.ChronoUnit
+import java.io.InputStream
```
42-44: Drop unused availableProcessors.

These fields aren't used; dead code.
Apply:
```diff
- private val osBean = java.lang.management.ManagementFactory.getOperatingSystemMXBean
- private val availableProcessors = osBean.getAvailableProcessors
+ // removed unused OS bean/utilization fields
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `history/sample-xml-dump/frwiki/20220101/frwiki-20220101-history-dataset.ttl.bz2` is excluded by `!**/*.bz2`
- `history/sample-xml-dump/frwiki/20220101/frwiki-20220101-history-stats.ttl.bz2` is excluded by `!**/*.bz2`
- `history/sample-xml-dump/frwiki/20220101/frwiki-20220101-template-redirects.obj` is excluded by `!**/*.obj`
📒 Files selected for processing (57)
core/src/main/java/org/dbpedia/extraction/nif/LinkExtractor.java (8 hunks)
core/src/main/resources/datasetdefinitions.json (1 hunks)
core/src/main/scala/org/dbpedia/extraction/config/Config.scala (3 hunks)
core/src/main/scala/org/dbpedia/extraction/config/provenance/DBpediaDatasets.scala (1 hunks)
core/src/main/scala/org/dbpedia/extraction/mappings/NifExtractor.scala (3 hunks)
core/src/main/scala/org/dbpedia/extraction/mappings/PlainAbstractExtractor.scala (2 hunks)
core/src/main/scala/org/dbpedia/extraction/nif/HtmlNifExtractor.scala (4 hunks)
core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (1 hunks)
core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractorRest.scala (1 hunks)
core/src/main/scala/org/dbpedia/extraction/util/MediaWikiConnector.scala (1 hunks)
core/src/main/scala/org/dbpedia/extraction/util/MediaWikiConnectorAbstract.scala (1 hunks)
core/src/main/scala/org/dbpedia/extraction/util/MediaWikiConnectorRest.scala (1 hunks)
core/src/main/scala/org/dbpedia/extraction/util/MediawikiConnectorConfigured.scala (1 hunks)
dump/src/test/bash/createMinidump_custom_sample.sh (1 hunks)
dump/src/test/bash/createSampleRandomFromPageIDdataset.sh (1 hunks)
dump/src/test/bash/create_custom_sample.sh (1 hunks)
dump/src/test/resources/extraction-configs/extraction.nif.abstracts.properties (3 hunks)
dump/src/test/resources/extraction-configs/extraction.plain.abstracts.properties (3 hunks)
dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTestAbstract.md (1 hunks)
dump/src/test/scala/org/dbpedia/extraction/dump/ExtractionTestAbstract.scala (1 hunks)
history/ReadMe.md (1 hunks)
history/download.properties (1 hunks)
history/extraction.properties (1 hunks)
history/pom.xml (1 hunks)
history/src/main/java/org/dbpedia/extraction/sources/WikipediaDumpParserHistory.java (1 hunks)
history/src/main/scala/org/dbpedia/extraction/config/Config2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/destinations/WriterDestination2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/dump/extract/ConfigLoader2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/dump/extract/Extraction2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/dump/extract/ExtractionJob2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/dump/extract/SerializableUtils2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/dump/extract/SparkExtractionJob2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/CompositeExtractor2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/CompositeParseExtractor2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/CompositeWikiPageWithRevisionExtractor.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/ExtractionMonitor2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/HistoryPageExtractor.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/HistoryStatsExtractor.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/RedirectExtractor2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/Redirects2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/RevisionNode.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/RevisionNodeExtractor.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/mappings/WikiPageWithRevisionsExtractor.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/sources/Source2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/sources/XMLSource2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/util/DumpExtractionContext2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/util/ExtractionRecorder2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/util/SparkExtractionContext2.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/wikiparser/RevisionNode.scala (1 hunks)
history/src/main/scala/org/dbpedia/extraction/wikiparser/WikiPageWithRevisions.scala (1 hunks)
history/src/test/resources/extraction-configs/extraction.config.properties (1 hunks)
history/src/test/scala/org/dbpedia/extraction/dump/extract/ExtractionTest.scala (1 hunks)
mappings/Mapping_el.xml (1 hunks)
mappings/Mapping_en.xml (30 hunks)
mappings/Mapping_fr.xml (31 hunks)
mappings/Mapping_hu.xml (3 hunks)
pom.xml (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
history/ReadMe.md
11-11: Bare URL used
(MD034, no-bare-urls)
14-14: Bare URL used
(MD034, no-bare-urls)
38-38: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
39-39: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
40-40: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
41-41: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
44-44: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
45-45: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
46-46: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
47-47: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
48-48: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
49-49: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
72-72: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
89-89: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Shellcheck (0.11.0)
dump/src/test/bash/create_custom_sample.sh
[warning] 8-13: Invalid flags are not handled. Add a *) case.
(SC2220)
[error] 35-35: Double quote array expansions to avoid re-splitting elements.
(SC2068)
[warning] 35-35: In POSIX sh, array references are undefined.
(SC3054)
[warning] 37-37: In POSIX sh, [[ ]] is undefined.
(SC3010)
[warning] 37-37: In POSIX sh, =~ regex matching is undefined.
(SC3015)
[warning] 37-37: Remove quotes from right-hand side of =~ to match as a regex rather than literally.
(SC2076)
[warning] 42-42: In POSIX sh, [[ ]] is undefined.
(SC3010)
[warning] 42-42: In POSIX sh, == in place of = is undefined.
(SC3014)
[warning] 55-55: In POSIX sh, 'declare' is undefined.
(SC3044)
[warning] 57-57: In POSIX sh, $'..' is undefined.
(SC3003)
[warning] 57-57: In POSIX sh, arrays are undefined.
(SC3030)
[warning] 57-57: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
(SC2206)
[warning] 58-58: In POSIX sh, array references are undefined.
(SC3054)
[warning] 59-59: In POSIX sh, array references are undefined.
(SC3054)
[warning] 60-60: In POSIX sh, [[ ]] is undefined.
(SC3010)
[warning] 61-61: In POSIX sh, [[ ]] is undefined.
(SC3010)
[warning] 64-64: In POSIX sh, array references are undefined.
(SC3054)
[warning] 70-70: In POSIX sh, [[ ]] is undefined.
(SC3010)
[warning] 70-70: In POSIX sh, == in place of = is undefined.
(SC3014)
[warning] 71-71: In POSIX sh, indirect expansion is undefined.
(SC3053)
[warning] 71-71: In POSIX sh, array key expansion is undefined.
(SC3055)
[warning] 73-73: In POSIX sh, array references are undefined.
(SC3054)
[warning] 76-76: In POSIX sh, indirect expansion is undefined.
(SC3053)
[warning] 76-76: In POSIX sh, array key expansion is undefined.
(SC3055)
[warning] 78-78: In POSIX sh, array references are undefined.
(SC3054)
dump/src/test/bash/createSampleRandomFromPageIDdataset.sh
[warning] 7-12: Invalid flags are not handled. Add a *) case.
(SC2220)
dump/src/test/bash/createMinidump_custom_sample.sh
[warning] 5-7: Invalid flags are not handled. Add a *) case.
(SC2220)
[warning] 54-54: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
🔇 Additional comments (8)
history/src/test/resources/extraction-configs/extraction.config.properties (1)
21-21: Verify extractor configuration completeness.

The AI summary indicates that history extraction uses both `HistoryPageExtractor` and `HistoryStatsExtractor`, but this test configuration only includes `HistoryPageExtractor`. The main `history/extraction.properties` file configures both extractors (lines 21-22).

Please confirm whether this test intentionally uses a subset of extractors or if `HistoryStatsExtractor` should also be included for comprehensive testing.

history/src/main/scala/org/dbpedia/extraction/mappings/RevisionNodeExtractor.scala (1)
1-3: LGTM!The trait definition is clean and follows the established extractor pattern, providing type safety for
RevisionNodeextractors.history/src/main/scala/org/dbpedia/extraction/mappings/CompositeExtractor2.scala (1)
11-15: LGTM - composition pattern correctly implemented.

The composite extractor properly aggregates datasets and extraction results from multiple extractors. The use of `flatMap` and `toSet` ensures correct union semantics for datasets, and the extraction concatenation follows the expected composite pattern.

history/src/main/scala/org/dbpedia/extraction/mappings/WikiPageWithRevisionsExtractor.scala (1)
1-10: LGTM!

The trait provides clear type safety for `WikiPageWithRevisions` extractors. The documentation appropriately explains the purpose and relationship to `CompositeExtractor`.

core/src/main/scala/org/dbpedia/extraction/config/provenance/DBpediaDatasets.scala (1)
281-286: LGTM!

The new history dataset declarations follow the established pattern and are properly documented. The mappings to `history_dataset` and `history_stats` align with the dataset definitions introduced in this PR.

core/src/main/scala/org/dbpedia/extraction/util/MediaWikiConnector.scala (1)
20-20: [running scripts]

```bash
#!/bin/bash
# Verify if 'new MediaWikiConnector' instantiations were added in this PR's diff
git fetch origin main
git diff --unified=0 origin/main... | rg '^\+\s*new\s+MediaWikiConnector'
```

core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractor.scala (1)
213-223: Good change: helpers to protected for subclassing

Visibility relaxation is appropriate for REST subclass overrides; no behavior change.
history/src/main/scala/org/dbpedia/extraction/dump/extract/Extraction2.scala (1)
29-31: Validate parallelProcesses >= 1

If config.parallelProcesses is 0, the scheduler stalls. Enforce min 1 or validate in Config2.
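A minimal way to enforce that, shown only as an illustration (assuming Config2 exposes parallelProcesses as an Int):

```scala
// Fail fast on a zero or negative value so the job scheduler always makes progress.
val parallelProcesses: Int = {
  val configured = config.parallelProcesses
  require(configured >= 1, s"parallel-processes must be >= 1, got $configured")
  configured
}

// Alternatively, clamp silently instead of failing:
// val parallelProcesses = math.max(1, config.parallelProcesses)
```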
```java
String linkPrefix = "/wiki/";
String linkPrefix2= "./";
if(uri.contains(linkPrefix)){
    uri=uri.substring(uri.indexOf("?title=")+7);
} else if (uri.contains(linkPrefix2)) {
    uri=uri.substring(uri.indexOf("?title=")+3);
}
//TODO central string management
if(!this.context.language.equals("en")) {
    uri="http://"+this.context.language+".dbpedia.org/resource/"+uri.substring(uri.indexOf("?title=")+7);
    uri="http://"+this.context.language+".dbpedia.org/resource/"+uri;
```
Incorrect substring when stripping REST wiki links
The new cleanLink branch drops the first three characters after ./ and assumes ?title= is absent. For REST HTML such as href="./?title=Main_Page&oldid=123", the current code returns tle=Main_Page&oldid=123, which later becomes http://dbpedia.org/resource/tle=Main_Page&oldid=123. Similarly, /wiki/Foo?oldid=123 keeps the ?oldid tail. Both produce invalid resources. Please normalize the prefixes and strip any ?title=/query/fragment portions before building the DBpedia URI.
```diff
- if(uri.contains(linkPrefix)){
-     uri=uri.substring(uri.indexOf("?title=")+7);
- } else if (uri.contains(linkPrefix2)) {
-     uri=uri.substring(uri.indexOf("?title=")+3);
- }
+ if (uri.startsWith(linkPrefix)) {
+     uri = uri.substring(linkPrefix.length());
+ } else if (uri.startsWith(linkPrefix2)) {
+     uri = uri.substring(linkPrefix2.length());
+ }
+ if (uri.contains("?title=")) {
+     uri = uri.substring(uri.indexOf("?title=") + 7);
+ }
+ int queryIdx = uri.indexOf('?');
+ if (queryIdx >= 0) {
+     uri = uri.substring(0, queryIdx);
+ }
+ int fragmentIdx = uri.indexOf('#');
+ if (fragmentIdx >= 0) {
+     uri = uri.substring(0, fragmentIdx);
+ }
```
```scala
apiType = this.getProperty("mwc-type", "").trim,
apiUrl = this.getProperty("mwc-type").trim match {
  case "rest" => this.getProperty("mwc-apiRestUrl", "").trim
  case "mwc" => this.getProperty("mwc-apiMWCUrl", "").trim
  case "local" => this.getProperty("mwc-apiLocalUrl", "").trim
},
maxRetries = this.getProperty("mwc-maxRetries", "4").trim.toInt,
```
Handle unknown mwc-type safely
The match on this.getProperty("mwc-type").trim has no fallback, so a missing or misspelled property now throws MatchError at startup. Please keep the previous behaviour by providing a sensible default (or an explicit failure with a clear message).
```diff
- apiUrl = this.getProperty("mwc-type").trim match {
-   case "rest" => this.getProperty("mwc-apiRestUrl", "").trim
-   case "mwc" => this.getProperty("mwc-apiMWCUrl", "").trim
-   case "local" => this.getProperty("mwc-apiLocalUrl", "").trim
- },
+ apiUrl = this.getProperty("mwc-type", "mwc").trim match {
+   case "rest" => this.getProperty("mwc-apiRestUrl", "").trim
+   case "mwc" => this.getProperty("mwc-apiMWCUrl", "").trim
+   case "local" => this.getProperty("mwc-apiLocalUrl", "").trim
+   case other => throw new IllegalArgumentException(s"Unsupported mwc-type '$other'")
+ },
```
🤖 Prompt for AI Agents
In core/src/main/scala/org/dbpedia/extraction/config/Config.scala around lines
264 to 270, the pattern match on this.getProperty("mwc-type").trim can throw
MatchError for unknown or missing values; change it to match the apiType value
you already read and add a fallback case (case _ => "") or an explicit
IllegalArgumentException with a clear message. Concretely, use the apiType val
in the match and add a default branch that returns a sensible default (e.g. an
empty string) or throws a descriptive error so startup fails with a clear
message instead of a MatchError.
```scala
var nodes = doc.select("body").first.childNodes.asScala

val currentSection = new ListBuffer[Int]() //keeps track of section number
```
Add null-safety around body/section selection
doc.select("body").first and .select("section").first can be null for some pages. Guard to avoid NPEs.
```diff
- var nodes = doc.select("body").first.childNodes.asScala
+ val body = doc.select("body").first
+ if (body == null) return tocMap
+ var nodes = body.childNodes.asScala
@@
- val abstractSect=doc.select("body").select("section").first.childNodes.asScala //get first section
+ val firstSection = body.select("section").first
+ if (firstSection == null) return tocMap
+ val abstractSect = firstSection.childNodes.asScala
```

Also applies to: 108-111
🤖 Prompt for AI Agents
In
core/src/main/scala/org/dbpedia/extraction/nif/WikipediaNifExtractorRest.scala
around lines 37-39 (and similarly at lines 108-111), calls to
doc.select("body").first and doc.select("section").first can return null and
cause NPEs; add null-safety by checking for null (or wrapping in Option) before
accessing .first or .childNodes, and handle the absent case by skipping
processing or returning early as appropriate; specifically, replace direct
.first usage with a safe lookup (e.g., Option(...).toList or if (elem != null)
...) and only access childNodes or call .asScala when the element is present so
the extractor continues safely when body/section is missing.
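As a rough sketch of the Option-based variant mentioned in that prompt (the method name and fallback behaviour are assumptions, not the extractor's actual code):

```scala
import org.jsoup.nodes.{Document, Node}
import scala.collection.JavaConverters._

// Wrap Jsoup's nullable first() results in Option and bail out early
// when <body> or its first <section> is missing.
def childNodesOfFirstSection(doc: Document): Seq[Node] = {
  val nodes = for {
    body    <- Option(doc.select("body").first)
    section <- Option(body.select("section").first)
  } yield section.childNodes.asScala.toSeq
  nodes.getOrElse(Seq.empty) // nothing to extract for this page
}
```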
```scala
val parameters = "redirect=true"
val apiUrl: URL = new URL(url.concat(titleParam).concat("?"+parameters))

//println(s"mediawikiurl: $apiUrl")

for (counter <- 1 to maxRetries) {

  val conn = apiUrl.openConnection
  conn.setDoOutput(true) // POST REQUEST to verify

  val start = java.time.LocalTime.now()

  conn.setConnectTimeout(retryFactor * connectMs)
  conn.setReadTimeout(retryFactor * readMs)
  conn.setRequestProperty("accept", apiAccept)
  conn.setRequestProperty("charset", apiCharset)
  conn.setRequestProperty("profile", apiProfile)
  conn.setRequestProperty("Accept-Language", pageTitle.language.wikiCode)
  conn.setRequestProperty("User-Agent", userAgent)

  val inputStream = conn.getInputStream
  val answerHeader = conn.getHeaderFields()
  val answerClean = answerHeader.asScala.filterKeys(_ != null)

  if(conn.getHeaderField(null).contains("HTTP/1.1 200 OK") ){

    val end = java.time.LocalTime.now()
    conn match {
      case connection: HttpURLConnection =>
        log.debug("Request type: " + connection.getRequestMethod + "; URL: " + connection.getURL +
          "; Parameters: " + parameters + "; HTTP code: " + connection.getHeaderField(null) +
          "; Request time: " + start + "; Response time: " + end + "; Time needed: " +
          start.until(end, ChronoUnit.MILLIS))
      case _ =>
    }
    // Read answer
    parsedAnswer = readInAbstract(inputStream)
    SuccessParsing = parsedAnswer match {
      case Success(str) => true
      case Failure(_) => false
    }
  }
  if(!SuccessParsing){
    var sleepMs = sleepFactorMs
    if (retryAfter && answerClean.contains("retry-after")) {
      //println("GIVEN RETRY-AFTER > "+ answer_clean("retry-after").get(0))
      waitingTime = Integer.parseInt(answerClean("retry-after").get(0)) * 1000

      // exponential backoff test
      sleepMs = pow(waitingTime, counter).toInt
      //println("WITH EXPONENTIAL BACK OFF" + counter)
      //println("Sleeping time double >>>>>>>>>>>" + pow(waiting_time, counter))
      //println("Sleeping time int >>>>>>>>>>>" + sleepMs)
    }
    if (counter < maxRetries)
      Thread.sleep(sleepMs)
    else
      throw new Exception("Timeout error retrieving abstract of " + pageTitle + " in " + counter + " tries.")
  } else {

    //println(s"mediawikiurl: $apiUrl?$parameters")
    return parsedAnswer match {
      case Success(str) => Option(str)
```
Use the supplied apiParameterString when building the REST URL
retrievePage ignores apiParameterString and always forces redirect=true. Callers rely on that parameter (e.g., to request specific sections or formats); dropping it makes the REST connector fetch the wrong content. Please merge the caller-provided parameters with your default instead of discarding them.
```diff
- val parameters = "redirect=true"
- val apiUrl: URL = new URL(url.concat(titleParam).concat("?"+parameters))
+ val baseParams = Option(apiParameterString)
+   .map(_.stripPrefix("?"))
+   .filter(_.nonEmpty)
+   .getOrElse("")
+ val allParams = Seq(baseParams, "redirect=true").filter(_.nonEmpty).mkString("&")
+ val suffix = if (allParams.isEmpty) "" else "?" + allParams
+ val apiUrl: URL = new URL(url.concat(titleParam).concat(suffix))
```
```sh
#!/bin/sh
lang="";
n=1000;
date_archive=$(date -d "$(date +%Y-%m-01) -1 day" +%Y-%m);
sort="desc"
while getopts l:d:n:s: flag
do
    case "${flag}" in
        l) lang=${OPTARG};;
        d) date_archive=${OPTARG};;
        n) n=${OPTARG};;
        s) sort=${OPTARG};;
    esac
done
echo "========================="
echo "lang: $lang";
echo "date: $date_archive";
echo "n: $n";
echo "sort: $sort";
echo "========================="

clickstream_data="clickstream_data_${lang}_${date_archive}"
if [ -f "$clickstream_data" ]
then
    echo "File found"
else
    echo "File not found"
    clickstream_url="https://dumps.wikimedia.org/other/clickstream/";
    content=$(curl -L "$clickstream_url$date_archive/")
    links=$( echo $content | grep -Po '(?<=href=")[^"]*');
    toextract="";
    substr="-${lang}wiki-"
    echo $substr
    for link in ${links[@]}; do
        echo $link
        if [[ $link =~ "-${lang}wiki-" ]];then
            toextract="$clickstream_url$date_archive/$link";
        fi
    done

    if [[ $toextract == "" ]]; then
        echo "Lang not found in clickstream";
        exit 1;
    fi

    echo ">>>> DOWNLOAD $toextract and save it"

    wget -O "${clickstream_data}.gz" $toextract;
    gzip -d "${clickstream_data}.gz"
fi

echo ">>>> COMPUTE SUM OF CLICKS"
declare -A dict
while IFS= read -r line; do
    IFS=$'\t'; arrIN=($line); unset IFS;
    key=${arrIN[1]}
    val=${arrIN[3]}
    if [[ ${key} != *"List"* ]];then
        if [[ ${#dict[${key}]} -eq 0 ]] ;then
            dict[${key}]=$(($val));
        else
            dict[${key}]=$((${dict[${key}]}+$val));
        fi
    fi
done < $clickstream_data

echo ">>>> SORT IT AND SAVE TEMP"
if [[ $sort == "desc" ]]; then
    for page in "${!dict[@]}"
    do
        echo "$page ${dict[$page]}"
    done | sort -rn -k2 | head -n "$n" | cut -d ' ' -f 1 >> temp.txt;
else
    for page in "${!dict[@]}"
    do
        echo "$page ${dict[$page]}"
    done | sort -n -k2 | head -n "$n" | cut -d ' ' -f 1 >> temp.txt;
fi
```
Fix shebang or drop Bash-specific syntax.
The script declares #!/bin/sh yet relies on Bash-only features (declare -A, ${array[@]}, [[ … ]], regex =~). On systems where /bin/sh is not Bash, the script aborts. Either switch the shebang to #!/bin/bash (and ensure Bash is available) or refactor to POSIX-compliant constructs.
```diff
-#!/bin/sh
+#!/bin/bash
```
🤖 Prompt for AI Agents
In dump/src/test/bash/create_custom_sample.sh lines 1-80, the script uses
Bash-only features (declare -A, [[ ]], =~, ${array[@]}) but declares #!/bin/sh;
change the shebang to a Bash-aware interpreter (e.g., #!/usr/bin/env bash) to
guarantee Bash semantics across systems, or alternatively refactor all
Bash-specific constructs to POSIX-compliant equivalents; update the shebang to
the env-backed bash if you choose the simpler fix and verify the script runs on
target CI/hosts.
```diff
@@ -0,0 +1,27 @@
# download and extraction target dir
#base-dir= moved to $extraction-framework/core/src/main/resources/universal.properties
base-dir=/home/cringwal/Desktop/DBpediaHistory/extraction-framework/history/sample-xml-dump
```
Remove hardcoded absolute path with username.
The hardcoded path /home/cringwal/Desktop/DBpediaHistory/... contains a specific username and will fail on other machines or CI environments.
Consider one of these approaches:
- Use a relative path:
```diff
-base-dir=/home/cringwal/Desktop/DBpediaHistory/extraction-framework/history/sample-xml-dump
+base-dir=./sample-xml-dump
```

- Or rely on the universal.properties default and remove this line entirely if the default is suitable for tests.
📝 Committable suggestion
| base-dir=/home/cringwal/Desktop/DBpediaHistory/extraction-framework/history/sample-xml-dump | |
| base-dir=./sample-xml-dump |
🤖 Prompt for AI Agents
In history/src/test/resources/extraction-configs/extraction.config.properties
around line 3, the property base-dir contains a hardcoded absolute path with a
username; remove this platform-specific path and either delete the line so the
test falls back to the universal.properties default, or replace it with a
portable relative path (e.g. ./sample-xml-dump or ../sample-xml-dump) that
points to the test resource directory so the tests run on any machine/CI.
| test("test Historic extraction", HistoTestTag) { | ||
| val jobsRunning = new ConcurrentLinkedQueue[Future[Unit]]()// | ||
| val classLoader: ClassLoader = getClass.getClassLoader | ||
| val histoConfig = new Config2(classLoader.getResource("extraction-configs/extraction.config.properties").getFile) | ||
| println(classLoader.getResource("extraction-configs/extraction.config.properties").getFile.toString) | ||
| println("BEFORE EXTRACT") | ||
| extract(histoConfig, jobsRunning) | ||
| println("AFTER EXTRACT") | ||
| } | ||
| def extract(config: Config2, jobsRunning: ConcurrentLinkedQueue[Future[Unit]]): Unit = { | ||
| val configLoader = new ConfigLoader2(config) | ||
| val jobs = configLoader.getExtractionJobs | ||
| println(">>>>>>>>> EXTRACT - NBJOBS > " + jobs.size) | ||
| println("LAUNCH JOBS") | ||
| for (job <- jobs) { | ||
| job.run() | ||
| } | ||
| while (jobsRunning.size() > 0) { | ||
| Thread.sleep(1000) | ||
| } | ||
| jobsRunning.clear() | ||
| } |
Resource loading via getFile breaks for off-classpath resources and paths with spaces.
classLoader.getResource(...).getFile only works for plain file: URLs without spaces. In Maven/Gradle runs the resource often sits in target/test-classes, so toURI is safer; inside a JAR, getFile fails outright. Please resolve the URL properly (Paths.get(resource.toURI)) or read the resource as a stream.
Fix sketch:
- val histoConfig = new Config2(classLoader.getResource("extraction-configs/extraction.config.properties").getFile)
+ val url = classLoader.getResource("extraction-configs/extraction.config.properties")
+ require(url != null, "Missing extraction-config resource")
+ val histoConfig = new Config2(Paths.get(url.toURI).toString)
Add the requisite java.nio.file.Paths import.
📝 Committable suggestion
| test("test Historic extraction", HistoTestTag) { | |
| val jobsRunning = new ConcurrentLinkedQueue[Future[Unit]]()// | |
| val classLoader: ClassLoader = getClass.getClassLoader | |
| val histoConfig = new Config2(classLoader.getResource("extraction-configs/extraction.config.properties").getFile) | |
| println(classLoader.getResource("extraction-configs/extraction.config.properties").getFile.toString) | |
| println("BEFORE EXTRACT") | |
| extract(histoConfig, jobsRunning) | |
| println("AFTER EXTRACT") | |
| } | |
| def extract(config: Config2, jobsRunning: ConcurrentLinkedQueue[Future[Unit]]): Unit = { | |
| val configLoader = new ConfigLoader2(config) | |
| val jobs = configLoader.getExtractionJobs | |
| println(">>>>>>>>> EXTRACT - NBJOBS > " + jobs.size) | |
| println("LAUNCH JOBS") | |
| for (job <- jobs) { | |
| job.run() | |
| } | |
| while (jobsRunning.size() > 0) { | |
| Thread.sleep(1000) | |
| } | |
| jobsRunning.clear() | |
| } | |
| test("test Historic extraction", HistoTestTag) { | |
| val jobsRunning = new ConcurrentLinkedQueue[Future[Unit]]()// | |
| val classLoader: ClassLoader = getClass.getClassLoader | |
| val url = classLoader.getResource("extraction-configs/extraction.config.properties") | |
| require(url != null, "Missing extraction-config resource") | |
| val histoConfig = new Config2(Paths.get(url.toURI).toString) | |
| println(classLoader.getResource("extraction-configs/extraction.config.properties").getFile.toString) | |
| println("BEFORE EXTRACT") | |
| extract(histoConfig, jobsRunning) | |
| println("AFTER EXTRACT") | |
| } | |
| def extract(config: Config2, jobsRunning: ConcurrentLinkedQueue[Future[Unit]]): Unit = { | |
| val configLoader = new ConfigLoader2(config) | |
| val jobs = configLoader.getExtractionJobs | |
| println(">>>>>>>>> EXTRACT - NBJOBS > " + jobs.size) | |
| println("LAUNCH JOBS") | |
| for (job <- jobs) { | |
| job.run() | |
| } | |
| while (jobsRunning.size() > 0) { | |
| Thread.sleep(1000) | |
| } | |
| jobsRunning.clear() | |
| } |
🤖 Prompt for AI Agents
In history/src/test/scala/org/dbpedia/extraction/dump/extract/ExtractionTest.scala around lines 17 to 41, the test uses classLoader.getResource(...).getFile, which fails for off-classpath resources, paths with spaces, and JAR-packed resources. Replace that usage by resolving the URL to a URI and converting it via Paths.get(resource.toURI()), or, better, load the resource as a stream with getResourceAsStream for JAR safety. Add a null check for the resource and the requisite java.nio.file.Paths import (or java.io.InputStream handling) so the test works in Maven/Gradle runs and inside JARs.
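A self-contained sketch of the safer lookup for the test above, assuming only what the current test already does (Config2 is constructed from a plain file-system path, and the config lives under extraction-configs/ on the test classpath):

import java.nio.file.Paths

// Resolve the test config from the classpath without URL.getFile,
// which mishandles spaces and non-file: URLs.
val url = getClass.getClassLoader.getResource("extraction-configs/extraction.config.properties")
require(url != null, "Missing extraction-config resource")
val histoConfig = new Config2(Paths.get(url.toURI).toString)

If the config ever has to be read from inside a JAR, switch to getResourceAsStream and copy it to a temporary file (or extend Config2 to accept a stream), since a JAR entry has no file-system path.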
| {{PropertyMapping | templateProperty = όνομα | ontologyProperty = foaf:name }} | ||
| {{PropertyMapping | templateProperty = Χώρα | ontologyProperty = country }} | ||
| {{PropertyMapping | templateProperty = Πληθυσμός | ontologyProperty = population }} | ||
Use populationTotal for Πληθυσμός.
ontologyProperty = population is not defined in the DBpedia ontology, so the extractor will drop these values instead of populating the village population. Swap to populationTotal, which is already used elsewhere for settlements and compiles cleanly.
- {{PropertyMapping | templateProperty = Πληθυσμός | ontologyProperty = population }}
+ {{PropertyMapping | templateProperty = Πληθυσμός | ontologyProperty = populationTotal }}
📝 Committable suggestion
| {{PropertyMapping | templateProperty = όνομα | ontologyProperty = foaf:name }} | |
| {{PropertyMapping | templateProperty = Χώρα | ontologyProperty = country }} | |
| {{PropertyMapping | templateProperty = Πληθυσμός | ontologyProperty = population }} | |
| {{PropertyMapping | templateProperty = όνομα | ontologyProperty = foaf:name }} | |
| {{PropertyMapping | templateProperty = Χώρα | ontologyProperty = country }} | |
| {{PropertyMapping | templateProperty = Πληθυσμός | ontologyProperty = populationTotal }} |
🤖 Prompt for AI Agents
In mappings/Mapping_el.xml around lines 3396 to 3399, the mapping uses ontologyProperty = population, which is not defined in the DBpedia ontology; change it to ontologyProperty = populationTotal so the extractor retains village population values, consistent with other settlement mappings.
| {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }} | ||
| {{ PropertyMapping | templateProperty = date | ontologyProperty = date }} | ||
| {{ PropertyMapping | templateProperty = disease | ontologyProperty = disease }} | ||
| {{ PropertyMapping | templateProperty = virus_strain | ontologyProperty = virus_strain }} | ||
| {{ PropertyMapping | templateProperty = location | ontologyProperty = Location }} | ||
| {{ PropertyMapping | templateProperty = deaths | ontologyProperty = Deaths }} | ||
| {{ PropertyMapping | templateProperty = symptom | ontologyProperty = Symptom }} |
Fix invalid ontology property names in Infobox pandemic mapping.
Location, Deaths, Symptom, and virus_strain are not valid DBpedia ontology properties; the extractor will either mint non-existent predicates or drop the data. Please map to the canonical property names (e.g., location, numberOfDeaths, symptom, virusStrain).
- {{ PropertyMapping | templateProperty = location | ontologyProperty = Location }}
- {{ PropertyMapping | templateProperty = deaths | ontologyProperty = Deaths }}
- {{ PropertyMapping | templateProperty = symptom | ontologyProperty = Symptom }}
- {{ PropertyMapping | templateProperty = virus_strain | ontologyProperty = virus_strain }}
+ {{ PropertyMapping | templateProperty = location | ontologyProperty = location }}
+ {{ PropertyMapping | templateProperty = deaths | ontologyProperty = numberOfDeaths }}
+ {{ PropertyMapping | templateProperty = symptom | ontologyProperty = symptom }}
+ {{ PropertyMapping | templateProperty = virus_strain | ontologyProperty = virusStrain }}
📝 Committable suggestion
| {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }} | |
| {{ PropertyMapping | templateProperty = date | ontologyProperty = date }} | |
| {{ PropertyMapping | templateProperty = disease | ontologyProperty = disease }} | |
| {{ PropertyMapping | templateProperty = virus_strain | ontologyProperty = virus_strain }} | |
| {{ PropertyMapping | templateProperty = location | ontologyProperty = Location }} | |
| {{ PropertyMapping | templateProperty = deaths | ontologyProperty = Deaths }} | |
| {{ PropertyMapping | templateProperty = symptom | ontologyProperty = Symptom }} | |
| {{ PropertyMapping | templateProperty = name | ontologyProperty = foaf:name }} | |
| {{ PropertyMapping | templateProperty = date | ontologyProperty = date }} | |
| {{ PropertyMapping | templateProperty = disease | ontologyProperty = disease }} | |
| {{ PropertyMapping | templateProperty = virus_strain | ontologyProperty = virusStrain }} | |
| {{ PropertyMapping | templateProperty = location | ontologyProperty = location }} | |
| {{ PropertyMapping | templateProperty = deaths | ontologyProperty = numberOfDeaths }} | |
| {{ PropertyMapping | templateProperty = symptom | ontologyProperty = symptom }} |
🤖 Prompt for AI Agents
In mappings/Mapping_en.xml around lines 21493 to 21499, several ontologyProperty
values in the Infobox pandemic PropertyMapping entries are invalid; update the
mappings so DBpedia uses canonical property names: change
ontologyProperty="Location" to ontologyProperty="location", "Deaths" to
"numberOfDeaths", "Symptom" to "symptom", and "virus_strain" to the canonical
"virusStrain" (also keep foaf:name and date/disease mappings as-is); ensure
exact spelling and casing to prevent minting non-existent predicates or data
loss.
| {{PropertyMapping | templateProperty = numéro piste préc | ontologyProperty = nextTrackNumber}} | ||
| {{PropertyMapping | templateProperty = numéro piste suiv | ontologyProperty = previousTrackNumber}} |
Swap previous/next track property mappings
numéro piste préc (previous track) is mapped to nextTrackNumber, and numéro piste suiv (next track) is mapped to previousTrackNumber, inverting the semantics returned by the extractor.
- {{PropertyMapping | templateProperty = numéro piste préc | ontologyProperty = nextTrackNumber}}
- {{PropertyMapping | templateProperty = numéro piste suiv | ontologyProperty = previousTrackNumber}}
+ {{PropertyMapping | templateProperty = numéro piste préc | ontologyProperty = previousTrackNumber}}
+ {{PropertyMapping | templateProperty = numéro piste suiv | ontologyProperty = nextTrackNumber}}
📝 Committable suggestion
| {{PropertyMapping | templateProperty = numéro piste préc | ontologyProperty = nextTrackNumber}} | |
| {{PropertyMapping | templateProperty = numéro piste suiv | ontologyProperty = previousTrackNumber}} | |
| {{PropertyMapping | templateProperty = numéro piste préc | ontologyProperty = previousTrackNumber}} | |
| {{PropertyMapping | templateProperty = numéro piste suiv | ontologyProperty = nextTrackNumber}} |
🤖 Prompt for AI Agents
In mappings/Mapping_fr.xml around lines 5374-5375, the two PropertyMapping
entries have their ontologyProperty values inverted: "numéro piste préc"
(previous track) is incorrectly mapped to nextTrackNumber and "numéro piste
suiv" (next track) is incorrectly mapped to previousTrackNumber; swap the
ontologyProperty values so "numéro piste préc" maps to previousTrackNumber and
"numéro piste suiv" maps to nextTrackNumber to restore correct semantics.
First prototype of DBpedia History