chore: enable Wikidata extraction in server GUI #792
base: master
Conversation
**Important:** Review skipped — draft detected. Please check the settings in the CodeRabbit UI. You can disable this status message by setting the … option.

📝 Walkthrough

Replaces explicit languages with …

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Main as Server.main
    participant Server as Server(instance)
    participant Extractor as DynamicExtractionManager
    Main->>Server: construct Server()
    Note right of Server: exposes public `val extractor`
    Server->>Extractor: instantiate DynamicExtractionManager(managers(...), languages, paths, ...)
    Server->>Extractor: extractor.updateAll()
    Note left of Extractor: triggers extraction update/refresh
```

```mermaid
sequenceDiagram
    participant Client as API client
    participant API as TemplateStatistics endpoint
    participant Server as Server(instance)
    Client->>API: GET /statistics/{lang}/
    alt lang == "wikidata"
        API-->>Client: HTTP 501 "Template mapping statistics are not available for Wikidata"
    else
        API->>Server: validate language & lookup manager
        Server-->>API: manager / stats
        API-->>Client: HTTP 200 + stats
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- server/server.default.properties (1 hunk)
- server/src/main/statistics/mappingstats_wikidata.txt (1 hunk)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: build
- GitHub Check: build
Actionable comments posted: 0
🧹 Nitpick comments (1)
server/src/main/scala/org/dbpedia/extraction/server/Server.scala (1)

205-209: Consider adding explicit error handling for consistency.

The initialization calls `updateAll` without a try-catch block, unlike the cache loader pattern at lines 62-73, which includes explicit error handling and logging. While the fail-fast behavior (the server won't start if initialization fails) is reasonable, adding explicit error handling would provide clearer error messages and be consistent with the existing error-handling pattern in the codebase. Example:

```diff
-// Initialize all extractors and statistics
-logger.info("Initializing extractors and statistics...")
-_instance.extractor.updateAll
-logger.info("Extractors and statistics initialized")
+// Initialize all extractors and statistics
+try {
+  logger.info("Initializing extractors and statistics...")
+  _instance.extractor.updateAll
+  logger.info("Extractors and statistics initialized")
+} catch {
+  case e: Exception =>
+    logger.severe(s"Failed to initialize extractors and statistics: ${e.getMessage}")
+    e.printStackTrace()
+    throw e
+}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- server/server.default.properties (2 hunks)
- server/src/main/scala/org/dbpedia/extraction/server/Server.scala (1 hunk)
- server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala (1 hunk)
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/main/scala/org/dbpedia/extraction/server/Server.scala (1)
server/src/main/scala/org/dbpedia/extraction/server/ExtractionManager.scala (1)
- `updateAll` (64-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (3)
server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala (1)
90-94: LGTM! The guard correctly blocks Wikidata from template statistics with an appropriate 501 (Not Implemented) status. The check is properly placed before the managers validation to provide a more specific error message. This aligns with the fact that Wikidata uses a different structure (Q-identifiers) rather than templates.
server/server.default.properties (2)
21-21: Previous concern resolved.

The critical issue from the previous review has been addressed. Adding `wikidata` to the languages list ensures that the Wikidata extractors defined at lines 110-115 will be properly activated and the extraction endpoint will be available at `http://localhost:9999/server/extraction/wikidata/` as intended.
110-115: LGTM!

The Wikidata extractors configuration is comprehensive and includes appropriate extractors for Wikidata entities (labels, descriptions, aliases, references, etc.). The comment explaining why `WikidataLexemeExtractor` and `WikidataPropertyExtractor` are excluded provides valuable context about namespace handling limitations in live server extraction.
Force-pushed from d80f64e to 8b343aa (Compare)
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- server/server.default.properties (2 hunks)
- server/src/main/scala/org/dbpedia/extraction/server/Server.scala (1 hunk)
- server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala (1 hunk)
- server/src/main/statistics/mappingstats_wikidata.txt (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/src/main/statistics/mappingstats_wikidata.txt
🧰 Additional context used
🧬 Code graph analysis (1)
server/src/main/scala/org/dbpedia/extraction/server/Server.scala (1)
server/src/main/scala/org/dbpedia/extraction/server/ExtractionManager.scala (1)
- `updateAll` (64-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
server/server.default.properties (1)
21-21: LGTM! Wikidata configuration properly implemented.

The changes correctly address the PR objective and the previous review feedback:

- ✅ Line 21: Added `wikidata` to the languages list, resolving the critical issue raised in the previous review
- ✅ Lines 110-124: Comprehensive Wikidata extractor configuration with:
  - Default extractors for general Wikidata extraction
  - Mappings mode for label and same-as extraction
  - Custom mode for advanced extraction options
  - Clear documentation about excluded extractors (line 115) that require special namespace handling
The configuration follows established patterns used for other languages and provides flexibility with multiple extraction modes.
Also applies to: 110-124
```scala
// Wikidata doesn't support template mapping statistics
if (language.wikiCode == "wikidata") {
  throw new WebApplicationException(new Exception("Template mapping statistics are not available for Wikidata"), 501)
}
```
🧩 Analysis chain
Verify Wikidata guard coverage across all endpoints.
The guard correctly prevents template statistics access for Wikidata in the HTML view. However, verify that other code paths properly handle Wikidata:
- JSON endpoint (line 288): The `getAsJson` method accesses `Server.instance.managers(language).holder` without a Wikidata guard. If a client requests `/statistics/wikidata/json/`, it will attempt to retrieve template statistics from the Wikidata manager.
- All-languages aggregation (line 46): The `allLanguages` method iterates through all managers including Wikidata and aggregates their template statistics. If Wikidata's manager contains no meaningful template statistics, this could produce misleading totals.
Run the following script to check if the JSON endpoint and all-languages view need similar guards:
🏁 Script executed:

```bash
#!/bin/bash
# Description: Check Wikidata handling in other statistics endpoints

# Check if getAsJson method has Wikidata guard
echo "=== Checking getAsJson method for Wikidata guard ==="
rg -n -A 10 -B 2 'def getAsJson' server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala

# Check if allLanguages method has Wikidata filtering
echo "=== Checking allLanguages method for Wikidata filtering ==="
rg -n -A 5 -B 2 'for.*managers' server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala
```

Length of output: 1137
🏁 Script executed:

```bash
#!/bin/bash
# Get full context of getAsJson method including else block and subsequent lines
echo "=== Full getAsJson method ==="
rg -n -A 30 'def getAsJson' server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala | head -50

# Check Language.getOrElse to see if it filters Wikidata
echo -e "\n=== Search for Language handling and Wikidata checks ==="
rg -n -B 2 -A 2 'Language\.getOrElse|wikidata|wikiCode' server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala
```

Length of output: 1907
Add Wikidata guard to getAsJson method and verify allLanguages method.
The Wikidata guard in `singleLanguage` (lines 90-93) does not protect the JSON endpoint. The `getAsJson` method lacks this guard and will process Wikidata in both code paths:

- Wildcard request (`langCode="*"`): Line 278 iterates all managers, including Wikidata
- Specific language (`langCode="wikidata"`): Line 282 retrieves the language without checking whether it is Wikidata, then line 288 accesses its statistics

Additionally, the `allLanguages` method (line 46) iterates all managers without filtering Wikidata. Apply the same Wikidata guard logic (lines 90-93) to both `getAsJson` (lines 277-283) and `allLanguages` (line 46) to ensure consistent behavior across all endpoints.
🤖 Prompt for AI Agents
In
server/src/main/scala/org/dbpedia/extraction/server/resources/TemplateStatistics.scala
around lines 46 and 277-283, add the same Wikidata guard used in singleLanguage
(lines 90-93) so Wikidata is excluded from JSON and all-languages endpoints: in
allLanguages (line 46) filter out managers/languages whose wikiCode ==
"wikidata" before iterating, and in getAsJson (lines 277-283) check
language.wikiCode == "wikidata" and throw the same WebApplicationException(…,
501) when encountered (apply this both for wildcard iteration—skip Wikidata
managers—and for specific langCode="wikidata"—reject with the exception) to
ensure consistent handling.
```scala
// Initialize all extractors and statistics
logger.info("Initializing extractors and statistics...")
_instance.extractor.updateAll
logger.info("Extractors and statistics initialized")
```
Add error handling and consider timeout for blocking initialization.
The synchronous `updateAll` call lacks error handling, unlike the similar initialization pattern in `singleExtractorCache` (lines 62-72), which wraps initialization in a try-catch. If `updateAll` throws an exception, the server fails to start without clear logging of the root cause.
Additionally, this blocking call delays HTTP server startup (line 222). If extractor initialization takes a long time (e.g., loading large mapping files, downloading statistics), the server becomes unresponsive during startup.
Consider these improvements:
1. Add error handling:

```diff
 // Initialize all extractors and statistics
 logger.info("Initializing extractors and statistics...")
-_instance.extractor.updateAll
-logger.info("Extractors and statistics initialized")
+try {
+  _instance.extractor.updateAll
+  logger.info("Extractors and statistics initialized")
+} catch {
+  case e: Exception =>
+    logger.severe(s"Failed to initialize extractors and statistics: ${e.getMessage}")
+    e.printStackTrace()
+    throw e
+}
```

2. Consider timeout or async initialization:
If initialization can take significant time, consider adding a timeout mechanism or making it asynchronous with proper status tracking, so the HTTP server can start responding to health checks while initialization completes in the background.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```scala
// Initialize all extractors and statistics
logger.info("Initializing extractors and statistics...")
try {
  _instance.extractor.updateAll
  logger.info("Extractors and statistics initialized")
} catch {
  case e: Exception =>
    logger.severe(s"Failed to initialize extractors and statistics: ${e.getMessage}")
    e.printStackTrace()
    throw e
}
```
🤖 Prompt for AI Agents
In server/src/main/scala/org/dbpedia/extraction/server/Server.scala around lines
205 to 208, the synchronous call `_instance.extractor.updateAll` is unprotected
and can throw, and it also blocks HTTP startup; wrap the call in a try/catch to
log the full exception and fail-fast with a clear message, or better, make
initialization asynchronous: start the HTTP server immediately and run
`updateAll` in a background Future with configurable timeout and proper recovery
that logs errors and updates an initialization-ready flag or health endpoint; if
you prefer still-blocking behavior, use a configurable timeout (Await.result or
equivalent) and catch TimeoutException to log and handle it gracefully rather
than letting the server die silently.
Actionable comments posted: 2
🧹 Nitpick comments (1)
scripts/src/main/bash/stats-redirects-test.sh (1)
22-36: Consider explicit error handling for curl failures.

If curl encounters network errors or timeouts, the `http_code` variable may be empty or contain unexpected values. While the subsequent check will fail safely, adding explicit error detection improves robustness and debugging.

```diff
 test_endpoint() {
   local lang="$1"
   local type="$2"
   local url="$3"
-  http_code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
+  http_code=$(curl -s -o /dev/null -w "%{http_code}" "$url" 2>/dev/null) || http_code="000"
   if [ "$http_code" = "200" ]; then
     echo "✅ $lang $type: HTTP $http_code"
   else
     echo "❌ $lang $type: HTTP $http_code"
     FAILED=$((FAILED + 1))
     FAILED_TESTS+=("$lang $type (HTTP $http_code)")
   fi
 }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/server-web-api-test.yml (3 hunks)
- scripts/src/main/bash/stats-redirects-test.sh (1 hunk)
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/src/main/bash/stats-redirects-test.sh
[warning] 15-15: Prefer mapfile or read -a to split command output (or quote to avoid splitting).
(SC2207)
[error] 18-18: Argument mixes string and array. Use * or separate argument.
(SC2145)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build
- GitHub Check: deploy
- GitHub Check: build
- GitHub Check: build
🔇 Additional comments (1)
.github/workflows/server-web-api-test.yml (1)
19-19: LGTM: Workflow integration looks correct.

The changes properly integrate the new test step and configure the server to load all mapped languages:

- Line 31: `MAVEN_OPTS="-Daa.languages=@mappings"` enables dynamic language discovery including Wikidata
- Lines 54-57: New test step validates statistics and redirects endpoints for all discovered languages
- Step placement is correct: runs after coordinate test, before server shutdown
The workflow will now validate that Wikidata and other language endpoints are operational as part of the CI pipeline.
Also applies to: 31-31, 54-57
```bash
MAPPINGS_PAGE=$(curl -s "${BASE_URL}/mappings/")
[ -z "$MAPPINGS_PAGE" ] && { echo "❌ ERROR: Could not fetch mappings page"; exit 1; }

LANGUAGES=($(echo "$MAPPINGS_PAGE" | grep -oP 'href="\K[^/"]+(?=/">)' | sort -u))
```
Fix shellcheck SC2207: Properly handle array assignment from command output.
The unquoted command substitution in array assignment can cause word-splitting issues. Use mapfile for more robust handling.
```diff
-LANGUAGES=($(echo "$MAPPINGS_PAGE" | grep -oP 'href="\K[^/"]+(?=/">)' | sort -u))
+mapfile -t LANGUAGES < <(echo "$MAPPINGS_PAGE" | grep -oP 'href="\K[^/"]+(?=/">)' | sort -u)
```

📝 Committable suggestion
```bash
mapfile -t LANGUAGES < <(echo "$MAPPINGS_PAGE" | grep -oP 'href="\K[^/"]+(?=/">)' | sort -u)
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 15-15: Prefer mapfile or read -a to split command output (or quote to avoid splitting).
(SC2207)
🤖 Prompt for AI Agents
In scripts/src/main/bash/stats-redirects-test.sh around line 15, the array
assignment using unquoted command substitution can trigger SC2207 and causes
word-splitting; replace the current LANGUAGES=($(...)) pattern with a mapfile
approach that reads the pipeline output (echoing MAPPINGS_PAGE piped to grep -oP
... and sort -u) into the LANGUAGES array using mapfile -t from a process
substitution, and ensure subsequent uses of LANGUAGES are properly quoted; this
removes word-splitting, handles entries with spaces safely, and satisfies
ShellCheck SC2207.
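The `mapfile -t` pattern recommended above can be checked offline. A self-contained sketch, where the HTML snippet and the `grep`/`sed` pipeline are simplified stand-ins for the real mappings page and its `href` regex:

```shell
# Hypothetical HTML fragment standing in for the real /mappings/ page.
MAPPINGS_PAGE='<a href="en/">en</a><a href="wikidata/">wikidata</a><a href="de/">de</a>'

# mapfile -t reads one array element per line of the process substitution,
# with no word-splitting or globbing of the command output.
mapfile -t LANGUAGES < <(echo "$MAPPINGS_PAGE" | grep -o 'href="[^/"]*/"' | sed 's/href="//; s%/"%%' | sort -u)

echo "count: ${#LANGUAGES[@]}"   # → count: 3
echo "first: ${LANGUAGES[0]}"    # → first: de
```

Unlike `LANGUAGES=($(...))`, this stays correct even if an extracted entry ever contained whitespace.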
```bash
LANGUAGES=($(echo "$MAPPINGS_PAGE" | grep -oP 'href="\K[^/"]+(?=/">)' | sort -u))
[ ${#LANGUAGES[@]} -eq 0 ] && { echo "❌ ERROR: No languages found"; exit 1; }

echo "Found ${#LANGUAGES[@]} languages: ${LANGUAGES[@]}"
```
Fix shellcheck SC2145: Use proper array expansion syntax.
When expanding an array variable inside a string, use the `*` subscript with proper quoting so the elements are joined into a single word.

```diff
-echo "Found ${#LANGUAGES[@]} languages: ${LANGUAGES[@]}"
+echo "Found ${#LANGUAGES[@]} languages: ${LANGUAGES[*]}"
```

Note: `${LANGUAGES[*]}` joins array elements with spaces (more readable for output), while `${LANGUAGES[@]}` expands each element as a separate word. For this echo statement either is acceptable, but `[*]` is idiomatic for display.
📝 Committable suggestion
```bash
echo "Found ${#LANGUAGES[@]} languages: ${LANGUAGES[*]}"
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[error] 18-18: Argument mixes string and array. Use * or separate argument.
(SC2145)
🤖 Prompt for AI Agents
In scripts/src/main/bash/stats-redirects-test.sh around line 18, the echo
expands the LANGUAGES array with ${LANGUAGES[@]} without proper quoting which
triggers shellcheck SC2145; change the expansion to a quoted form for display
such as "${LANGUAGES[*]}" (or use a separate quoted argument with
"${LANGUAGES[@]}") so each element is handled correctly and the array prints as
intended.
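The difference between the two expansions is easy to demonstrate with an element that contains a space (the language list below is a made-up example, not the real server output):

```shell
LANGUAGES=(en de "pt br" wikidata)

# "${arr[*]}" joins all elements with a space into ONE word — good for display.
count_star=0
for x in "${LANGUAGES[*]}"; do count_star=$((count_star + 1)); done

# "${arr[@]}" expands to one word PER element — good for iteration.
count_at=0
for x in "${LANGUAGES[@]}"; do count_at=$((count_at + 1)); done

echo "Found ${#LANGUAGES[@]} languages: ${LANGUAGES[*]}"
echo "words with [*]: $count_star, with [@]: $count_at"  # → words with [*]: 1, with [@]: 4
```

This is why ShellCheck flags `"${arr[@]}"` inside a larger string: it silently splits the string into several arguments.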
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- server/server.default.properties (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build
- GitHub Check: build
- GitHub Check: build
- GitHub Check: deploy
🔇 Additional comments (1)
server/server.default.properties (1)
110-123: Well-structured Wikidata extractor configuration.

The Wikidata extractors are properly organized into three extraction modes (default, mappings, custom) with thoughtful comments about namespace limitations. Keeping the excluded extractors (WikidataLexemeExtractor, WikidataPropertyExtractor) commented out with an explanation is good defensive documentation.
Enable Wikidata as a language option in the server extraction interface by adding the required statistics file and configuration.

Changes:
- Add wikidata extractors to server.default.properties
- Create mappingstats_wikidata.txt with initial statistics
Add Wikidata configuration with WikidataSameAsExtractor, WikidataRawExtractor, WikidataLabelExtractor, WikidataDescriptionExtractor, WikidataAliasExtractor, WikidataR2RExtractor, WikidataReferenceExtractor, WikidataNameSpaceSameAsExtractor, and WikidataLLExtractor. Return HTTP 501 for Wikidata template statistics. Initialize extractors at startup to prevent NPE
Add extractor configurations for Wikidata:
- mappings mode: WikidataLabelExtractor, WikidataSameAsExtractor
- custom mode: 9 extractors for comprehensive entity extraction
Force-pushed from 5eb208b to 16b545c (Compare)
Add comprehensive test script to validate all Wikidata extractors in the server GUI and integrate it into the CI/CD pipeline.

Changes:
- Add test-wikidata-extractors.sh script that tests each Wikidata extractor individually against the Barack Obama (Q76) entity
- Update GitHub Actions workflow to run Wikidata extractor tests after coordinate tests

Test details:
- Tests 9 Wikidata extractors: SameAs, Raw, Label, Description, Alias, R2R, Reference, NameSpaceSameAs, and LL extractors
- Uses revision ID 2424202969 for reproducible results
- Validates HTTP 200 responses and checks for error keywords
- Provides colored pass/fail output with a summary
- Returns proper exit codes for CI/CD integration

This ensures all configured Wikidata extractors are functional when the server starts with Wikidata language support.
Fix non-parsable Wikidata dates by adding proper zero-padding to month and day values in ISO 8601 format.

Before: "1961-8-4"^^xsd:date (invalid)
After: "1961-08-04"^^xsd:date (valid)

Changes:
- Update WikidataUtil.getValue() for the TimeValue case
- Use Scala string formatting with %04d for year, %02d for month/day
- Fixes dates in all Wikidata extractors (R2R, Raw, Reference, etc.)
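The padding behavior described in this commit can be illustrated with printf-style format strings. A shell sketch mirroring the `%04d`/`%02d` formatting the commit applies in Scala (the sample date is taken from the commit message):

```shell
year=1961; month=8; day=4

# Zero-pad the components into a parsable xsd:date literal.
padded=$(printf '%04d-%02d-%02d' "$year" "$month" "$day")
unpadded="$year-$month-$day"

echo "before: $unpadded"  # → before: 1961-8-4
echo "after:  $padded"    # → after:  1961-08-04
```

Without the padding, XML Schema date parsers reject the literal, since xsd:date requires exactly two digits for month and day.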
Enable Wikidata by adding 'wikidata' to the languages configuration.
Setup:
Example page titles to extract:
Summary by CodeRabbit
New Features
Improvements
Tests
Limitations