-
Notifications
You must be signed in to change notification settings - Fork 358
Development: Insert lecture metadata into Weaviate
#12021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Development: Insert lecture metadata into Weaviate
#12021
Conversation
…-metadata-insertion # Conflicts: # build.gradle # docker/weaviate.yml # documentation/docs/developer/weaviate-setup.mdx # documentation/sidebar-staff.ts # gradle.properties
|
@florian-glombik Your PR description needs attention before it can be reviewed: Issues Found
How to Fix
This check validates that your PR description follows the PR template. A complete description helps reviewers understand your changes and speeds up the review process.
|
|
@florian-glombik Test coverage could not be fully measured because some tests failed. Please check the workflow logs for details. |
WalkthroughThis PR introduces Weaviate vector database integration for the Artemis platform, including Spring configuration for conditional client initialization, schema definitions for five collections (Lectures, LectureTranscriptions, LectureUnitSegments, LectureUnits, Faqs) mirroring Iris schemas, a service managing collection creation and data operations, and supporting infrastructure. Changes
Sequence Diagram(s)sequenceDiagram
participant App as Artemis App Startup
participant Config as WeaviateClientConfiguration
participant Client as WeaviateClient
participant Server as Weaviate Server
participant Service as WeaviateService
App->>Config: Initialize Spring beans (artemis.weaviate.enabled=true)
Config->>Client: Create WeaviateClient with config (host, port, secure)
Client->>Server: Connect to Weaviate instance
Server-->>Client: Connection established
Client-->>Config: WeaviateClient bean ready
Config-->>App: Configuration complete
App->>Service: Instantiate WeaviateService
Service->>Service: `@PostConstruct`: initializeCollections()
loop For each schema in ALL_SCHEMAS
Service->>Server: Check if collection exists
alt Collection missing
Service->>Server: Create collection with properties and references
Server-->>Service: Collection created
else Collection exists
Server-->>Service: Confirmed
end
end
Service-->>App: Collections initialized
App->>Service: isHealthy()
Service->>Server: List collections
Server-->>Service: Success/Failure
Service-->>App: Health status
sequenceDiagram
participant Client as Application Code
participant Service as WeaviateService
participant Schema as WeaviateSchemas
participant Weaviate as Weaviate Server
Client->>Service: insertLecturePageChunk(courseId, ..., pageTextContent, ...)
Service->>Schema: getSchema("Lectures")
Schema-->>Service: LECTURES_SCHEMA with properties
Service->>Weaviate: getCollection("Lectures")
Weaviate-->>Service: CollectionHandle
Service->>Weaviate: Insert document with mapped properties
Weaviate-->>Service: Document inserted
Service-->>Client: Operation complete
Client->>Service: insertFaq(courseId, ..., questionTitle, questionAnswer)
Service->>Schema: getSchema("Faqs")
Schema-->>Service: FAQS_SCHEMA with properties
Service->>Weaviate: getCollection("Faqs")
Weaviate-->>Service: CollectionHandle
Service->>Weaviate: Insert FAQ document
Weaviate-->>Service: Document inserted
Service-->>Client: Operation complete
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@src/main/java/de/tum/cit/aet/artemis/core/service/weaviate/WeaviateService.java`:
- Around line 131-140: insertLecturePageChunk currently builds the payload with
Map.of which throws on any null value (e.g., baseUrl or other nullable fields);
change the code to validate required parameters up-front and construct the
insert map using a mutable builder (e.g., new HashMap<>() / Map.ofEntries
alternative) so you only put optional fields when non-null, then pass that
sanitized map to collection.data.insert; apply the same pattern to the other
insert methods referenced (lines ~161-169) to avoid NPEs from Map.of.
🧹 Nitpick comments (9)
src/main/java/de/tum/cit/aet/artemis/core/config/weaviate/schema/WeaviateReferenceDefinition.java (1)
21-23: Consider removing redundant factory method or adding validation.The
of()factory method simply delegates to the canonical constructor without adding any value. Either remove it and use the constructor directly, or add input validation to justify its existence.Option 1: Remove redundant factory method
- /** - * Creates a reference definition. - * - * `@param` name the reference name - * `@param` targetCollection the target collection name - * `@param` description the description - * `@return` the reference definition - */ - public static WeaviateReferenceDefinition of(String name, String targetCollection, String description) { - return new WeaviateReferenceDefinition(name, targetCollection, description); - }Option 2: Add validation to justify factory method
public static WeaviateReferenceDefinition of(String name, String targetCollection, String description) { + if (name == null || name.isBlank()) { + throw new IllegalArgumentException("Reference name must not be null or blank"); + } + if (targetCollection == null || targetCollection.isBlank()) { + throw new IllegalArgumentException("Target collection must not be null or blank"); + } return new WeaviateReferenceDefinition(name, targetCollection, description); }Based on learnings, input validation is recommended for server-side Java code.
src/main/java/de/tum/cit/aet/artemis/core/config/weaviate/schema/WeaviateCollectionSchema.java (2)
16-38: Defensively copy/validate schema lists to avoid accidental mutation.The record stores the provided Lists as-is; if callers mutate them later, schema lookups can drift. A compact constructor can copy and null-check once.
♻️ Proposed fix
public record WeaviateCollectionSchema(String collectionName, List<WeaviatePropertyDefinition> properties, List<WeaviateReferenceDefinition> references) { + + public WeaviateCollectionSchema { + java.util.Objects.requireNonNull(collectionName, "collectionName"); + properties = List.copyOf(java.util.Objects.requireNonNull(properties, "properties")); + references = List.copyOf(java.util.Objects.requireNonNull(references, "references")); + }
74-75: Provide clearer feedback on duplicate property names.
Collectors.toMapthrows anIllegalStateExceptionon duplicates; a merge function can fail fast with a targeted message.♻️ Proposed fix
- return properties.stream().collect(Collectors.toMap(WeaviatePropertyDefinition::name, p -> p)); + return properties.stream().collect(Collectors.toMap( + WeaviatePropertyDefinition::name, + p -> p, + (a, b) -> { + throw new IllegalArgumentException("Duplicate property name: " + a.name()); + } + ));src/main/java/de/tum/cit/aet/artemis/core/config/weaviate/schema/WeaviatePropertyDefinition.java (1)
14-50: Add basic null/blank validation for schema definitions.Fail fast on invalid names/data types to avoid obscure errors during schema creation.
Based on learnings, validate inputs early in server-side code.♻️ Proposed fix
public record WeaviatePropertyDefinition(String name, WeaviateDataType dataType, boolean indexSearchable, boolean indexFilterable, String description) { + + public WeaviatePropertyDefinition { + if (name == null || name.isBlank()) { + throw new IllegalArgumentException("name must not be blank"); + } + java.util.Objects.requireNonNull(dataType, "dataType"); + description = (description == null) ? "" : description; + }src/main/java/de/tum/cit/aet/artemis/core/config/weaviate/WeaviateConfigurationProperties.java (1)
21-73: Validate host/port values to fail fast on misconfiguration.Blank hosts or invalid ports currently pass through and only fail later during connection setup. Guard clauses keep failures actionable.
Based on learnings, validate inputs early in server-side code.♻️ Proposed fix
public void setHost(String host) { - this.host = host; + if (host == null || host.isBlank()) { + throw new IllegalArgumentException("artemis.weaviate.host must not be blank"); + } + this.host = host; } @@ public void setPort(int port) { - this.port = port; + if (port <= 0 || port > 65535) { + throw new IllegalArgumentException("artemis.weaviate.port must be between 1 and 65535"); + } + this.port = port; } @@ public void setGrpcPort(int grpcPort) { - this.grpcPort = grpcPort; + if (grpcPort <= 0 || grpcPort > 65535) { + throw new IllegalArgumentException("artemis.weaviate.grpc-port must be between 1 and 65535"); + } + this.grpcPort = grpcPort; }PR_DESCRIPTION.md (2)
11-11: Consider replacing “very good performance” to satisfy style checks.✏️ Proposed fix
- - [ ] **Important**: I implemented the changes with a [very good performance](https://docs.artemis.cit.tum.de/dev/guidelines/performance/) and prevented too many (unnecessary) and too complex database calls. + - [ ] **Important**: I implemented the changes with [excellent performance](https://docs.artemis.cit.tum.de/dev/guidelines/performance/) and prevented too many (unnecessary) and too complex database calls. @@ - - [ ] I (as a reviewer) confirm that the server changes (in particular related to database calls) are implemented with a very good performance even for very large courses with more than 2000 students. + - [ ] I (as a reviewer) confirm that the server changes (in particular related to database calls) are implemented with excellent performance even for very large courses with more than 2000 students.Also applies to: 121-121
90-110: Use headings instead of bolded emphasis for test sections (MD036).✏️ Proposed fix
-**Test 1: Basic Startup with Weaviate Enabled** +#### Test 1: Basic Startup with Weaviate Enabled @@ -**Test 2: Schema Validation Failure (Strict Mode)** +#### Test 2: Schema Validation Failure (Strict Mode) @@ -**Test 3: Schema Validation Warning (Non-Strict Mode)** +#### Test 3: Schema Validation Warning (Non-Strict Mode) @@ -**Test 4: Disabled Weaviate** +#### Test 4: Disabled Weaviatesrc/main/java/de/tum/cit/aet/artemis/core/config/weaviate/WeaviateClientConfiguration.java (1)
34-52: AdddestroyMethod = "close"to the@Beanannotation to ensure the Weaviate client closes its connections on shutdown.The
WeaviateClientimplementsAutoCloseableand itsclose()method properly closes both the REST transport and gRPC channels. Wire it as a destroy method in the bean definition to prevent resource leaks.♻️ Proposed fix
- `@Bean` + `@Bean`(destroyMethod = "close") public WeaviateClient weaviateClient() {src/main/java/de/tum/cit/aet/artemis/core/config/weaviate/schema/WeaviateSchemas.java (1)
295-303: Consider returningOptional<WeaviateCollectionSchema>instead of nullable.Returning
Optionalinstead of null would make the API safer and more explicit about the possibility of missing schemas, following modern Java conventions.♻️ Suggested improvement
+import java.util.Optional; + /** * Gets a schema by collection name. * * `@param` collectionName the collection name - * `@return` the schema, or null if not found + * `@return` the schema wrapped in Optional, or empty if not found */ -public static WeaviateCollectionSchema getSchema(String collectionName) { - return SCHEMAS_BY_NAME.get(collectionName); +public static Optional<WeaviateCollectionSchema> getSchema(String collectionName) { + return Optional.ofNullable(SCHEMAS_BY_NAME.get(collectionName)); }
| public void insertLecturePageChunk(long courseId, String courseLanguage, long lectureId, long lectureUnitId, String pageTextContent, int pageNumber, String baseUrl, | ||
| int attachmentVersion) { | ||
|
|
||
| var collection = getCollection(WeaviateSchemas.LECTURES_COLLECTION); | ||
|
|
||
| try { | ||
| collection.data.insert(Map.of(WeaviateSchemas.LecturesProperties.COURSE_ID, courseId, WeaviateSchemas.LecturesProperties.COURSE_LANGUAGE, courseLanguage, | ||
| WeaviateSchemas.LecturesProperties.LECTURE_ID, lectureId, WeaviateSchemas.LecturesProperties.LECTURE_UNIT_ID, lectureUnitId, | ||
| WeaviateSchemas.LecturesProperties.PAGE_TEXT_CONTENT, pageTextContent, WeaviateSchemas.LecturesProperties.PAGE_NUMBER, pageNumber, | ||
| WeaviateSchemas.LecturesProperties.BASE_URL, baseUrl, WeaviateSchemas.LecturesProperties.ATTACHMENT_VERSION, attachmentVersion)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against nulls before Map.of insertion.
Map.of rejects null values; any nullable field (e.g., courseDescription or baseUrl) will throw an NPE and skip insertion. Consider validating required fields explicitly and conditionally including optional ones.
🐛 Proposed fix (pattern)
- collection.data.insert(Map.of(WeaviateSchemas.LecturesProperties.COURSE_ID, courseId, WeaviateSchemas.LecturesProperties.COURSE_LANGUAGE, courseLanguage,
- WeaviateSchemas.LecturesProperties.LECTURE_ID, lectureId, WeaviateSchemas.LecturesProperties.LECTURE_UNIT_ID, lectureUnitId,
- WeaviateSchemas.LecturesProperties.PAGE_TEXT_CONTENT, pageTextContent, WeaviateSchemas.LecturesProperties.PAGE_NUMBER, pageNumber,
- WeaviateSchemas.LecturesProperties.BASE_URL, baseUrl, WeaviateSchemas.LecturesProperties.ATTACHMENT_VERSION, attachmentVersion));
+ Map<String, Object> payload = new java.util.HashMap<>();
+ payload.put(WeaviateSchemas.LecturesProperties.COURSE_ID, courseId);
+ payload.put(WeaviateSchemas.LecturesProperties.COURSE_LANGUAGE, courseLanguage);
+ payload.put(WeaviateSchemas.LecturesProperties.LECTURE_ID, lectureId);
+ payload.put(WeaviateSchemas.LecturesProperties.LECTURE_UNIT_ID, lectureUnitId);
+ payload.put(WeaviateSchemas.LecturesProperties.PAGE_TEXT_CONTENT, pageTextContent);
+ payload.put(WeaviateSchemas.LecturesProperties.PAGE_NUMBER, pageNumber);
+ if (baseUrl != null) {
+ payload.put(WeaviateSchemas.LecturesProperties.BASE_URL, baseUrl);
+ }
+ payload.put(WeaviateSchemas.LecturesProperties.ATTACHMENT_VERSION, attachmentVersion);
+ collection.data.insert(payload);
@@
- collection.data.insert(Map.of(WeaviateSchemas.FaqsProperties.COURSE_ID, courseId, WeaviateSchemas.FaqsProperties.COURSE_NAME, courseName,
- WeaviateSchemas.FaqsProperties.COURSE_DESCRIPTION, courseDescription, WeaviateSchemas.FaqsProperties.COURSE_LANGUAGE, courseLanguage,
- WeaviateSchemas.FaqsProperties.FAQ_ID, faqId, WeaviateSchemas.FaqsProperties.QUESTION_TITLE, questionTitle, WeaviateSchemas.FaqsProperties.QUESTION_ANSWER,
- questionAnswer));
+ Map<String, Object> payload = new java.util.HashMap<>();
+ payload.put(WeaviateSchemas.FaqsProperties.COURSE_ID, courseId);
+ payload.put(WeaviateSchemas.FaqsProperties.COURSE_NAME, courseName);
+ if (courseDescription != null) {
+ payload.put(WeaviateSchemas.FaqsProperties.COURSE_DESCRIPTION, courseDescription);
+ }
+ payload.put(WeaviateSchemas.FaqsProperties.COURSE_LANGUAGE, courseLanguage);
+ payload.put(WeaviateSchemas.FaqsProperties.FAQ_ID, faqId);
+ payload.put(WeaviateSchemas.FaqsProperties.QUESTION_TITLE, questionTitle);
+ payload.put(WeaviateSchemas.FaqsProperties.QUESTION_ANSWER, questionAnswer);
+ collection.data.insert(payload);Also applies to: 161-169
🤖 Prompt for AI Agents
In
`@src/main/java/de/tum/cit/aet/artemis/core/service/weaviate/WeaviateService.java`
around lines 131 - 140, insertLecturePageChunk currently builds the payload with
Map.of which throws on any null value (e.g., baseUrl or other nullable fields);
change the code to validate required parameters up-front and construct the
insert map using a mutable builder (e.g., new HashMap<>() / Map.ofEntries
alternative) so you only put optional fields when non-null, then pass that
sanitized map to collection.data.insert; apply the same pattern to the other
insert methods referenced (lines ~161-169) to avoid NPEs from Map.of.
Development: Insert lecture metadata into Weaviate
End-to-End (E2E) Test Results Summary
|
||||||||||||||||||
Checklist
General
Server
Client
authoritiesto all new routes and checked the course groups for displaying navigation elements (links, buttons).Changes affecting Programming Exercises
Motivation and Context
Description
Steps for Testing
Prerequisites:
Exam Mode Testing
Prerequisites:
Testserver States
You can manage test servers using Helios. Check environment statuses in the environment list. To deploy to a test server, go to the CI/CD page, find your PR or branch, and trigger the deployment.
Review Progress
Performance Review
Code Review
Manual Tests
Exam Mode Test
Performance Tests
Test Coverage
Warning: Server tests failed. Coverage could not be fully measured. Please check the workflow logs.
Last updated: 2026-01-24 17:21:47 UTC
Screenshots
Summary by CodeRabbit
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.