feat: Support nullable vectors in Node.js SDK #482

Open
marcelo-cjl wants to merge 1 commit into milvus-io:main from marcelo-cjl:vectornil

Conversation

@marcelo-cjl marcelo-cjl commented Dec 15, 2025

related: #486

  • Add null value handling in buildColumnData() for all vector types
  • Add applyValidDataToVectors() helper for sparse-to-dense data mapping
  • Update processVectorData() with valid_data check for nullable vectors
  • Add validation requiring nullable=true when adding vector fields
  • Add comprehensive tests for all 6 vector types

Summary by CodeRabbit

  • New Features

    • Added support for adding nullable vector fields to existing collections with validation enforcement.
  • Bug Fixes

    • Improved null value handling across vector data operations during insertion and retrieval.
  • Tests

    • Added comprehensive test coverage for nullable vector functionality across all supported vector types.

✏️ Tip: You can customize this high-level summary in your review settings.

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marcelo-cjl

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shanghaikid
Contributor

shanghaikid commented Dec 15, 2025

Please specify your target Milvus tag in package.json, for example 2.6-20251215-25696831-amd64. CI will fetch that version and run the tests against it.

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>
@coderabbitai

coderabbitai bot commented Dec 24, 2025

Walkthrough

This pull request adds support for nullable vector fields in Milvus collections. It introduces validation to enforce that vector fields added to existing collections must be nullable, refactors data serialization to properly handle nullable vectors across multiple types, and provides comprehensive test coverage for the new functionality.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Error Definition: milvus/const/error.ts | Added new error code ADD_VECTOR_FIELD_MUST_BE_NULLABLE to enforce nullable requirement for vector fields in existing collections. |
| Collection Field Validation: milvus/grpc/Collection.ts | Added runtime validation in addCollectionField method to check that vector-type fields are nullable; throws error if nullable is not set to true. Imports convertToDataType helper. |
| Data Serialization & Nullability Handling: milvus/grpc/Data.ts, milvus/utils/Data.ts | Refactored insert-path data handling to support nullable vectors: per-row null/undefined checks for binary vectors, nullable-aware filtering and transformation of vector data during serialization, updated field data assignment logic, simplified validity checks, and helper function to expand sparse vector data with null placeholders based on validity bitmap. |
| Nullable Vector Tests: test/grpc/NullableVector.spec.ts | New comprehensive test suite validating nullable vector handling across six vector types (float, binary, float16, bfloat16, sparse_float, int8) with insert, load, query, and search operations; includes tests for adding nullable vector columns to existing collections and error handling for non-nullable vectors. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Collection
    participant DataHandler
    participant Serializer
    participant Milvus

    rect rgb(200, 220, 240)
        Note over Client,Milvus: Adding Nullable Vector Field to Existing Collection
        Client->>Collection: addCollectionField(vectorField)
        Collection->>Collection: convertToDataType(field)
        Collection->>Collection: isVectorType check
        alt Vector type & not nullable
            Collection->>Client: ❌ ADD_VECTOR_FIELD_MUST_BE_NULLABLE error
        else Vector type & nullable ✓
            Collection->>Milvus: Create field schema with nullable=true
            Milvus-->>Collection: ✓ Field added
        end
    end

    rect rgb(240, 220, 200)
        Note over Client,Milvus: Inserting Data with Nullable Vectors
        Client->>DataHandler: insert(rows with nullable vectors)
        DataHandler->>DataHandler: For each row: buildFieldData(value)
        alt Vector field value is null
            DataHandler->>DataHandler: Store null at rowIndex
        else Vector field value exists
            DataHandler->>Serializer: Transform vector data
            Serializer->>Serializer: Filter nulls, apply valid_data bitmap
            Serializer->>Serializer: Build payload (buffer/array)
        end
        DataHandler->>Milvus: Send serialized data with nullable metadata
        Milvus-->>Client: ✓ Insert complete
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • #485: Modifies nullable and vector field validation logic in milvus/grpc/Data.ts, sharing overlapping data handling concerns.

Suggested labels

Review effort 2/5

Poem

🐰 Hopping through vectors, both fuzzy and clear,
Where nulls find their home without any fear,
Each type gets its due—binary, sparse, and more,
Nullable fields now unlock the door!
Hop, insert, and search—the data flows free!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat: Support nullable vectors in Node.js SDK' accurately describes the main change: adding nullable vector support to the Node.js SDK. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
milvus/grpc/Data.ts (1)

319-427: Wrap switch case declarations in blocks.

Multiple variables (floatVecData, f16Data, binaryData, sparseData, dim, int8Data) are declared directly in switch cases without block scoping. While this works in the current code, it violates best practices and could lead to issues if cases fall through or are reordered.

🔎 Proposed fix
         switch (field.type) {
-          case DataType.FloatVector:
+          case DataType.FloatVector: {
             const floatVecData = field.nullable
               ? (field.data as any[])
                   .filter(v => v !== null && v !== undefined)
                   .flat()
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: {
                 data: floatVecData,
               },
             };
             break;
-          case DataType.BFloat16Vector:
-          case DataType.Float16Vector:
+          }
+          case DataType.BFloat16Vector:
+          case DataType.Float16Vector: {
             const f16Data = field.nullable
               ? (field.data as any[]).filter(v => v !== null && v !== undefined)
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: Buffer.concat(f16Data as Uint8Array[]),
             };
             break;
-          case DataType.BinaryVector:
+          }
+          case DataType.BinaryVector: {
             const binaryData = field.nullable
               ? (field.data as any[])
                   .filter(v => v !== null && v !== undefined)
                   .flat()
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: f32ArrayToBinaryBytes(binaryData as BinaryVector),
             };
             break;
-          case DataType.SparseFloatVector:
+          }
+          case DataType.SparseFloatVector: {
             const sparseData = field.nullable
               ? (field.data as any[]).filter(v => v !== null && v !== undefined)
               : field.data;
             const dim = getSparseDim(sparseData as SparseFloatVector[]);
             keyValue = {
               dim,
               [dataKey]: {
                 dim,
                 contents: sparseRowsToBytes(sparseData as SparseFloatVector[]),
               },
             };
             break;
-          case DataType.Int8Vector:
+          }
+          case DataType.Int8Vector: {
             const int8Data = field.nullable
               ? (field.data as any[]).filter(v => v !== null && v !== undefined)
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: int8VectorRowsToBytes(int8Data as Int8Vector[]),
             };
             break;
+          }

Based on static analysis hints from Biome.
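
For illustration only, a minimal sketch of the scoping rule behind this lint, using a hypothetical `area` function that is unrelated to the SDK's types. Without the braces, the two `const value` declarations would share one switch-wide scope and fail to compile as a redeclaration; with them, each clause owns its declaration.

```typescript
// Hypothetical example (not SDK code): block-scoped case clauses.
function area(kind: 'circle' | 'square', size: number): number {
  switch (kind) {
    case 'circle': {
      // `value` is visible only inside this clause.
      const value = Math.PI * size * size;
      return value;
    }
    case 'square': {
      // Reusing the name is fine because the braces create a separate scope.
      const value = size * size;
      return value;
    }
    default:
      throw new Error('unknown kind: ' + String(kind));
  }
}
```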

🧹 Nitpick comments (2)
milvus/grpc/Data.ts (1)

246-266: Consider documenting the nullable vs non-nullable storage strategy.

The code uses different storage strategies:

  • Nullable vectors: Indexed storage (field.data[rowIndex] = fieldValue)
  • Non-nullable vectors: Concatenated storage (field.data = field.data.concat(fieldValue))

This is then reconciled in buildColumnData where nullable vectors are filtered. While this works, a comment explaining the rationale would help future maintainers understand why different strategies are used.
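
A minimal sketch of the two strategies described above, assuming float vectors as plain number arrays. The function names (`storeNullable`, `storeNonNullable`, `flattenNullable`) are hypothetical and only mirror the behavior summarized in this comment, not the actual code in Data.ts.

```typescript
type Vec = number[];

// Nullable field: one slot per row, so a null row keeps its position.
function storeNullable(
  data: (Vec | null)[],
  rowIndex: number,
  value: Vec | null | undefined
): void {
  data[rowIndex] = value ?? null;
}

// Non-nullable field: every row has a value, so rows are flattened as they arrive.
function storeNonNullable(data: number[], value: Vec): number[] {
  return data.concat(value);
}

// At serialization time the null slots are dropped and the rest is flattened,
// which produces the same flat layout as the non-nullable path.
function flattenNullable(data: (Vec | null)[]): number[] {
  const flat: number[] = [];
  for (const row of data) {
    if (row !== null && row !== undefined) {
      flat.push(...row);
    }
  }
  return flat;
}
```

Keeping one slot per row on the nullable path is what lets the row positions of nulls be recovered later; the flat path has no way to express a missing row.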

test/grpc/NullableVector.spec.ts (1)

437-439: Consider making flush timeouts configurable or polling-based.

The hardcoded 2-second timeouts could be flaky in CI environments or under load. Consider using a polling mechanism or making the timeout configurable via environment variable.

🔎 Suggested improvement
// Add at the top of the file
const FLUSH_TIMEOUT = parseInt(process.env.FLUSH_TIMEOUT || '5000', 10);
const FLUSH_POLL_INTERVAL = 100;

// Helper function
const waitForFlush = async (
  client: MilvusClient,
  collectionName: string,
  maxWait: number = FLUSH_TIMEOUT
): Promise<void> => {
  const startTime = Date.now();
  while (Date.now() - startTime < maxWait) {
    const state = await client.getLoadingProgress({ collection_name: collectionName });
    if (state.progress === '100') {
      return;
    }
    await new Promise(resolve => setTimeout(resolve, FLUSH_POLL_INTERVAL));
  }
  throw new Error(`Flush timeout after ${maxWait}ms`);
};

// Usage
await milvusClient.flush({ collection_names: [COLLECTION_NAME] });
await waitForFlush(milvusClient, COLLECTION_NAME);

Also applies to: 536-538

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b566611 and 283d75b.

📒 Files selected for processing (5)
  • milvus/const/error.ts
  • milvus/grpc/Collection.ts
  • milvus/grpc/Data.ts
  • milvus/utils/Data.ts
  • test/grpc/NullableVector.spec.ts
🧰 Additional context used
🧬 Code graph analysis (3)
milvus/grpc/Data.ts (5)
milvus/types/DataTypes.ts (3)
  • BinaryVector (5-5)
  • SparseFloatVector (15-19)
  • Int8Vector (12-12)
milvus/const/error.ts (1)
  • ERROR_REASONS (7-66)
milvus/utils/Data.ts (1)
  • buildFieldData (344-432)
milvus/utils/Bytes.ts (3)
  • f32ArrayToBinaryBytes (37-41)
  • sparseRowsToBytes (224-230)
  • int8VectorRowsToBytes (255-261)
milvus/utils/Function.ts (1)
  • getSparseDim (124-133)
test/grpc/NullableVector.spec.ts (4)
test/tools/ip.ts (1)
  • IP (2-2)
test/tools/data.ts (4)
  • genFloatVector (71-74)
  • genBinaryVector (129-137)
  • genSparseVector (150-233)
  • genInt8Vector (235-252)
test/tools/utils.ts (1)
  • GENERATE_NAME (7-8)
milvus/const/error.ts (1)
  • ERROR_REASONS (7-66)
milvus/grpc/Collection.ts (3)
milvus/utils/Schema.ts (1)
  • convertToDataType (166-177)
milvus/utils/Validate.ts (1)
  • isVectorType (267-276)
milvus/const/error.ts (1)
  • ERROR_REASONS (7-66)
🪛 Biome (2.1.2)
milvus/grpc/Data.ts

[error] 321-325, 335-337, 344-348, 355-357, 358-358, 368-370: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (6)
milvus/grpc/Collection.ts (1)

243-246: LGTM! Validation enforces nullable requirement for vector fields.

The validation correctly checks if the field is a vector type and throws an appropriate error if nullable is not set to true. This prevents schema inconsistencies at field-addition time.
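
As a rough sketch of the rule being approved here, with simplified stand-ins: `FieldSketch`, `MUST_BE_NULLABLE`, and `assertAddableField` are hypothetical names, and the real check in Collection.ts uses convertToDataType, isVectorType, and ERROR_REASONS instead.

```typescript
// Hypothetical, simplified stand-in for the SDK's field schema.
interface FieldSketch {
  name: string;
  isVector: boolean;
  nullable?: boolean;
}

// Placeholder message; the SDK's actual error text is not reproduced here.
const MUST_BE_NULLABLE = 'vector field added to an existing collection must be nullable';

function assertAddableField(field: FieldSketch): void {
  // Reject vector fields unless nullable is explicitly set to true.
  if (field.isVector && field.nullable !== true) {
    throw new Error(MUST_BE_NULLABLE + ': ' + field.name);
  }
}
```

Note that the comparison is against `true`, so an omitted `nullable` is rejected in the same way as `nullable: false`.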

milvus/const/error.ts (1)

64-65: LGTM! Clear and actionable error message.

The error message provides clear guidance on the requirement and how to fix it.

milvus/grpc/Data.ts (2)

238-240: LGTM! Null checks prevent errors for nullable binary vectors.

The additional null/undefined checks ensure the dimension validation only runs for non-null values, which is correct for nullable fields.
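
A minimal sketch of that guard, assuming a binary vector row is an array of bytes so that a field of dimension `dim` holds `dim / 8` entries. `checkBinaryVectorDim` is a hypothetical helper, not the SDK's function.

```typescript
// Hypothetical helper: validates a single binary vector row.
function checkBinaryVectorDim(
  value: number[] | null | undefined,
  dim: number
): void {
  // Null rows carry no data, so there is nothing to validate.
  if (value === null || value === undefined) {
    return;
  }
  // A binary vector packs 8 dimensions into each byte.
  if (value.length !== dim / 8) {
    throw new Error(
      'binary vector has ' + value.length + ' bytes, expected ' + dim / 8
    );
  }
}
```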


307-310: The removal of the key !== 'vectors' check is intentional and correct for nullable vector support.

Nullable vectors now properly compute valid_data to track which rows contain null values. This is confirmed by the comprehensive test suite in test/grpc/NullableVector.spec.ts, which validates that nullable vectors across all 6 vector types (FloatVector, BinaryVector, Float16Vector, BFloat16Vector, SparseFloatVector, Int8Vector) can be inserted with null values and correctly queried/searched.
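
To make the valid_data idea concrete, here is a small sketch under the assumption that each row is either a vector or null/undefined. `splitNullable` is a hypothetical name; it returns the validity flags alongside the compacted values that would actually be serialized.

```typescript
// Hypothetical helper: derive per-row validity flags and the compacted values.
function splitNullable<T>(
  rows: (T | null | undefined)[]
): { validData: boolean[]; values: T[] } {
  const validData: boolean[] = [];
  const values: T[] = [];
  for (const row of rows) {
    const present = row !== null && row !== undefined;
    validData.push(present); // one flag per logical row
    if (present) {
      values.push(row as T); // only non-null rows are kept
    }
  }
  return { validData, values };
}
```

For rows `[[0.1, 0.2], null, [0.3, 0.4]]`, the flags are `[true, false, true]` and only the two real vectors remain in `values`.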

test/grpc/NullableVector.spec.ts (2)

1-169: Excellent comprehensive test setup!

The test file demonstrates thorough coverage:

  • All 6 vector types (float, binary, float16, bfloat16, sparse, int8)
  • Type-specific vector generators and comparison logic with appropriate tolerance
  • Well-organized test configuration with VECTOR_TYPES array

The vectorsEqual function correctly handles different vector formats including sparse vectors, binary vectors, and floating-point precision differences for bfloat16.
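
The test file's vectorsEqual is not shown in this review. As an illustration of tolerance-based comparison only, a sketch might look like the following; `approxEqual` and `sparseEqual` are hypothetical names, sparse vectors are assumed to be in dictionary form, and the tolerance is chosen by the caller.

```typescript
// Hypothetical helper: element-wise comparison of dense vectors within a tolerance.
function approxEqual(a: number[], b: number[], tolerance: number): boolean {
  if (a.length !== b.length) {
    return false;
  }
  for (let i = 0; i < a.length; i++) {
    if (Math.abs(a[i] - b[i]) > tolerance) {
      return false;
    }
  }
  return true;
}

// Hypothetical helper: compares sparse vectors given as { index: value } maps.
function sparseEqual(
  a: Record<string, number>,
  b: Record<string, number>,
  tolerance: number
): boolean {
  const keysA = Object.keys(a);
  if (keysA.length !== Object.keys(b).length) {
    return false;
  }
  for (const key of keysA) {
    // Both maps must contain the same indices with close values.
    if (!(key in b) || Math.abs(a[key] - b[key]) > tolerance) {
      return false;
    }
  }
  return true;
}
```

Reduced-precision types such as bfloat16 lose low-order bits on a round trip, so they need a looser tolerance than float32 vectors.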


442-458: Excellent validation of nullable requirement enforcement!

The test correctly verifies that adding a non-nullable vector field throws the expected error, then successfully adds a nullable field. This ensures the runtime validation works as intended.

Comment on lines +68 to +89
const applyValidDataToVectors = (
  field_data: any[],
  valid_data: boolean[]
): any[] => {
  if (!valid_data || valid_data.length === 0) {
    return field_data;
  }

  const result: any[] = [];
  let physicalIndex = 0;

  for (let logicalIndex = 0; logicalIndex < valid_data.length; logicalIndex++) {
    if (valid_data[logicalIndex]) {
      result.push(field_data[physicalIndex]);
      physicalIndex++;
    } else {
      result.push(null);
    }
  }

  return result;
};

🛠️ Refactor suggestion | 🟠 Major

Add bounds check to prevent index out-of-range errors.

The function assumes valid_data correctly represents the data, but doesn't validate that physicalIndex stays within field_data.length. If valid_data has more true values than field_data has elements, field_data[physicalIndex] evaluates to undefined, so the result would silently contain undefined entries where vectors are expected.

🔎 Suggested fix
 const applyValidDataToVectors = (
   field_data: any[],
   valid_data: boolean[]
 ): any[] => {
   if (!valid_data || valid_data.length === 0) {
     return field_data;
   }
 
   const result: any[] = [];
   let physicalIndex = 0;
 
   for (let logicalIndex = 0; logicalIndex < valid_data.length; logicalIndex++) {
     if (valid_data[logicalIndex]) {
+      if (physicalIndex >= field_data.length) {
+        throw new Error(`Invalid valid_data: expected at least ${physicalIndex + 1} elements in field_data, but got ${field_data.length}`);
+      }
       result.push(field_data[physicalIndex]);
       physicalIndex++;
     } else {
       result.push(null);
     }
   }
 
   return result;
 };
🤖 Prompt for AI Agents
In milvus/utils/Data.ts around lines 68 to 89, the loop can read past field_data
when valid_data contains more true entries than field_data has elements; add a
bounds check before accessing field_data[physicalIndex] — if physicalIndex <
field_data.length then push field_data[physicalIndex] and increment
physicalIndex, otherwise push null (or optionally throw a descriptive error) to
prevent index out-of-range reads and keep output length consistent with
valid_data.
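
For comparison with the throwing variant in the suggested fix, a sketch of the lenient option mentioned in the prompt above (push null once the physical data runs out). `expandWithValidData` is a hypothetical name for illustration and is not the helper in milvus/utils/Data.ts.

```typescript
// Hypothetical lenient variant: never reads past the end of fieldData.
function expandWithValidData<T>(
  fieldData: T[],
  validData: boolean[]
): (T | null)[] {
  if (!validData || validData.length === 0) {
    return fieldData;
  }

  const result: (T | null)[] = [];
  let physicalIndex = 0;

  for (const isValid of validData) {
    if (isValid && physicalIndex < fieldData.length) {
      // Valid row with data available: consume the next physical entry.
      result.push(fieldData[physicalIndex]);
      physicalIndex++;
    } else {
      // Null row, or a valid flag with no data left to back it.
      result.push(null);
    }
  }

  return result;
}
```

The output length always equals `validData.length`, at the cost of hiding a mismatch that the throwing variant would surface.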
