feat: Support nullable vectors in Node.js SDK by marcelo-cjl · Pull Request #482 · milvus-io/milvus-sdk-node

marcelo-cjl · 2025-12-15T07:14:16Z

related: #486

- Add null value handling in buildColumnData() for all vector types
- Add applyValidDataToVectors() helper for sparse-to-dense data mapping
- Update processVectorData() with valid_data check for nullable vectors
- Add validation requiring nullable=true when adding vector fields
- Add comprehensive tests for all 6 vector types
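
To illustrate the sparse-to-dense mapping mentioned above, here is a minimal sketch. It assumes the server returns only the non-null vectors plus a per-row `valid_data` flag array. `expandWithValidData` is a hypothetical name chosen for this example; it is not the SDK's `applyValidDataToVectors()` and may differ from it in detail.

```typescript
// Hypothetical helper for illustration only; not the SDK implementation.
type Vector = number[];

// Expand a compact list of vectors into one entry per row,
// placing null wherever validData says the row has no vector.
function expandWithValidData(
  compact: Vector[],
  validData: boolean[]
): Array<Vector | null> {
  let next = 0; // index of the next unread vector in `compact`
  const rows = validData.map(isValid => (isValid ? compact[next++] : null));
  if (next !== compact.length) {
    throw new Error(
      `valid_data marks ${next} rows as valid but ${compact.length} vectors were given`
    );
  }
  return rows;
}

const rows = expandWithValidData(
  [
    [0.1, 0.2],
    [0.3, 0.4],
  ],
  [true, false, true]
);
// rows is [[0.1, 0.2], null, [0.3, 0.4]]
```

The length check guards against a bitmap and payload that disagree, which would otherwise shift vectors onto the wrong rows without any error.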

Summary by CodeRabbit

New Features
- Added support for adding nullable vector fields to existing collections with validation enforcement.
Bug Fixes
- Improved null value handling across vector data operations during insertion and retrieval.
Tests
- Added comprehensive test coverage for nullable vector functionality across all supported vector types.


sre-ci-robot · 2025-12-15T07:14:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marcelo-cjl


Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shanghaikid · 2025-12-15T07:16:43Z

Please specify your target Milvus tag in package.json, for example 2.6-20251215-25696831-amd64. CI will fetch that version and run the tests against it.

Signed-off-by: marcelo-cjl <marcelo.chen@zilliz.com>

coderabbitai · 2025-12-24T06:44:46Z

📝 Walkthrough


This pull request adds support for nullable vector fields in Milvus collections. It introduces validation to enforce that vector fields added to existing collections must be nullable, refactors data serialization to properly handle nullable vectors across multiple types, and provides comprehensive test coverage for the new functionality.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Error Definition `milvus/const/error.ts` | Added new error code `ADD_VECTOR_FIELD_MUST_BE_NULLABLE` to enforce nullable requirement for vector fields in existing collections. |
| Collection Field Validation `milvus/grpc/Collection.ts` | Added runtime validation in `addCollectionField` method to check that vector-type fields are nullable; throws error if nullable is not set to true. Imports `convertToDataType` helper. |
| Data Serialization & Nullability Handling `milvus/grpc/Data.ts`, `milvus/utils/Data.ts` | Refactored insert-path data handling to support nullable vectors: per-row null/undefined checks for binary vectors, nullable-aware filtering and transformation of vector data during serialization, updated field data assignment logic, simplified validity checks, and helper function to expand sparse vector data with null placeholders based on validity bitmap. |
| Nullable Vector Tests `test/grpc/NullableVector.spec.ts` | New comprehensive test suite validating nullable vector handling across six vector types (float, binary, float16, bfloat16, sparse_float, int8) with insert, load, query, and search operations; includes tests for adding nullable vector columns to existing collections and error handling for non-nullable vectors. |
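
The validation described for `addCollectionField` can be sketched roughly as follows. Everything in this block is a stand-in written for illustration: `FieldKind`, `NewField`, `assertAddableField`, and the error text are hypothetical, whereas the SDK uses its own `DataType` enum, `isVectorType` helper, and `ERROR_REASONS.ADD_VECTOR_FIELD_MUST_BE_NULLABLE`. The likely rationale is that rows already in the collection have no value for the new field, so it has to accept null.

```typescript
// Hypothetical stand-ins; the SDK defines its own enum, helper and error constants.
enum FieldKind {
  Int64 = 'Int64',
  VarChar = 'VarChar',
  FloatVector = 'FloatVector',
  BinaryVector = 'BinaryVector',
  SparseFloatVector = 'SparseFloatVector',
}

interface NewField {
  name: string;
  kind: FieldKind;
  nullable?: boolean;
}

const VECTOR_KINDS: FieldKind[] = [
  FieldKind.FloatVector,
  FieldKind.BinaryVector,
  FieldKind.SparseFloatVector,
];

// Illustrative message, not the SDK's actual error text.
const MUST_BE_NULLABLE =
  'Vector fields added to an existing collection must set nullable to true';

// Reject a vector field unless it is explicitly nullable; scalar fields pass through.
function assertAddableField(field: NewField): void {
  const isVector = VECTOR_KINDS.indexOf(field.kind) !== -1;
  if (isVector && field.nullable !== true) {
    throw new Error(`${MUST_BE_NULLABLE} (field: ${field.name})`);
  }
}

assertAddableField({ name: 'embedding', kind: FieldKind.FloatVector, nullable: true });
```

Running the check on the client before the RPC is sent means the caller gets a clear error without a round trip to the server.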

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Collection
    participant DataHandler
    participant Serializer
    participant Milvus

    rect rgb(200, 220, 240)
        Note over Client,Milvus: Adding Nullable Vector Field to Existing Collection
        Client->>Collection: addCollectionField(vectorField)
        Collection->>Collection: convertToDataType(field)
        Collection->>Collection: isVectorType check
        alt Vector type & not nullable
            Collection->>Client: ❌ ADD_VECTOR_FIELD_MUST_BE_NULLABLE error
        else Vector type & nullable ✓
            Collection->>Milvus: Create field schema with nullable=true
            Milvus-->>Collection: ✓ Field added
        end
    end

    rect rgb(240, 220, 200)
        Note over Client,Milvus: Inserting Data with Nullable Vectors
        Client->>DataHandler: insert(rows with nullable vectors)
        DataHandler->>DataHandler: For each row: buildFieldData(value)
        alt Vector field value is null
            DataHandler->>DataHandler: Store null at rowIndex
        else Vector field value exists
            DataHandler->>Serializer: Transform vector data
            Serializer->>Serializer: Filter nulls, apply valid_data bitmap
            Serializer->>Serializer: Build payload (buffer/array)
        end
        DataHandler->>Milvus: Send serialized data with nullable metadata
        Milvus-->>Client: ✓ Insert complete
    end
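
The insert path in the diagram (filter nulls, apply a `valid_data` bitmap, build the payload) can be approximated by the sketch below for float vectors. The names `NullableFloatColumn` and `buildNullableFloatColumn` are assumptions made for this example, and the real serializer in `milvus/grpc/Data.ts` handles more vector types and a different payload shape.

```typescript
// Hypothetical sketch of the insert-side transformation; not the SDK code.
type Vector = number[];

interface NullableFloatColumn {
  dim: number;
  validData: boolean[]; // one flag per input row
  data: number[]; // values of the non-null rows only, flattened
}

function buildNullableFloatColumn(
  rows: Array<Vector | null | undefined>,
  dim: number
): NullableFloatColumn {
  const validData: boolean[] = [];
  const data: number[] = [];
  rows.forEach((row, rowIndex) => {
    if (row === null || row === undefined) {
      validData.push(false); // remember the gap, send no values for it
      return;
    }
    if (row.length !== dim) {
      throw new Error(`row ${rowIndex} has ${row.length} values, expected ${dim}`);
    }
    validData.push(true);
    for (const value of row) data.push(value);
  });
  return { dim, validData, data };
}

const column = buildNullableFloatColumn([[1, 2], null, [3, 4]], 2);
// column.validData is [true, false, true]; column.data is [1, 2, 3, 4]
```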

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

#485: Modifies nullable and vector field validation logic in milvus/grpc/Data.ts, sharing overlapping data handling concerns.

Suggested labels

Review effort 2/5

Poem

🐰 Hopping through vectors, both fuzzy and clear,
Where nulls find their home without any fear,
Each type gets its due—binary, sparse, and more,
Nullable fields now unlock the door!
Hop, insert, and search—the data flows free! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: Support nullable vectors in Node.js SDK' accurately describes the main change: adding nullable vector support to the Node.js SDK. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

milvus/grpc/Data.ts (1)

319-427: Wrap switch case declarations in blocks.

Multiple variables (floatVecData, f16Data, binaryData, sparseData, dim, int8Data) are declared directly in switch cases without block scoping. While this works in the current code, it violates best practices and could lead to issues if cases fall through or are reordered.

🔎 Proposed fix

         switch (field.type) {
-          case DataType.FloatVector:
+          case DataType.FloatVector: {
             const floatVecData = field.nullable
               ? (field.data as any[])
                   .filter(v => v !== null && v !== undefined)
                   .flat()
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: {
                 data: floatVecData,
               },
             };
             break;
-          case DataType.BFloat16Vector:
-          case DataType.Float16Vector:
+          }
+          case DataType.BFloat16Vector:
+          case DataType.Float16Vector: {
             const f16Data = field.nullable
               ? (field.data as any[]).filter(v => v !== null && v !== undefined)
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: Buffer.concat(f16Data as Uint8Array[]),
             };
             break;
-          case DataType.BinaryVector:
+          }
+          case DataType.BinaryVector: {
             const binaryData = field.nullable
               ? (field.data as any[])
                   .filter(v => v !== null && v !== undefined)
                   .flat()
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: f32ArrayToBinaryBytes(binaryData as BinaryVector),
             };
             break;
-          case DataType.SparseFloatVector:
+          }
+          case DataType.SparseFloatVector: {
             const sparseData = field.nullable
               ? (field.data as any[]).filter(v => v !== null && v !== undefined)
               : field.data;
             const dim = getSparseDim(sparseData as SparseFloatVector[]);
             keyValue = {
               dim,
               [dataKey]: {
                 dim,
                 contents: sparseRowsToBytes(sparseData as SparseFloatVector[]),
               },
             };
             break;
-          case DataType.Int8Vector:
+          }
+          case DataType.Int8Vector: {
             const int8Data = field.nullable
               ? (field.data as any[]).filter(v => v !== null && v !== undefined)
               : field.data;
             keyValue = {
               dim: field.dim,
               [dataKey]: int8VectorRowsToBytes(int8Data as Int8Vector[]),
             };
             break;
+          }

Based on static analysis hints from Biome.
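
As a standalone illustration of the rule behind this suggestion, the sketch below uses a made-up `Shape` type unrelated to the SDK. All clauses of a `switch` share one lexical scope, so without the braces the two `const scale` declarations would be rejected as a redeclaration; with the braces each clause gets its own scope.

```typescript
// Illustrative example only; `Shape` and `area` are not part of the SDK.
type Shape =
  | { kind: 'circle'; radius: number }
  | { kind: 'square'; side: number };

function area(shape: Shape): number {
  switch (shape.kind) {
    case 'circle': {
      // `scale` is visible only inside this block.
      const scale = Math.PI;
      return scale * shape.radius * shape.radius;
    }
    case 'square': {
      // The same name can be reused because the previous block is closed.
      const scale = 1;
      return scale * shape.side * shape.side;
    }
  }
}

const squareArea = area({ kind: 'square', side: 3 });
// squareArea is 9
```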

🧹 Nitpick comments (2)

milvus/grpc/Data.ts (1)

246-266: Consider documenting the nullable vs non-nullable storage strategy.

The code uses different storage strategies:

Nullable vectors: Indexed storage (field.data[rowIndex] = fieldValue)

Non-nullable vectors: Concatenated storage (field.data = field.data.concat(fieldValue))

This is then reconciled in buildColumnData where nullable vectors are filtered. While this works, a comment explaining the rationale would help future maintainers understand why different strategies are used.
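
The two strategies can be contrasted with a small sketch. `storeIndexed` and `storeConcatenated` are hypothetical names for this example, and the actual assignment logic in `milvus/grpc/Data.ts` may differ. The point is that indexed storage keeps one slot per row, so nulls stay visible until serialization, while concatenated storage flattens values immediately and therefore has nowhere to record a missing row.

```typescript
// Hypothetical illustration of the two storage strategies; not the SDK code.
type Vector = number[];

// Nullable fields: one slot per row, so a null keeps its row position.
function storeIndexed(rows: Array<Vector | null>): Array<Vector | null> {
  const data: Array<Vector | null> = [];
  rows.forEach((value, rowIndex) => {
    data[rowIndex] = value;
  });
  return data;
}

// Non-nullable fields: values are appended to a single flat array.
function storeConcatenated(rows: Vector[]): number[] {
  let data: number[] = [];
  for (const value of rows) {
    data = data.concat(value);
  }
  return data;
}

const indexed = storeIndexed([[1, 2], null, [3, 4]]);
// indexed is [[1, 2], null, [3, 4]]
const flat = storeConcatenated([[1, 2], [3, 4]]);
// flat is [1, 2, 3, 4]
```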

test/grpc/NullableVector.spec.ts (1)

437-439: Consider making flush timeouts configurable or polling-based.

The hardcoded 2-second timeouts could be flaky in CI environments or under load. Consider using a polling mechanism or making the timeout configurable via environment variable.

🔎 Suggested improvement

// Add at the top of the file
const FLUSH_TIMEOUT = parseInt(process.env.FLUSH_TIMEOUT || '5000', 10);
const FLUSH_POLL_INTERVAL = 100;

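// Helper function
// Note: getLoadingProgress reports collection load progress rather than flush state,
// so despite its name this helper polls until the collection is fully loaded.
const waitForFlush = async (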
  client: MilvusClient,
  collectionName: string,
  maxWait: number = FLUSH_TIMEOUT
): Promise<void> => {
  const startTime = Date.now();
  while (Date.now() - startTime < maxWait) {
    const state = await client.getLoadingProgress({ collection_name: collectionName });
    if (state.progress === '100') {
      return;
    }
    await new Promise(resolve => setTimeout(resolve, FLUSH_POLL_INTERVAL));
  }
  throw new Error(`Flush timeout after ${maxWait}ms`);
};

// Usage
await milvusClient.flush({ collection_names: [COLLECTION_NAME] });
await waitForFlush(milvusClient, COLLECTION_NAME);

Also applies to: 536-538

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b566611 and 283d75b.

📒 Files selected for processing (5)

milvus/const/error.ts
milvus/grpc/Collection.ts
milvus/grpc/Data.ts
milvus/utils/Data.ts
test/grpc/NullableVector.spec.ts

🧰 Additional context used

🧬 Code graph analysis (3)

milvus/grpc/Data.ts (5)

milvus/types/DataTypes.ts (3)

BinaryVector (5-5)

SparseFloatVector (15-19)

Int8Vector (12-12)

milvus/const/error.ts (1)

ERROR_REASONS (7-66)

milvus/utils/Data.ts (1)

buildFieldData (344-432)

milvus/utils/Bytes.ts (3)

f32ArrayToBinaryBytes (37-41)

sparseRowsToBytes (224-230)

int8VectorRowsToBytes (255-261)

milvus/utils/Function.ts (1)

getSparseDim (124-133)

test/grpc/NullableVector.spec.ts (4)

test/tools/ip.ts (1)

IP (2-2)

test/tools/data.ts (4)

genFloatVector (71-74)

genBinaryVector (129-137)

genSparseVector (150-233)

genInt8Vector (235-252)

test/tools/utils.ts (1)

GENERATE_NAME (7-8)

milvus/const/error.ts (1)

ERROR_REASONS (7-66)

milvus/grpc/Collection.ts (3)

milvus/utils/Schema.ts (1)

convertToDataType (166-177)

milvus/utils/Validate.ts (1)

isVectorType (267-276)

milvus/const/error.ts (1)

ERROR_REASONS (7-66)

🪛 Biome (2.1.2)

milvus/grpc/Data.ts

[error] 321-325: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.