Skip to content

RandomAccessScoreProvider with MapRandomAccessVectorValues + non-sequential IDs produces wrong centroid #354

@vbekiaris

Description

@vbekiaris

In 97e523c approximateCentroid() implementation for the BuildScoreProvider returned from BuildScoreProvider.randomAccessScoreProvider() was updated to allow for non-sequential node IDs.

However the iteration only takes into account nodes with ID < ravv.size(). This means that if there are actually "holes" in the ID sequence (e.g. add 100 nodes in a MapRandomAccessVectorValues, then remove 10 starting from 0), then some nodes (those with ID >= 90 in the example) will not be taken into account while calculating the centroid.

A fix would probably require changing RandomAccessVectorValues API to expose an iterator or the highest nodeId that is set (or something similar).

Test that demonstrates the issue:

    @Test
    void testRandomAccessScoreProvider() {
        Map<Integer, VectorFloat<?>> chm = new ConcurrentHashMap<>();
        // nodeId's 0..49 have vector [0, 1], 49..99 are [1, 0], so centroid is [0.5, 0.5]
        for (int i = 0; i < 100; i++) {
            chm.put(i, createVector(i < 50));
        }
        RandomAccessVectorValues ravv = new MapRandomAccessVectorValues(chm, 2);
        var bsp = BuildScoreProvider.randomAccessScoreProvider(ravv, VectorSimilarityFunction.COSINE);
        float[] centroid = (float[]) bsp.approximateCentroid().get();
        Assertions.assertArrayEquals(new float[] {0.5f, 0.5f}, centroid);
        // remove even nodeId's. So we have just 50 nodeIds left now and they are no longer sequential
        // centroid however should stay the same at [0.5, 0.5], since we have 25 [0, 1] and 25 [1, 0] vectors
        for (int i = 0; i < 100; i = i+2) {
            chm.remove(i);
        }
        centroid = (float[]) bsp.approximateCentroid().get();
        // fails - calculated centroid is [0, 0.5] because centroid calculation only took into account node IDs < 50
        Assertions.assertArrayEquals(new float[] {0.5f, 0.5f}, centroid);
    }

    VectorFloat<?> createVector(boolean vertical) {
        VectorFloat<?> vectorFloat = VectorizationProvider.getInstance().getVectorTypeSupport().createFloatVector(2);
        if (vertical) {
            vectorFloat.set(0, 0);
            vectorFloat.set(1, 1);
        } else {
            vectorFloat.set(0, 1);
            vectorFloat.set(1, 0);
        }
        return vectorFloat;
    }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions