InnerProductSimilarity works wrong on local-ydb:trunk docker image #19584

@vgvoleg

The langchain-ydb vector store has this test:

@pytest.mark.parametrize(
    "strategy",
    [
        (YDBSearchStrategy.COSINE_DISTANCE),
        (YDBSearchStrategy.COSINE_SIMILARITY),
        (YDBSearchStrategy.EUCLIDEAN_DISTANCE),
        (YDBSearchStrategy.INNER_PRODUCT_SIMILARITY),
        (YDBSearchStrategy.MANHATTAN_DISTANCE),
    ],
)
def test_different_search_strategies(strategy: YDBSearchStrategy) -> None:
    """Test end to end construction and search with specified strategy."""
    texts = ["foo", "bar", "baz"]
    config = YDBSettings(
        drop_existing_table=True,
        strategy=strategy,
    )
    config.table = "test_ydb_with_different_search_strategies"
    docsearch = YDB.from_texts(
        texts=texts,
        embedding=ConsistentFakeEmbeddings(),
        config=config,
    )

    output = docsearch.similarity_search("foo", k=1)
    assert output == [Document(page_content="foo")]

    docsearch.drop()

The test was green until we switched from the 24.3.13.12 docker tag to trunk (to enable the enable_vector_index feature flag). After the switch, the YDBSearchStrategy.INNER_PRODUCT_SIMILARITY case fails with an assertion error:

AssertionError: assert [Document(met...ontent='baz')] == [Document(met...ontent='foo')]

Select query:

DECLARE $embedding as List<Float>;

$TargetEmbedding = Knn::ToBinaryStringFloat($embedding);

SELECT
    id as id,
    document as document,
    metadata as metadata,
    Knn::InnerProductSimilarity(embedding, $TargetEmbedding) as score
FROM test_ydb_with_different_search_strategies
ORDER BY score DESC
LIMIT 1;
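
To help reason about what the query should return, here is a minimal sketch that recomputes the inner product scores client-side. It rests on an assumption the issue does not confirm: that ConsistentFakeEmbeddings produces 10-dimensional vectors made of nine 1.0 components followed by the index of the text, as the LangChain test helper of the same name does. The `fake_embedding` function is a hypothetical stand-in for that helper, not part of langchain-ydb.

```python
from typing import Dict, List


def inner_product(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def fake_embedding(index: int, dim: int = 10) -> List[float]:
    # ASSUMPTION: (dim - 1) ones followed by the text's index.
    # Hypothetical stand-in for ConsistentFakeEmbeddings.
    return [1.0] * (dim - 1) + [float(index)]


texts = ["foo", "bar", "baz"]
stored = {text: fake_embedding(i) for i, text in enumerate(texts)}
query = fake_embedding(0)  # hypothetical embedding of the query "foo"

# Score every stored vector against the query, as the SELECT does.
scores: Dict[str, float] = {
    text: inner_product(vector, query) for text, vector in stored.items()
}
print(scores)
```

If the embeddings really have this shape, all three documents get the same inner product score, so `ORDER BY score DESC LIMIT 1` has no unique winner and the returned row could legitimately differ between versions. Whether that explains the failure here would need to be checked against the actual stored embeddings.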
