AsyncSearchIndex.clear() does not remove all indexed documents with paginated delete #460

@dfroger

Description

AsyncSearchIndex.clear() attempts to remove all documents by paginating through the index in batches and deleting each batch. However, the pagination is unstable: some documents can be returned (and deleted) more than once, while others are never returned at all.

The root cause is that pagination is performed without a SORTBY clause, so the order of results is not guaranteed to be the same from one page request to the next. As a result, when there are more documents than page_size, some documents may appear on multiple pages and others may not appear on any page, so they are never deleted.

Current logic (index.py, lines 1566–1568):

async for batch in self.paginate(
    FilterQuery(FilterExpression("*"), return_fields=["id"]), page_size=500
):
    ...

Why this is a problem:
Without a deterministic sort (i.e., SORTBY), RediSearch does not guarantee consistent result ordering across pages, causing duplicates and/or omissions.
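
To illustrate the failure mode, here is a minimal, self-contained sketch. It is a toy model, not RediSearch or the library's actual code: `make_search`, `paginate`, and `collect` are hypothetical names, and the "unsorted" backend simply alternates between two orderings to stand in for an unspecified result order.

```python
from typing import Callable, Iterator, List


def make_search(doc_ids: List[str]) -> Callable[[int, int, bool], List[str]]:
    """Hypothetical stand-in for a search backend (not a real RediSearch API).

    Without a sort, the result order is unspecified; this toy version
    alternates between forward and reversed order on successive calls.
    """
    calls = 0

    def search(offset: int, limit: int, sort: bool) -> List[str]:
        nonlocal calls
        if sort:
            ordered = sorted(doc_ids)  # deterministic order on every call
        elif calls % 2 == 0:
            ordered = list(doc_ids)
        else:
            ordered = list(reversed(doc_ids))
        calls += 1
        return ordered[offset:offset + limit]

    return search


def paginate(search: Callable[[int, int, bool], List[str]],
             page_size: int, sort: bool) -> Iterator[List[str]]:
    """Offset-based pagination: request pages until an empty one comes back."""
    offset = 0
    while True:
        batch = search(offset, page_size, sort)
        if not batch:
            return
        yield batch
        offset += page_size


def collect(sort: bool) -> List[str]:
    """Return every id seen across all pages, in the order fetched."""
    search = make_search([f"doc:{i}" for i in range(10)])
    return [doc_id for batch in paginate(search, 5, sort) for doc_id in batch]


unsorted_ids = collect(sort=False)
sorted_ids = collect(sort=True)
print(len(unsorted_ids), len(set(unsorted_ids)))  # 10 5: duplicates and omissions
print(len(sorted_ids), len(set(sorted_ids)))      # 10 10: each id exactly once
```

In this toy run the unsorted pagination fetches ten results but only five distinct ids, so half of the documents would never be deleted, while the sorted pagination covers every id exactly once.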

Suggested solution:
Add a unique, indexed, and sortable field to your schema (for example: document_id), and update the query to paginate in a stable order using sort_by:

async for batch in self.paginate(
    FilterQuery(FilterExpression("*"), return_fields=["id"], sort_by="document_id"), page_size=500
):
    ...

This ensures every document is fetched exactly once.

I am preparing a PR for this fix.
