Hard Filtering on Metadata #170

Open
@insightfulquery

I am using the GraphRetriever from the langchain_graph_retriever library to query AstraDB for GraphRAG, as shown in your docs. I need a way to include or exclude records based on their metadata. For instance, I might want to limit the search to animals whose habitat is listed as 'Jungle', or do the opposite and exclude animals with Jungle habitats from a search.
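To make the include/exclude semantics concrete, here is a minimal sketch in plain Python. It is not the library's API: `matches`, `hard_filter`, `include`, and `exclude` are hypothetical names, and the animal records are made-up sample data.

```python
from typing import Any, Optional

Metadata = dict[str, Any]


def matches(metadata: Metadata, criteria: Metadata) -> bool:
    """Return True when every key in criteria is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in criteria.items()
    )


def hard_filter(
    records: list[Metadata],
    include: Optional[Metadata] = None,
    exclude: Optional[Metadata] = None,
) -> list[Metadata]:
    """Keep records that match `include` (if given) and do not match `exclude` (if given)."""
    kept = []
    for record in records:
        if include and not matches(record, include):
            continue  # fails the inclusion criteria
        if exclude and matches(record, exclude):
            continue  # hits the exclusion criteria
        kept.append(record)
    return kept


# Made-up sample records for illustration only.
animals = [
    {"name": "jaguar", "habitat": "Jungle"},
    {"name": "camel", "habitat": "Desert"},
    {"name": "toucan", "habitat": "Jungle"},
]

jungle_only = hard_filter(animals, include={"habitat": "Jungle"})
no_jungle = hard_filter(animals, exclude={"habitat": "Jungle"})
```

The point is that the filter is "hard": records that fail it are never candidates, regardless of how similar they are to the query.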

The use case is that I want to load my raw text chunks and then apply layers of metadata: for instance, community groupings with community summaries, plus claims as you described in your lazy GraphRAG example (although mine wouldn't be lazy). With this supporting structure of metadata in place, I want to be able to specify that a given query should start by retrieving only claims and then follow them to their relevant text chunks or community summaries. This flexibility would let me use different retrieval patterns for different types of questions in the same AstraDB.

I would envision this looking something like:

retriever = GraphRetriever(
    store=store,
    edges=edges,
    strategy=Mmr(
        lambda_mult=0.5,  # Controls diversity vs relevance (0.0 to 1.0)
        select_k=20  # Number of documents to select
    )
)

store.similarity_search(
    query,
    initial_filter={"type": "claim"},  # Start with only nodes whose metadata field "type" has the value "claim"
    return_filter={"type": "text_chunk"},  # Return only nodes of type "text_chunk"
    k=20
)
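The semantics I have in mind for `initial_filter` and `return_filter` can be sketched in plain Python over a tiny in-memory graph. This is only an illustration under assumptions: the node ids, the `links` field, and the `traverse` function are hypothetical, and a real retriever would also rank the starting nodes by similarity to the query.

```python
from collections import deque

# Hypothetical in-memory nodes: id -> metadata. "links" lists the ids of connected nodes.
nodes = {
    "claim-1": {"type": "claim", "links": ["chunk-1", "summary-1"]},
    "claim-2": {"type": "claim", "links": ["chunk-2"]},
    "chunk-1": {"type": "text_chunk", "links": []},
    "chunk-2": {"type": "text_chunk", "links": ["chunk-3"]},
    "chunk-3": {"type": "text_chunk", "links": []},
    "summary-1": {"type": "community_summary", "links": []},
}


def matches(metadata, criteria):
    """Return True when every key in criteria is present in metadata with an equal value."""
    return all(k in metadata and metadata[k] == v for k, v in criteria.items())


def traverse(nodes, initial_filter, return_filter, max_depth=1):
    """Start from nodes matching initial_filter, follow links breadth-first up to
    max_depth hops, and return the ids of visited nodes matching return_filter."""
    queue = deque(
        (node_id, 0) for node_id, md in nodes.items() if matches(md, initial_filter)
    )
    visited = {node_id for node_id, _ in queue}
    results = []
    while queue:
        node_id, depth = queue.popleft()
        if matches(nodes[node_id], return_filter):
            results.append(node_id)
        if depth < max_depth:
            for neighbor in nodes[node_id]["links"]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return results


# Start at claims, return only the text chunks reachable within one hop.
hits = traverse(nodes, {"type": "claim"}, {"type": "text_chunk"})
# Allow a second hop to also reach chunks linked from other chunks.
deeper_hits = traverse(nodes, {"type": "claim"}, {"type": "text_chunk"}, max_depth=2)
```

Swapping `return_filter` to `{"type": "community_summary"}` would give the "claims to community summaries" pattern from the same data.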

I am having a hard time seeing how to do this from your documentation. If it is already possible, could you please link me to the relevant doc and provide a code snippet? I see a method in the source code, GraphRetriever._get_relevant_documents, which has an argument called 'filter', but I can't tell how I would go about using it.

Labels: enhancement (New feature or request)
