Description
I am using the GraphRetriever from the langchain_graph_retriever library to query AstraDB for GraphRAG, as shown in your docs. I need a way to include or exclude records based on their metadata. For instance, I might want to limit the search to animals whose habitat is listed as 'Jungle', or do the opposite and exclude animals with a 'Jungle' habitat from the search.
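To make the two cases concrete, here is a plain-Python sketch of the filter semantics I'm after. It assumes MongoDB-style operators such as `$ne` (the AstraDB Data API supports this style of filter). The `matches` helper and the animal records are hypothetical illustrations, not part of langchain_graph_retriever.

```python
from typing import Any


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    # Hypothetical helper: evaluates a MongoDB-style metadata filter.
    # Supports plain equality plus "$ne", "$in" and "$nin"; other operators raise.
    for field, cond in flt.items():
        value = metadata.get(field)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$ne":
                    ok = value != arg
                elif op == "$in":
                    ok = value in arg
                elif op == "$nin":
                    ok = value not in arg
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != cond:
            return False
    return True


# Hypothetical sample records (made up for illustration).
animals = [
    {"id": "jaguar", "habitat": "Jungle"},
    {"id": "toucan", "habitat": "Jungle"},
    {"id": "camel", "habitat": "Desert"},
    {"id": "penguin", "habitat": "Polar"},
]

include = {"habitat": "Jungle"}           # keep only jungle animals
exclude = {"habitat": {"$ne": "Jungle"}}  # drop jungle animals

jungle = [a["id"] for a in animals if matches(a, include)]
not_jungle = [a["id"] for a in animals if matches(a, exclude)]
print(jungle)      # ['jaguar', 'toucan']
print(not_jungle)  # ['camel', 'penguin']
```

Ideally I'd pass dictionaries like `include` and `exclude` straight to the retriever rather than filtering client-side.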
The use case is that I want to load my raw text chunks and then apply layers of metadata on top of them: for instance, community groupings with community summaries, then claims as you described in your lazy GraphRAG example (although mine wouldn't be lazy), and so on. With this supporting metadata structure in place, I would want to specify that a given query should start by retrieving only claims and then follow them to their relevant text chunks or community summaries. This flexibility would let me use different retrieval patterns for different types of questions against the same AstraDB database.
I would envision this looking something like:
```python
from graph_retriever.strategies import Mmr
from langchain_graph_retriever import GraphRetriever

retriever = GraphRetriever(
    store=store,
    edges=edges,
    strategy=Mmr(
        lambda_mult=0.5,  # Controls diversity vs. relevance (0.0 to 1.0)
        select_k=20,  # Number of documents to select
    ),
)

# initial_filter / return_filter are proposed (hypothetical) parameters.
retriever.invoke(
    query,
    initial_filter={"type": "claim"},  # Start only from nodes whose metadata "type" is "claim"
    return_filter={"type": "text_chunk"},  # Return only nodes whose "type" is "text_chunk"
    k=20,
)
```
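To pin down what I mean by `initial_filter` and `return_filter`, here is a plain-Python sketch of the intended behaviour over a tiny in-memory graph. Everything in it is hypothetical (the `NODES` corpus, the `supports`/`community` link fields, and the `layered_retrieve` function); the seed ids stand in for similarity-search hits, and this is not how GraphRetriever is implemented.

```python
from collections import deque
from typing import Any

# Hypothetical corpus: each node has a "type" plus link fields holding the ids
# of related nodes (claim -> supporting text chunks and community summary).
NODES: dict[str, dict[str, Any]] = {
    "c1": {"type": "claim", "supports": ["t1", "t2"], "community": ["s1"]},
    "c2": {"type": "claim", "supports": ["t3"], "community": ["s1"]},
    "t1": {"type": "text_chunk"},
    "t2": {"type": "text_chunk"},
    "t3": {"type": "text_chunk"},
    "s1": {"type": "community_summary"},
}
LINK_FIELDS = ("supports", "community")


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    # Equality-only filter: every key in flt must match exactly.
    return all(metadata.get(key) == value for key, value in flt.items())


def neighbors(node_id: str) -> list[str]:
    # Follow every link field on the node to the ids it references.
    meta = NODES[node_id]
    return [target for field in LINK_FIELDS for target in meta.get(field, [])]


def layered_retrieve(
    seed_ids: list[str],
    initial_filter: dict[str, Any],
    return_filter: dict[str, Any],
    max_depth: int = 2,
) -> list[str]:
    # initial_filter gates which search hits may start the traversal.
    frontier = deque((s, 0) for s in seed_ids if matches(NODES[s], initial_filter))
    visited = {node_id for node_id, _ in frontier}
    results = []
    # Breadth-first walk: any node type can be traversed,
    # but only nodes matching return_filter are emitted.
    while frontier:
        node_id, depth = frontier.popleft()
        if matches(NODES[node_id], return_filter):
            results.append(node_id)
        if depth < max_depth:
            for target in neighbors(node_id):
                if target not in visited:
                    visited.add(target)
                    frontier.append((target, depth + 1))
    return results


# Pretend these ids came back from a similarity search over all node types.
hits = ["c1", "t3", "c2"]
print(layered_retrieve(hits, {"type": "claim"}, {"type": "text_chunk"}))
# ['t1', 't2', 't3']
print(layered_retrieve(hits, {"type": "claim"}, {"type": "community_summary"}))
# ['s1']
```

The point of keeping two separate filters is that neither one restricts the traversal itself: claims only seed the walk, and the output can be a different node type than the seeds, which a single filter applied everywhere could not express.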
I am having a hard time seeing how to do that from your documentation. If it is already possible, could you please link me to the correct doc and, if you would be so kind, provide a code snippet? I see that GraphRetriever._get_relevant_documents in the source code has an argument called 'filter', but I can't tell how I would go about using it.