Creating a summary index with specific document retrieval

I have a use case in which I need to ingest 1000+ documents and create a VectorIndex and a SummaryIndex over them. I am able to successfully create the VectorIndex by adding metadata and retrieving with VectorIndexAutoRetriever. However, I am stuck at creating the SummaryIndex, as I need to retrieve only one particular document (identified by its metadata) and create its summary. How can I achieve this?
We've been (slowly) implementing the vector_store.get_nodes() method for some vector stores, which lets you pass in node IDs or metadata filters
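For stores that support it, the call shape is roughly this (a quick sketch; the node IDs are placeholders, and metadata_filters would be a MetadataFilters object like the one built further down in this thread):

Plain Text
# fetch specific nodes by their IDs...
nodes = vector_store.get_nodes(node_ids=["<node-id-1>", "<node-id-2>"])

# ...or fetch every node matching some metadata filters
nodes = vector_store.get_nodes(filters=metadata_filters)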

What vector store are you using?
There is no problem with the VectorIndex; I am facing issues while creating the SummaryIndex.
I am creating a SummaryIndex over 1000+ docs.
Now if the user asks for a summary of doc A, it should retrieve the nodes corresponding to only doc A and then create the summary. However, I found that this is not possible with the current abstractions.
I might be wrong here, hence I need your guidance
Yea, the summary index always retrieves all documents in it

What you probably want is some LLM call to generate metadata filters, and retrieve from your vector store using those filters
I am still confused, because I think what you are trying to say is that I should be storing the SummaryIndex in the VectorStore?
Is that possible?
No need for a summary index 👀

You can use vector_store.get_nodes(filters=filters), and then pass those nodes into tree-summarize (assuming you are using a vector store that supports that function)

Plain Text
from llama_index.core.response_synthesizers import TreeSummarize

synth = TreeSummarize(llm=llm)

# fetch only the nodes belonging to the target document
nodes = vector_store.get_nodes(filters=filters)

# summarize the text of just those nodes
response_str = synth.get_response("query", [node.text for node in nodes])
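Here `filters` is a MetadataFilters object matching whatever metadata identifies the document — for example, assuming your documents were ingested with a file_name metadata key (a hypothetical key; use whatever you actually set at ingestion):

Plain Text
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

# exact match on the metadata that identifies "doc A"
filters = MetadataFilters(
    filters=[MetadataFilter(key="file_name", value="doc_a.pdf")]
)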
Thanks for this answer Logan, I am using AzureAIVectorStore, which I think doesn't implement get_nodes(); I will try to do a PR for it.
Also, I wanted to know if we have some abstraction that will generate metadata filters from the query?
You can define a Pydantic object and use structured outputs (structured_predict, or as_structured_llm as shown below) to fill out the filters.

This example assumes just exact match
Plain Text
from llama_index.core.vector_stores import MetadataFilters, MetadataFilter
from pydantic import BaseModel, Field

class Filter(BaseModel):
  """A filter on metadata."""
  key: str = Field(description="The key name to filter on.")
  value: str = Field(description="The value to match on.")

class Filters(BaseModel):
  """A list of metadata filters for a query."""
  filters: list[Filter]

sllm = llm.as_structured_llm(Filters)

response = sllm.complete(f"I have an index with metadata like <some examples>. Given a user query, generate some filters (if any) that can be used to help narrow down the search.\n\n{user_query}")

filters = Filters.model_validate_json(str(response))

# convert each parsed filter into a MetadataFilter
# (note: iterate filters.filters, not the model itself)
metadata_filters = []
for f in filters.filters:
  metadata_filters.append(MetadataFilter(key=f.key, value=f.value))

# pass the converted list, wrapped in MetadataFilters, to the vector store
nodes = vector_store.get_nodes(filters=MetadataFilters(filters=metadata_filters))
(I did not test that lol)
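For completeness, the two snippets chain together roughly like this (equally untested, reusing the names defined above):

Plain Text
# chain the two steps: LLM-generated filters -> filtered node fetch -> summary
nodes = vector_store.get_nodes(filters=MetadataFilters(filters=metadata_filters))
synth = TreeSummarize(llm=llm)
response_str = synth.get_response(user_query, [node.text for node in nodes])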
Hi @Logan M - a PR for this has been raised but is still not merged; I would like to know if anything is pending
Ah yea it's buried in the mountain of PRs ⛰️ thanks for the bump
no issues 🙂
happy to help