Updated last year

I'm struggling with a straightforward

I'm struggling with a straightforward use case: I am adding documents with specific metadata to a Pinecone vector DB.

metadata_filters = {"document_name": document_name}
vector_store = PineconeVectorStore(
    index_name=index_name,
    environment=environment,
    metadata_filters=metadata_filters,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
print('create storage context')
# Load the document
document = loader.load_data()

# Create index from document
document_name = VectorStoreIndex.from_documents(
    document,
    storage_context=storage_context,
    service_context=service_context,
)
# Set summary text for document
document_name.index_struct.index_id = document_name


I want to be able to query specific documents instead of the entire index. I implemented metadata filtering; however, the response is None, even though I have checked that the metadata exists (as an exact match). I have also checked that a response is returned when I remove all filters.

pinecone.init(api_key=PINECONE_API_KEY, environment=environment)

llm = OpenAI(temperature=0, model="gpt-3.5-turbo", max_tokens=1024)
service_context = ServiceContext.from_defaults(llm=llm)
vector_store = PineconeVectorStore(pinecone.Index("test"))
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine(
    similarity_top_k=5,
    service_context=service_context,
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key='document_name', value=document_name)]
    ),
)
response = query_engine.query(instruction)
print(response)

What is wrong with this approach?
6 comments
Hmm, I see a few things:

  1. No need to set metadata_filters in the PineconeVectorStore constructor
  2. When you load the data with the loader, double check that each document has the metadata you expect

documents = loader.load_data()
for doc in documents:
  print(doc.metadata)  # should print a dictionary that hopefully has the `document_name` key

  3. Not totally sure what this line is for, you can probably remove it: document_name.index_struct.index_id = document_name
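For a larger batch, eyeballing the printed dictionaries gets tedious, so the check in point 2 could also be done programmatically. This is only a rough sketch: `Doc` is a hypothetical stand-in for whatever the loader returns (the only assumption is that each document has a `metadata` dictionary), and `missing_key` and the sample file names are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document; only `metadata` matters here."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def missing_key(documents: List[Doc], key: str) -> List[Doc]:
    """Return the documents whose metadata does not contain `key`."""
    return [doc for doc in documents if key not in doc.metadata]


# Illustrative data: the second document carries a different key name.
documents = [
    Doc("page 1", {"document_name": "report.pdf"}),
    Doc("page 2", {"file_name": "report.pdf"}),
]

unlabeled = missing_key(documents, "document_name")
print(f"{len(unlabeled)} document(s) lack the 'document_name' key")
```

If this list is not empty, an exact-match filter on that key would have nothing to match for those documents.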
Thanks Logan, I'll give it a shot troubleshooting with 2 & 3. On #1, I was following the usage pattern here: https://gpt-index.readthedocs.io/en/latest/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.html. Why would I not set metadata_filters? Here's the basic pattern given:

# Build city document index
from llama_index.storage.storage_context import StorageContext


city_indices = {}
for pinecone_title, wiki_title in zip(pinecone_titles, wiki_titles):
    metadata_filters = {"wiki_title": wiki_title}
    vector_store = PineconeVectorStore(
        index_name=index_name,
        environment=environment,
        metadata_filters=metadata_filters,
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    city_indices[wiki_title] = VectorStoreIndex.from_documents(
        city_docs[wiki_title],
        storage_context=storage_context,
        service_context=service_context,
    )
    # set summary text for city
    city_indices[wiki_title].index_struct.index_id = pinecone_title
I think that's an outdated demo. metadata_filters isn't used in the init in the source code.
[Attachment: image.png]
That's interesting and probably the root of my problem. I'm not seeing any errors from using metadata_filters=. I took a look at the docs, and I haven't been able to find a description of the default metadata key-value pairs. I do see metadata extractor classes, but I don't know how to use them to ensure that the metadata I want is extracted.
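One possible explanation for the missing error, offered as a guess rather than a statement about the actual library code: if a constructor accepts arbitrary extra keyword arguments, an unrecognized argument is swallowed without complaint. The class below is a hypothetical toy, not the real PineconeVectorStore, and the argument values are placeholders.

```python
class SketchVectorStore:
    """Hypothetical toy class, NOT the real PineconeVectorStore."""

    def __init__(self, index_name=None, environment=None, **kwargs):
        self.index_name = index_name
        self.environment = environment
        # Extra keyword arguments are accepted but never used for anything.
        # Their names are recorded here only so the sketch can show what was dropped.
        self.ignored = sorted(kwargs)


store = SketchVectorStore(
    index_name="test",
    environment="example-env",  # placeholder value
    metadata_filters={"document_name": "report.pdf"},
)

# No exception was raised, yet the filter was never stored on the object.
print(store.ignored)
print(hasattr(store, "metadata_filters"))
```

If the real constructor behaves like this, passing metadata_filters= would run cleanly while having no effect on what gets written to the index.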
I'll use your #2 to figure out what exactly is being added as tags.
Logan, you were right. In case this isn't obvious to anyone else, I'm dropping the solution below. You can use the extractor classes to get document metadata or create your own. Then just do this:

metadata_filters = {"document_name": document_name}
documents = loader.load_data()
for doc in documents:
    doc.metadata = metadata_filters
    print(doc.metadata)
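One caveat with the snippet above: assigning the same dictionary to every document replaces whatever metadata the loader already set, and every document ends up sharing one dictionary object. A variant that merges the extra key into a fresh dictionary per document might look like the sketch below. `Doc`, `tag_documents`, and `exact_match` are hypothetical names used only for illustration, and `exact_match` is a plain-Python imitation of what an exact-match filter does, not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a loaded document."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def tag_documents(documents: List[Doc], extra: Dict[str, Any]) -> List[Doc]:
    """Merge `extra` into each document's metadata, keeping existing keys."""
    for doc in documents:
        # Build a new dict per document so no two documents share one object.
        doc.metadata = {**doc.metadata, **extra}
    return documents


def exact_match(documents: List[Doc], key: str, value: Any) -> List[Doc]:
    """Keep only documents whose metadata[key] equals `value`."""
    return [doc for doc in documents if doc.metadata.get(key) == value]


report = tag_documents(
    [Doc("a", {"page": 1}), Doc("b", {"page": 2})],
    {"document_name": "report.pdf"},
)
notes = tag_documents([Doc("c")], {"document_name": "notes.pdf"})

hits = exact_match(report + notes, "document_name", "report.pdf")
print([doc.text for doc in hits])
```

Keeping the loader's own keys (such as the illustrative `page` above) means they remain available for filtering later.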