Hi all

Obelix

Hi all,

How do you fetch data from Pinecone namespaces? I have already preprocessed my data and upserted it to Pinecone, separated by namespace. Now we want to query that data with GPT-Index (llama-index):

index = GPTPineconeIndex([], pinecone_index=index_name, add_sparse_vector=True)

(The data in Pinecone is separated by namespaces; I'm unsure how to pass the namespace to GPTPineconeIndex.)

response = index.query(question, vector_store_query_mode="hybrid")

Throws the following error:

AttributeError: 'str' object has no attribute 'query'

Since we can't get the data from the namespace itself, we can't query over the vector store, and it’s returning an empty object.
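The traceback above is what Python raises whenever `.query` is called on a plain string, so a likely cause is that `pinecone_index=index_name` hands over the index name rather than a Pinecone client index object. The sketch below reproduces that message without touching Pinecone at all. `run_query` and `describe_failure` are hypothetical helpers written for illustration, not part of llama-index or the Pinecone client, and the commented-out client calls assume the v2-style `pinecone` package.

```python
def run_query(pinecone_index, vector, top_k=3):
    """Hypothetical wrapper: forwards the query to whatever object it was given."""
    return pinecone_index.query(vector=vector, top_k=top_k)


def describe_failure(pinecone_index):
    """Return the AttributeError message if the object cannot be queried, else None."""
    try:
        run_query(pinecone_index, vector=[0.1, 0.2])
    except AttributeError as exc:
        return str(exc)
    return None


# Passing the index *name* (a str) fails the same way as in the post.
message = describe_failure("my-index-name")
print(message)

# Assumed fix (v2-style Pinecone client; adjust for your client version):
#   import pinecone
#   pinecone.init(api_key="...", environment="...")
#   pinecone_index = pinecone.Index("my-index-name")  # an object with .query
```

If that is indeed the cause, building the client index object first and passing that object instead of the name should at least get past the `AttributeError`.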

How do you query Pinecone indexes that are separated by namespace? Or is that not possible? Do I need to write my own function to fetch the data from Pinecone?

Any help or guidance is appreciated. Thank you all.

8 comments

jerryjliu0

we just added a change to specify namespaces but reverted for now

jerryjliu0

will keep you posted! sorry about this

Obelix

I appreciate the response. Is there an alternative to Pinecone that works with llama-index? The only reason I wanted namespaces was cost and efficiency: a single Pinecone index can hold data from different companies in the same domain, which have similar nodes, and metadata or namespaces distinguish between them so that retrieval doesn't mix up information between companies.

Can I achieve this without Pinecone, using llama-index alone? If you can send me a couple of notebook examples of this type of process, I would appreciate it. I would love to launch my project on llama. 🙏🏼

jerryjliu0

@Obelix yeah that makes sense, the idea is to just use one pinecone index but using a namespace/metadata filter to distinguish between different types of data right? out of curiosity how big is your dataset?

Obelix

Correct. Because a lot of jargon is shared within a single domain, using namespaces and metadata to distinguish between companies will make sparse retrieval more accurate through better filtering. There are approximately 300+ companies, with about 700,000 tokens of data per company on average.
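The one-index, one-namespace-per-company setup described here can be sketched as a small helper that always scopes a query to a single company. This is only an illustration under assumptions: `query_company` is a hypothetical function, the namespace and metadata field names are made up, and `pinecone_index` is expected to be a Pinecone client index object whose `query` method accepts `vector`, `top_k`, `namespace`, `filter`, and `include_metadata`.

```python
from typing import Any, Dict, List, Optional


def query_company(
    pinecone_index: Any,
    company_namespace: str,
    query_vector: List[float],
    top_k: int = 5,
    metadata_filter: Optional[Dict[str, Any]] = None,
) -> Any:
    """Query one company's namespace so results never mix across companies.

    `pinecone_index` is assumed to be a Pinecone client index object,
    not the index name.
    """
    return pinecone_index.query(
        vector=query_vector,
        top_k=top_k,
        namespace=company_namespace,  # restricts the search to this company
        filter=metadata_filter,       # optional extra narrowing by metadata
        include_metadata=True,
    )


# Example call (hypothetical namespace and metadata field):
#   query_company(pinecone_index, "company-a", embedding,
#                 metadata_filter={"doc_type": {"$eq": "filing"}})
```

Keeping the namespace as a required argument makes it hard to run an unscoped query by accident, which is the main risk when many companies share one index.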

jerryjliu0

got it. actually by the way we did add the namespace functionality back in pinecone

jerryjliu0

https://github.com/jerryjliu/llama_index/blob/main/gpt_index/vector_stores/pinecone.py

Obelix

Super stoked to try this!
