How do you fetch data from Pinecone namespaces? For example, I have already preprocessed my data and upserted it to Pinecone, and the data is separated by namespace. Now we want to query over that data with GPT-Index (llama-index):
index = GPTPineconeIndex([], pinecone_index=index_name, add_sparse_vector=True)
(The data in Pinecone is separated by namespace, and I'm unsure how to pass that to GPTPineconeIndex.)
AttributeError: 'str' object has no attribute 'query'
Since we can't get the data from the namespace itself, we can't query over the vector store, and it’s returning an empty object.
How do you query Pinecone indexes that are separated by namespace? Or is that not possible? Do I need to write my own function to fetch the data from Pinecone?
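On the last question: with the Pinecone Python client, the namespace is passed per query rather than fetched up front. A minimal sketch, assuming the classic `pinecone-client` interface where `Index.query` accepts `vector`, `top_k`, `namespace`, and `include_metadata`. The helper `query_namespace` and the `FakeIndex` stub are hypothetical and exist only so the snippet runs without credentials. The type check reflects the likely cause of the AttributeError above: an index name (a string) was passed where an index object with a `.query` method was expected.

```python
from typing import Any, Dict, List


def query_namespace(index: Any, vector: List[float], namespace: str, top_k: int = 5) -> Any:
    """Query a single namespace of a Pinecone-style index object (hypothetical helper)."""
    # A bare index name (str) has no .query method; an index handle is needed,
    # e.g. one obtained from pinecone.Index(index_name) in the classic client.
    if isinstance(index, str):
        raise TypeError("expected an index object, got the index name as a string")
    return index.query(
        vector=vector,
        top_k=top_k,
        namespace=namespace,  # restricts the search to this namespace
        include_metadata=True,
    )


class FakeIndex:
    """Stand-in for a real index so the sketch runs offline (assumption, not Pinecone code)."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def query(self, **kwargs: Any) -> Dict[str, Any]:
        # Record the arguments so we can see what would be sent to Pinecone.
        self.calls.append(kwargs)
        return {"namespace": kwargs["namespace"], "matches": []}


fake = FakeIndex()
result = query_namespace(fake, [0.1, 0.2, 0.3], namespace="company-a", top_k=3)
```

With a real index handle in place of `FakeIndex`, the same call shape should apply, though the exact client version in use may differ.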
I appreciate the response. Instead of using Pinecone, is there an alternative within llama-index? The only reason I wanted namespaces was cost and efficiency: a single Pinecone index can hold data from different companies in the same domain, which have similar nodes, and metadata or namespaces distinguish between them so that retrieval doesn’t mix up information between companies.
Can I achieve this without Pinecone, using llama-index alone? If you can send me a couple of notebook examples of this type of process, I would appreciate it. I would love to launch my project on llama. 🙏🏼
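Independent of any particular vector store, the isolation being asked for comes down to searching only within one company's vectors. A minimal pure-Python sketch of that idea follows. It is not llama-index's API: the class `PartitionedStore`, its methods, the company names, and the toy vectors are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PartitionedStore:
    """Keeps one list of (doc_id, vector) per company, like a namespace (hypothetical)."""

    def __init__(self) -> None:
        self._parts: Dict[str, List[Tuple[str, List[float]]]] = defaultdict(list)

    def upsert(self, company: str, doc_id: str, vector: List[float]) -> None:
        self._parts[company].append((doc_id, vector))

    def query(self, company: str, vector: List[float], top_k: int = 3) -> List[Tuple[str, float]]:
        # Only this company's partition is scored, so results never mix companies.
        scored = [(doc_id, cosine(vector, v)) for doc_id, v in self._parts.get(company, [])]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


store = PartitionedStore()
store.upsert("acme", "acme-1", [1.0, 0.0])
store.upsert("acme", "acme-2", [0.0, 1.0])
store.upsert("globex", "globex-1", [1.0, 0.0])

hits = store.query("acme", [1.0, 0.1], top_k=1)
```

Here `globex-1` is just as close to the query vector as `acme-1`, but it is never scored because the search is confined to the `acme` partition.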
@Obelix yeah that makes sense, the idea is to just use one Pinecone index but with a namespace/metadata filter to distinguish between different types of data, right? out of curiosity, how big is your dataset?
Correct. Because a lot of jargon is shared within a single domain, using namespaces and metadata to distinguish between companies should make sparse retrieval more accurate through better filtering. There are about 700,000 tokens of data per company on average, and 300+ companies.
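For a rough sense of scale, the figures above can be multiplied out. The 512-token chunk size below is purely an assumption for illustration, and 300 is used as the lower bound for "300+" companies.

```python
import math

TOKENS_PER_COMPANY = 700_000  # average from the message above
COMPANIES = 300               # lower bound of "300+"
CHUNK_TOKENS = 512            # assumed chunk size, not from the thread

total_tokens = TOKENS_PER_COMPANY * COMPANIES
chunks_per_company = math.ceil(TOKENS_PER_COMPANY / CHUNK_TOKENS)
total_chunks = chunks_per_company * COMPANIES

print(f"total tokens: {total_tokens:,}")
print(f"chunks per company: {chunks_per_company:,}")
print(f"total chunks: {total_chunks:,}")
```

That works out to about 210 million tokens overall and, at the assumed chunk size, roughly 410,000 vectors across all companies.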