
hi llamatonians, anyone got PineconeVectorStore working? I ran pinecone.init() with my api_key and environment, and I can list_indexes() and describe_index() the index that exists on Pinecone, but when I try to use vector_store_index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, service_contenxt=service_context) I always get the errors:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
PineconeProtocolError: Failed to connect; did you specify the correct index name?
Plain Text
active_indexes = pinecone.list_indexes()  -> ['report-vector-store']
pinecone.describe_index("report-vector-store") -> IndexDescription(name='report-vector-store', metric='cosine', replicas=1, dimension=1536.0, shards=1, pods=1, pod_type='starter', status={'ready': True, 'state': 'Ready'}, metadata_config=None, source_collection='')
48 comments
I had this working just the other day, so I can confirm it should work haha
yeah that's what I'm referencing! the only difference might be my storage_context? I decided to try using mongo for my docstore, so the storage_context has docstore=MongoDocumentStore(), vector_store=PineconeVectorStore, and index_store=SimpleIndexStore()
The error says you didn't specify the index name correctly πŸ€”
yeah it is
so I removed the docstore from the storage_context and it worked
maybe pinecone and mongodb are using the same port to communicate or something wacky? unsure
is it possible to use mongo for docstore and pinecone for vector_store?
the example to get pinecone working didn't include any reference to using a docstore or index_store, which didn't make sense to me given we need to use all 3, or so I thought?
does it make sense to maintain 2 separate storage_contexts if mongo and pinecone don't play well together?
When you use a vectordb, the docstore and index store are unused, unless you set store_nodes_override=True

This is because it stores all the nodes in pinecone itself, theres technically little reason to have a docstore
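For reference, a rough sketch of what that override looks like, assuming a local MongoDB URI and the index name from earlier in this thread (both placeholders; this exact combination hasn't been verified on the free tier):

Plain Text
# keep a Mongo docstore alongside Pinecone by forcing nodes into the docstore as well
from llama_index import VectorStoreIndex
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import PineconeVectorStore

pinecone_index = pinecone.Index("report-vector-store")
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri="mongodb://localhost:27017"),  # placeholder URI
    vector_store=vector_store,
)

# store_nodes_override=True tells the index to also write nodes to the docstore
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    store_nodes_override=True,
)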
oh, I thought storing documents was not guaranteed across all vector stores. also, I thought the index_store was used to track all the llama indexes you create, like if I create a VectorStoreIndex and a SummaryIndex and a TreeIndex, those all get registered in the index_store?
or what if I create multiple vectorstores?
sorry I'm still trying to figure this all out ❀️
When using a vector index like pinecone, you would create an index for each actual vector index you have. Most vectordbs have some similar concept as well (collection names, namespaces, index names, table names)
sorry, does that mean I need a unique storage_context for each vector_store_index I instantiate?
yea essentially, at least for vector db integrations
otherwise it will insert into the same vector index (which you probably don't want)
Just a symptom of how the db integrations work πŸ€”
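In code that ends up looking something like the sketch below, with one vector store (and storage context) per index; the namespace names and document variables are made up for illustration:

Plain Text
# one Pinecone index, two separate llama-index vector indexes kept apart by namespace
from llama_index import VectorStoreIndex
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import PineconeVectorStore

pinecone_index = pinecone.Index("report-vector-store")

reports_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace="reports")
summaries_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace="summaries")

# each index gets its own storage context so inserts don't collide
reports_index = VectorStoreIndex.from_documents(
    report_docs,
    storage_context=StorageContext.from_defaults(vector_store=reports_store),
)
summaries_index = VectorStoreIndex.from_documents(
    summary_docs,
    storage_context=StorageContext.from_defaults(vector_store=summaries_store),
)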
Just a quick follow-up, if we pass a SentenceWindowNodeParser to the service_context and pass that to a VectorStoreIndex.from_documents(docs, service_context=ctx) backed by Pinecone, does the SentenceWindowNodeParser get used to chunk the documents or does Pinecone do its own chunking? I'm looking at the nodes in Pinecone of the two docs I passed in and there is neither the "window" nor "original_text" metadata keys. I'm wondering if I should make the nodes with the SentenceWindowNodeParser and just pass the nodes into Pinecone?
LlamaIndex will handle the chunking, always πŸ™‚

It's in there, but it will probably be in the _node_content field. At least I hope it is, it should be lol

Can also create the nodes and do VectorStoreIndex(nodes, ...) to pass in the nodes directly
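A sketch of that route, reusing the parser settings discussed in this thread (variable names are placeholders):

Plain Text
from llama_index import VectorStoreIndex
from llama_index.node_parser import SentenceWindowNodeParser

# run the sentence-window parser yourself so you can inspect the nodes first
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=5,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = node_parser.get_nodes_from_documents(docs)
print(nodes[0].metadata)  # should already contain "window" and "original_text"

# note: pass nodes to the constructor, not to .from_documents()
index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)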
how do you fetch the nodes/documents from pinecone to inspect them? I tried manually pinecone_index.fetch(["4f94d420-32a7-43b2-9d7c-976a7c9ca3c7"]) and that printed out the node, but neither "window" nor "original_text" were there. I'll try another test, maybe the SentenceWindowNodeParser wasn't set correctly when I ran it the first time.
Try something like

Plain Text
retriever = index.as_retriever()
nodes = retriever.retrieve("test")
ok.
so I redid my tests with adding documents and passing a service_context with the SentenceWindowNodeParser defined.
I created the parser, created the service_context, created the storage_context, and then created all the pinecone stuff and ran VectorStoreIndex.from_documents(docs, storage_context=.., service_context=...), and neither the "window" nor "original_text" metadata keys appear in the node contents. I'm not sure how to add nodes directly; when I tried .from_documents() with nodes I got an error about no get_doc_id.
At first glance, when I use the SWNP I get 93 nodes, but in Pinecone there are only 19 nodes.
Plain Text
sentence_window_parser = SentenceWindowNodeParser(
    window_size=5,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
    include_metadata=True,
    include_prev_next_rel=True,
)
service_context_sentence = ServiceContext.from_defaults(
    llm=llm,
    embed_model=OpenAIEmbedding(embed_batch_size=50),
    node_parser=sentence_window_parser,
)
pinecone_index = pinecone.Index('report-vector-store')
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_store_index_a = VectorStoreIndex.from_documents(a_diff_pair_docs_a, storage_context=storage_context, service_contenxt=service_context_sentence)
index.insert(document) will insert a single document (or node, same thing really)
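For example, assuming index is an already-built VectorStoreIndex:

Plain Text
index.insert(document)     # a single Document, chunked by the attached node parser
index.insert_nodes(nodes)  # or pre-built nodes, if you ran the node parser yourself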
let me try to replicate your results
You missed a step
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
or maybe you didn't copy that
either way, testing now -- seems to be inserting 757 nodes from the PG essay lol (which tells me it works; usually it's only 10 or so nodes)
Worked for me πŸ˜…
OK, thanks kindly for verifying. I didn't forget the vector_store line, I just forgot to paste it. I see you have a paid tier with Pinecone; I'm on the free tier, and there are a lot of functions that are restricted on the free tier. I wonder if that affects the process? I tried to pass nodes from the SWNP and it caused an error.
does the .refresh_ref_docs() method work for Pinecone indexes?
I just did your exact steps and still no "window" in metadata. le sigh...
Oh I used the free tier haha
how? namespaces are not available on the free tier
Plain Text
pinecone.init(api_key=api_key, environment="asia-southeast1-gcp-free")
pinecone.create_index("testing", dimension=1536, metric="euclidean", pod_type="p1")
idk, it just worked for me?
πŸ€·β€β™‚οΈ
They definitely do not have my credit card lol
can you paste your sentencewindownodeparser/service_context lines?
Plain Text
pinecone.init(api_key=api_key, environment="asia-southeast1-gcp-free")
pinecone.create_index("testing", dimension=1536, metric="euclidean", pod_type="p1")
pinecone_index = pinecone.Index("testing")

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import PineconeVectorStore

# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

from llama_index.node_parser import SentenceWindowNodeParser
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(node_parser=SentenceWindowNodeParser())

# initialize without metadata filter
from llama_index.storage.storage_context import StorageContext

vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace='test1')
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)

response = index.as_query_engine().query("what did the author do growing up?")
print(response.source_nodes[0].node.metadata)