
hi llamatonians, anyone got PineconeVectorStore working? I ran the init() with my api_key and environment, and I can list_indexes() and describe_index() the index that exists on Pinecone, but when I try to use vector_store_index = VectorStoreIndex.from_documents(docs, storage_context=storage_context, service_contenxt=service_context) I always get these errors:

Plain Text
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
PineconeProtocolError: Failed to connect; did you specify the correct index name?
Plain Text
active_indexes = pinecone.list_indexes()
# -> ['report-vector-store']

pinecone.describe_index("report-vector-store")
# -> IndexDescription(name='report-vector-store', metric='cosine', replicas=1, dimension=1536.0,
#    shards=1, pods=1, pod_type='starter', status={'ready': True, 'state': 'Ready'},
#    metadata_config=None, source_collection='')
I had this working just the other day, so I can confirm it should work haha
yeah that's what I'm referencing! the only difference might be my storage_context? I decided to try and use Mongo for my docstore, so it's StorageContext.from_defaults(docstore=MongoDocumentStore(), vector_store=PineconeVectorStore(...), index_store=SimpleIndexStore())
The error says you didn't specify the index name correctly πŸ€”
yeah it is
so I removed the docstore from the storage_context and it worked
maybe pinecone and mongodb are using the same port to communicate or something wacky? unsure
is it possible to use mongo for docstore and pinecone for vector_store?
the example to get pinecone working didn't include any reference to using a docstore or index_store, which didn't make sense to me given we need to use all 3, or so I thought?
does it make sense to maintain 2 separate storage_contexts if mongo and pinecone don't play well together?
When you use a vectordb, the docstore and index store are unused, unless you set store_nodes_override=True

This is because it stores all the nodes in pinecone itself, there's technically little reason to have a docstore
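Roughly, that means either of these setups, depending on whether you actually want the docstore populated (the Mongo URI below is just a placeholder, and this assumes pinecone_index and docs are already created as in the snippets above):

Plain Text
from llama_index import VectorStoreIndex
from llama_index.storage.docstore import MongoDocumentStore
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores import PineconeVectorStore

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Typical vector-db setup: only the vector store, nodes live in Pinecone
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(docs, storage_context=storage_context)

# If you do want a Mongo docstore populated alongside Pinecone,
# opt in with store_nodes_override=True
storage_context = StorageContext.from_defaults(
    docstore=MongoDocumentStore.from_uri(uri="mongodb://localhost:27017"),  # placeholder URI
    vector_store=vector_store,
)
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    store_nodes_override=True,
)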
oh, I thought storing documents was not guaranteed across all vector stores. also, I thought the index_store was used to track all the llama indexes you create, like if I create a VectorStoreIndex and a SummaryIndex and a TreeIndex, those all get registered in the index_store?
or what if I create multiple vectorstores?
sorry I'm still trying to figure this all out ❀️
When using a vector db like Pinecone, you would create a Pinecone index for each actual vector index you have. Most vector dbs have some similar concept as well (collection names, namespaces, index names, table names)
sorry, does that mean I need a unique storage_context for each vector_store_index I instantiate?
yea essentially, at least for vector db integrations
otherwise it will insert into the same vector index (which you probably don't want)
Just a symptom of how the db integrations work πŸ€”
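Sketching that out (namespace names and the doc variables here are made up, and namespaces may not be available on every Pinecone tier; pointing at separate Pinecone indexes works the same way):

Plain Text
# One storage context per vector index, each pointing at its own namespace
vector_store_a = PineconeVectorStore(pinecone_index=pinecone_index, namespace="reports")
vector_store_b = PineconeVectorStore(pinecone_index=pinecone_index, namespace="summaries")

storage_context_a = StorageContext.from_defaults(vector_store=vector_store_a)
storage_context_b = StorageContext.from_defaults(vector_store=vector_store_b)

index_a = VectorStoreIndex.from_documents(report_docs, storage_context=storage_context_a)
index_b = VectorStoreIndex.from_documents(summary_docs, storage_context=storage_context_b)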
Just a quick follow-up: if we pass a SentenceWindowNodeParser to the service_context and pass that to a VectorStoreIndex.from_documents(docs, service_context=ctx) backed by Pinecone, does the SentenceWindowNodeParser get used to chunk the documents, or does Pinecone do its own chunking? I'm looking at the nodes in Pinecone for the two docs I passed in, and neither the "window" nor the "original_text" metadata key is there. I'm wondering if I should make the nodes with the SentenceWindowNodeParser and just pass the nodes into Pinecone?
LlamaIndex will handle the chunking, always πŸ™‚

It's in there, but it will probably be in the _node_content field. At least I hope it is; it should be lol

Can also create the nodes and do VectorStoreIndex(nodes, ...) to pass in the nodes directly
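For the nodes-directly route, a minimal sketch (assuming docs, the sentence-window parser, and the service/storage contexts from earlier in the thread):

Plain Text
# Chunk the documents yourself with the sentence-window parser...
nodes = sentence_window_parser.get_nodes_from_documents(docs)

# ...then pass the nodes straight to the constructor instead of from_documents()
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    service_context=service_context,
)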
how do you fetch the nodes/documents from pinecone to inspect them? I tried manually pinecone_index.fetch(["4f94d420-32a7-43b2-9d7c-976a7c9ca3c7"]) and that printed out the node, but neither "window" nor "original_text" were there. I'll try another test, maybe the SentenceWindowNodeParser wasn't set correctly when I ran it the first time.
Try something like

Plain Text
retriever = index.as_retriever()
nodes = retriever.retrieve("test")
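...and then look at the metadata on what comes back; if the parser ran, the window keys should be there. If you'd rather inspect the raw record from fetch(), the serialized node is usually stored under the _node_content metadata key as a JSON string (the id below is a placeholder and the exact response shape can vary by pinecone client version), roughly:

Plain Text
for n in nodes:
    print(n.node.metadata.get("window"))
    print(n.node.metadata.get("original_text"))

# Or decode a raw fetch() result (use an id from your own index)
import json
res = pinecone_index.fetch(["some-node-id"])
for vec in res.vectors.values():
    node_content = json.loads(vec.metadata["_node_content"])
    print(node_content.get("metadata", {}))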
ok.
so I redid my tests with adding documents and passing a service_context with the SentenceWindowNodeParser defined.
I created the parser, created the service_context, created the storage_context, and then created all the pinecone stuff and ran VectorStoreIndex.from_documents(docs, storage_context=.., service_context=...), and neither the "window" nor "original_text" metadata keys appear in the node contents. I'm not sure how to add nodes directly; when I tried .from_documents() with nodes I got an error about no get_doc_id.
At first glance, when I use the SWNP I get 93 nodes, but in Pinecone there are only 19 nodes.
Plain Text
sentence_window_parser = SentenceWindowNodeParser(
    window_size=5,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
    include_metadata=True,
    include_prev_next_rel=True,
)
service_context_sentence = ServiceContext.from_defaults(
    llm=llm,
    embed_model=OpenAIEmbedding(embed_batch_size=50),
    node_parser=sentence_window_parser,
)
pinecone_index = pinecone.Index('report-vector-store')
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_store_index_a = VectorStoreIndex.from_documents(a_diff_pair_docs_a, storage_context=storage_context, service_contenxt=service_context_sentence)
index.insert(document) will insert a single document (or node, same thing really)
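i.e. something like this (document and nodes being whatever you've already built):

Plain Text
# Insert a single document into an existing index (it gets parsed into nodes for you)
index.insert(document)

# If you already built nodes yourself (e.g. with the SentenceWindowNodeParser),
# insert_nodes skips the parsing step
index.insert_nodes(nodes)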
let me try to replicate your results
You missed a step
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
or maybe you didn't copy that
either way, testing now -- seems to be inserting 757 nodes from the PG essay lol (which tells me it works, usually it's only 10 or so nodes)
Worked for me πŸ˜…
OK, thanks kindly for verifying. I didn't forget the vector_store line, I just forgot to paste it. I see you have a paid tier with Pinecone; I'm on the free tier, and there are a lot of functions that are restricted on the free tier. I wonder if that affects the process? I tried to pass nodes from the SWNP and it caused an error.
does the .refresh_ref_docs() method work for Pinecone indexes?
I just did your exact steps and still no "window" in metadata. le sigh...
Oh I used the free tier haha
how? namespaces are not available in the free tier
Plain Text
pinecone.init(api_key=api_key, environment="asia-southeast1-gcp-free")
pinecone.create_index("testing", dimension=1536, metric="euclidean", pod_type="p1")
idk, it just worked for me?
πŸ€·β€β™‚οΈ
They definitely do not have my credit card lol
can you paste your sentencewindownodeparser/service_context lines?
Plain Text
pinecone.init(api_key=api_key, environment="asia-southeast1-gcp-free")
pinecone.create_index("testing", dimension=1536, metric="euclidean", pod_type="p1")
pinecone_index = pinecone.Index("testing")

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores import PineconeVectorStore

# load documents
documents = SimpleDirectoryReader("../data/paul_graham").load_data()

from llama_index.node_parser import SentenceWindowNodeParser
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(node_parser=SentenceWindowNodeParser())

# initialize without metadata filter
from llama_index.storage.storage_context import StorageContext

vector_store = PineconeVectorStore(pinecone_index=pinecone_index, namespace='test1')
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)

response = index.as_query_engine().query("what did the author do growing up?")
print(response.source_nodes[0].node.metadata)