I am having problems with using GPTPineconeIndex

At a glance

The community member is having issues using GPTPineconeIndex to index their data, which consists of around 6900 articles taking up 2GB of space. They have tried indexing the data three times, but it keeps throwing connection errors; a full indexing run takes around 5 hours, and the longest run before failing was around 2 hours.

In the comments, another community member suggests inserting the documents one at a time, so that a failed insert can be retried instead of rebuilding the entire index. The original poster asks whether a different vector store might work better, since they understand a vector store is more memory-efficient than loading a 2GB file into their application.

The community member asks if they would have to manually insert all 6900 articles one by one, and another community member confirms that this is the case at the moment, unless they use the async functionality, which is not yet available for the insert operation.

I am having problems with using GPTPineconeIndex. I have tried 3 times to get it to index all my data, but at some point it just throws connection errors. Pretty annoying, as it takes about 5 hours to get through all of my index, and the longest I have been able to keep it indexing is about 2 hours.

Code:
import pinecone
from llama_index import GPTPineconeIndex, SimpleDirectoryReader

pinecone.init(api_key=pinecone_api_key, environment="us-east1-gcp")
pinecone.create_index("wed-match", dimension=1536, metric="euclidean", pod_type="p1")
index = pinecone.Index("wed-match")

# Load ~6900 articles and build the whole index in one pass
documents = SimpleDirectoryReader('api/data').load_data()
INDEX = GPTPineconeIndex(documents, pinecone_index=index, chunk_size_limit=512)

Errors:
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

raise PineconeProtocolError(f'Failed to connect; did you specify the correct index name?') from e
pinecone.core.exceptions.PineconeProtocolError: Failed to connect; did you specify the correct index name?
7 comments
hey @Erik, thanks for raising. Out of curiosity, how large is your dataset?
one thing you could try is using our insert call to insert Documents one at a time
that way if it fails you can always retry on a single insert instead of rebuilding the index
@jerryjliu0 I have around 6900 articles. When I used a simple vector index and saved it as JSON, it was 2GB.
Could I have better luck with other vector stores maybe? I am not set on Pinecone. But I would like to move the data to a vector store, as I understand this is way more memory-efficient than loading a 2GB file in my application.
Does that mean if I have 6900 articles, I would have to manually insert them one by one?
@Erik yeah at the moment. tbh that's how build_index_from_documents works unless you're using our async functionality (we don't have async for insert yet but can add!)
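A minimal sketch of the one-at-a-time approach suggested above, assuming the same GPTPineconeIndex / SimpleDirectoryReader API and the pinecone_api_key variable from the question. index.insert() is the per-document insert call mentioned in the comments; the retry count and sleep interval are illustrative choices, not part of the library:

import time

import pinecone
from llama_index import GPTPineconeIndex, SimpleDirectoryReader

pinecone.init(api_key=pinecone_api_key, environment="us-east1-gcp")
pinecone_index = pinecone.Index("wed-match")

documents = SimpleDirectoryReader('api/data').load_data()

# Build an empty index first, then add documents one at a time
index = GPTPineconeIndex([], pinecone_index=pinecone_index, chunk_size_limit=512)

for i, doc in enumerate(documents):
    for attempt in range(3):
        try:
            index.insert(doc)  # a failure here only affects a single article
            break
        except Exception as exc:
            print(f"insert of document {i} failed (attempt {attempt + 1}): {exc}")
            time.sleep(5)  # back off briefly before retrying

If a run still dies partway through, keeping track of the last successfully inserted document lets you resume from that point instead of starting over from scratch.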