I am having problems with using GPTPineconeIndex

At a glance

The community member is having issues using GPTPineconeIndex to index their data, which consists of around 6900 articles taking up 2GB of space. They have tried indexing the data three times, but it keeps throwing connection errors; a full indexing run takes around 5 hours, and the longest run before failing was around 2 hours.

In the comments, another community member suggests inserting the documents one at a time, so that a failed insert can be retried instead of rebuilding the entire index. The original poster asks whether a different vector store might work better, since they understand a vector store is more memory-efficient than loading a 2GB file into their application.

The community member asks if they would have to manually insert all 6900 articles one by one, and another community member confirms that this is the case at the moment, unless they use the async functionality, which is not yet available for the insert operation.

I am having problems with using GPTPineconeIndex. I have tried 3 times to get it to index all my data, but at some point it just throws connection errors. Pretty annoying, as it takes about 5 hours to get through all of my index, and the longest I have been able to keep it indexing is about 2 hours.

Code:
import pinecone
from llama_index import GPTPineconeIndex, SimpleDirectoryReader

pinecone.init(api_key=pinecone_api_key, environment="us-east1-gcp")
pinecone.create_index("wed-match", dimension=1536, metric="euclidean", pod_type="p1")
index = pinecone.Index("wed-match")

# Load ~6900 articles and build the whole index in one pass
documents = SimpleDirectoryReader('api/data').load_data()
INDEX = GPTPineconeIndex(documents, pinecone_index=index, chunk_size_limit=512)

Errors:
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

raise PineconeProtocolError(f'Failed to connect; did you specify the correct index name?') from e
pinecone.core.exceptions.PineconeProtocolError: Failed to connect; did you specify the correct index name?
7 comments
hey @Erik, thanks for raising. Out of curiosity, how large is your dataset?
one thing you could try is using our insert call to insert Documents one at a time
that way if it fails you can always retry on a single insert instead of rebuilding the index
@jerryjliu0 I have around 6900 articles. When I used a simple vector index and saved it as JSON, it was 2GB.
Could I have better luck with other vector stores maybe? I am not set on Pinecone. But I would like to move the data to a vector store, as I understand this is way more memory-efficient than loading a 2GB file in my application.
Does that mean if I have 6900 articles, I would have to manually insert them one by one?
@Erik yeah at the moment. tbh that's how build_index_from_documents works unless you're using our async functionality (we don't have async for insert yet but can add!)
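A minimal sketch of the one-at-a-time approach suggested above, assuming the same GPTPineconeIndex / SimpleDirectoryReader API and the pinecone_api_key variable from the question. index.insert() is the per-document insert call mentioned in the comments; the retry count and sleep interval are illustrative choices, not part of the library:

import time

import pinecone
from llama_index import GPTPineconeIndex, SimpleDirectoryReader

pinecone.init(api_key=pinecone_api_key, environment="us-east1-gcp")
pinecone_index = pinecone.Index("wed-match")

documents = SimpleDirectoryReader('api/data').load_data()

# Build an empty index first, then add documents one at a time
index = GPTPineconeIndex([], pinecone_index=pinecone_index, chunk_size_limit=512)

for i, doc in enumerate(documents):
    for attempt in range(3):
        try:
            index.insert(doc)  # a failure here only affects a single article
            break
        except Exception as exc:
            print(f"insert of document {i} failed (attempt {attempt + 1}): {exc}")
            time.sleep(5)  # back off briefly before retrying

If a run still dies partway through, keeping track of the last successfully inserted document lets you resume from that point instead of starting over from scratch.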