The community member is having issues using GPTPineconeIndex to index their data, which consists of around 6,900 articles taking up 2 GB of space. They have tried indexing the data three times, but it keeps throwing connection errors; a full pass takes around 5 hours, and the longest run before failing was around 2 hours.
In the comments, another community member suggests inserting the documents one by one, as that may be more reliable than rebuilding the entire index. They also mention that the community member could try a different vector store, since vector stores are more memory-efficient than loading a 2 GB file into the application.
The community member asks if they would have to manually insert all 6900 articles one by one, and another community member confirms that this is the case at the moment, unless they use the async functionality, which is not yet available for the insert operation.
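The insert-one-at-a-time approach the commenters describe can be made more robust with a small retry wrapper around each document. This is only a sketch: the real call would be something like `index.insert(doc)` on the GPTPineconeIndex (an assumption about that API), so a deliberately flaky stub stands in here to keep the example self-contained.

```python
import time

def insert_with_retry(insert_fn, doc, max_attempts=3, backoff=2.0):
    """Try to insert one document, retrying on transient connection errors."""
    for attempt in range(1, max_attempts + 1):
        try:
            insert_fn(doc)  # in real code: index.insert(doc)
            return True
        except ConnectionError:
            if attempt == max_attempts:
                return False
            time.sleep(backoff * attempt)  # simple linear backoff between attempts
    return False

# Stub standing in for index.insert: fails the first time it sees each doc,
# mimicking the "Remote end closed connection" errors from the thread.
_seen = set()
def flaky_insert(doc):
    if doc not in _seen:
        _seen.add(doc)
        raise ConnectionError("Remote end closed connection without response")

docs = ["article-1", "article-2", "article-3"]
failed = [d for d in docs if not insert_with_retry(flaky_insert, d, backoff=0.0)]
print(failed)  # → []
```

Failed documents end up in `failed` and can be re-run later instead of restarting the whole multi-hour build.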
I am having problems using GPTPineconeIndex. I have tried three times to get it to index all my data, but at some point it just throws connection errors. Pretty annoying, as it takes about 5 hours to get through my whole index, and the longest I have been able to keep it indexing is about 2 hours.
Code:

pinecone.init(api_key=pinecone_api_key, environment="us-east1-gcp")
pinecone.create_index("wed-match", dimension=1536, metric="euclidean", pod_type="p1")
index = pinecone.Index("wed-match")

documents = SimpleDirectoryReader('api\data').load_data()
INDEX = GPTPineconeIndex(documents, pinecone_index=index, chunk_size_limit=512)
Errors:

    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

    raise PineconeProtocolError(f'Failed to connect; did you specify the correct index name?') from e
pinecone.core.exceptions.PineconeProtocolError: Failed to connect; did you specify the correct index name?
Could I have better luck with another vector store, maybe? I am not set on Pinecone, but I would like to move the data into a vector store, as I understand this is far more memory-efficient than loading a 2 GB file into my application.
@Erik yeah, at the moment. tbh that's how build_index_from_documents works unless you're using our async functionality (we don't have async for insert yet, but we can add it!)
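Until async insert exists, a plain synchronous loop with a checkpoint file at least makes a multi-hour run resumable: if the connection drops at hour two, the next run skips everything already inserted. A sketch under stated assumptions: `insert_fn` would be something like `index.insert` from the thread's API, and the checkpoint filename is made up for illustration.

```python
import json
import os

CHECKPOINT = "insert_progress.json"  # hypothetical filename for illustration

def load_done():
    """Read the set of already-inserted document positions, if any."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def insert_all(documents, insert_fn):
    """Insert documents one by one, checkpointing progress after each success."""
    done = load_done()
    for i, doc in enumerate(documents):
        key = str(i)
        if key in done:
            continue  # already inserted on a previous run; skip
        insert_fn(doc)  # in real code: index.insert(doc)
        done.add(key)
        with open(CHECKPOINT, "w") as f:
            json.dump(sorted(done), f)
    return done

# Demo with a plain list standing in for the index.
if os.path.exists(CHECKPOINT):
    os.remove(CHECKPOINT)
inserted = []
done = insert_all(["a", "b", "c"], inserted.append)
print(inserted)  # → ['a', 'b', 'c']
os.remove(CHECKPOINT)
```

Writing the checkpoint after every document is slow but safe; batching the checkpoint writes (say, every 50 documents) trades a little rework on crash for less disk I/O.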