The community members are discussing the recommended approach for using Pinecone (or another vector store) with the gpt_index library. They are trying to decide whether to load documents directly into Pinecone or use the gpt_index abstraction. The discussion also covers how to interface with gpt_index when working with a pre-existing vector index in Pinecone, and whether to serialize the index by saving to and loading from disk.
The community members have tried using the GPTPineconeIndex and have been able to query against it. However, they are unsure whether this approach preserves the original documents, since Pinecone limits how much metadata can be attached to each vector and a full document may not fit. They are also wondering whether gpt_index inserts document chunks into the metadata or just stores the documents in-memory locally.
The community members are trying to decide whether to continue their current workflow of loading pre-made indices from disk via the index.load_from_disk() method, or if attaching to the pre-existing Pinecone index (that was populated with gpt_index) is sufficient.
Is the recommended approach for using Pinecone (or another vector store) to load the documents into the store with gpt_index as the interface? (i.e. fresh index, create documents, insert into GPTPineconeIndex)
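A minimal sketch of that "fresh index" flow, assuming the older gpt_index API where GPTPineconeIndex wraps an existing pinecone.Index handle; the index name, API key, environment, and data directory below are placeholders:

```python
import pinecone
from gpt_index import GPTPineconeIndex, SimpleDirectoryReader

# Connect to Pinecone and grab a handle to an (already created) index.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

# Load documents and let gpt_index chunk, embed, and upsert them into Pinecone.
documents = SimpleDirectoryReader("./data").load_data()
index = GPTPineconeIndex(documents, pinecone_index=pinecone_index)

response = index.query("What do the documents say about X?")
print(response)
```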
How would one interface with gpt_index when there is a pre-existing vector index in Pinecone?
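One pattern that appears to work with that same older API (a sketch, not necessarily the officially recommended route) is to construct GPTPineconeIndex over an empty document list, so nothing new is inserted and queries run against the vectors already in Pinecone:

```python
import pinecone
from gpt_index import GPTPineconeIndex

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

# No documents passed in, so nothing is re-embedded or upserted;
# queries go against whatever the index already contains.
index = GPTPineconeIndex([], pinecone_index=pinecone_index)
response = index.query("Question answered from the existing vectors")
```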
Trying to decide whether the flexibility of interfacing with Pinecone directly beats the gpt_index abstraction for non-full-document Q/A (e.g. storing previous queries as a Q/A cache so LLM calls can be limited)
Assuming the best interface is GPTPineconeIndex, is the best way to serialize the index to save it to disk and load it from disk at boot? (e.g. in an API server)
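A sketch of that save/load cycle, assuming save_to_disk() writes only the index structure to JSON (the vectors stay in Pinecone) and load_from_disk() takes the pinecone.Index handle back as a keyword argument; verify the exact signatures against your gpt_index version:

```python
import pinecone
from gpt_index import GPTPineconeIndex

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

# At build time (index constructed as in the earlier sketch):
# index.save_to_disk("pinecone_index.json")

# At boot, e.g. when the API process starts:
index = GPTPineconeIndex.load_from_disk(
    "pinecone_index.json", pinecone_index=pinecone_index
)
response = index.query("warm-start query")
```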
I kind of like the fact that gpt_index does the job of mapping documents/nodes to vectors in the store so I don't have to keep track of a mapping myself, but I'm not sure of the trade-off space right now.
I tried it last night with a sample index and I was able to query against it. Not sure what you meant by "preserves the original documents" - You should be able to store text chunks inside metadata. (Be aware of the chunk size limit)
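For example, a hedged sketch of capping the chunk size at construction time so the text stored in Pinecone metadata stays small; chunk_size_limit is the kwarg name used by older gpt_index releases and should be treated as an assumption to check against your version:

```python
import pinecone
from gpt_index import GPTPineconeIndex, SimpleDirectoryReader

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

documents = SimpleDirectoryReader("./data").load_data()
index = GPTPineconeIndex(
    documents,
    pinecone_index=pinecone_index,  # existing pinecone.Index handle
    chunk_size_limit=512,           # keep each stored chunk well under the metadata size limit
)
```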
What I am getting at is: right now I am loading pre-made indices from disk via the index.load_from_disk() method, and I am wondering whether I have to preserve that workflow or whether attaching to the pre-existing Pinecone index (that was populated with gpt_index) is enough