The community members are discussing the recommended approach for using Pinecone (or another vector store) with the gpt_index library. They are trying to decide whether to load documents directly into Pinecone or use the gpt_index abstraction. The discussion also covers how to interface with gpt_index when working with a pre-existing vector index in Pinecone, and whether to serialize the index by saving to and loading from disk.
The community members have tried using the GPTPineconeIndex and have been able to query against it. However, they are unsure whether this approach preserves the original documents, since Pinecone limits how much metadata can be attached to each vector and a full document may not fit. They are also wondering whether gpt_index inserts document chunks into the metadata or just stores the documents in-memory locally.
The community members are trying to decide whether to continue their current workflow of loading pre-made indices from disk via the index.load_from_disk() method, or if attaching to the pre-existing Pinecone index (that was populated with gpt_index) is sufficient.
Is the recommended approach for using Pinecone (or another vector store) to load the documents into the store with gpt_index as the interface? (i.e. fresh index, create documents, insert into GPTPineconeIndex)
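A minimal sketch of that "fresh index" flow, assuming the older gpt_index API where GPTPineconeIndex wraps an existing pinecone.Index handle; the index name, API key, environment, and data directory below are placeholders:

```python
import pinecone
from gpt_index import GPTPineconeIndex, SimpleDirectoryReader

# Connect to Pinecone and grab a handle to an (already created) index.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

# Load documents and let gpt_index chunk, embed, and upsert them into Pinecone.
documents = SimpleDirectoryReader("./data").load_data()
index = GPTPineconeIndex(documents, pinecone_index=pinecone_index)

response = index.query("What do the documents say about X?")
print(response)
```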
How would one interface with gpt_index when there is a pre-existing vector index in Pinecone?
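One pattern that appears to work with that same older API (a sketch, not necessarily the officially recommended route) is to construct GPTPineconeIndex over an empty document list, so nothing new is inserted and queries run against the vectors already in Pinecone:

```python
import pinecone
from gpt_index import GPTPineconeIndex

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

# No documents passed in, so nothing is re-embedded or upserted;
# queries go against whatever the index already contains.
index = GPTPineconeIndex([], pinecone_index=pinecone_index)
response = index.query("Question answered from the existing vectors")
```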
Trying to decide whether the flexibility of interfacing with Pinecone directly beats the gpt_index abstraction for non-full-document Q/A (e.g. storing previous queries as a Q/A cache so LLM calls can be limited)
Assuming the best interface is GPTPineconeIndex, is the best way to serialize the index to save it to disk and load it from disk at boot? (e.g. in an API server)
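A sketch of that save/load cycle, assuming save_to_disk() writes only the index structure to JSON (the vectors stay in Pinecone) and load_from_disk() takes the pinecone.Index handle back as a keyword argument; verify the exact signatures against your gpt_index version:

```python
import pinecone
from gpt_index import GPTPineconeIndex

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

# At build time (index constructed as in the earlier sketch):
# index.save_to_disk("pinecone_index.json")

# At boot, e.g. when the API process starts:
index = GPTPineconeIndex.load_from_disk(
    "pinecone_index.json", pinecone_index=pinecone_index
)
response = index.query("warm-start query")
```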
I kind of like the fact that gpt_index does the job of mapping documents/nodes to vectors in the store so I don't have to keep track of a mapping myself, but I'm not sure of the trade-off space right now.
I tried it last night with a sample index and I was able to query against it. Not sure what you meant by "preserves the original documents" - You should be able to store text chunks inside metadata. (Be aware of the chunk size limit)
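For example, a hedged sketch of capping the chunk size at construction time so the text stored in Pinecone metadata stays small; chunk_size_limit is the kwarg name used by older gpt_index releases and should be treated as an assumption to check against your version:

```python
import pinecone
from gpt_index import GPTPineconeIndex, SimpleDirectoryReader

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")

documents = SimpleDirectoryReader("./data").load_data()
index = GPTPineconeIndex(
    documents,
    pinecone_index=pinecone_index,  # existing pinecone.Index handle
    chunk_size_limit=512,           # keep each stored chunk well under the metadata size limit
)
```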
What I am getting at is: right now I am loading pre-made indices from disk via the index.load_from_disk() method, and I am wondering whether I have to preserve that workflow or whether attaching to the pre-existing Pinecone index (that was populated with gpt_index) is enough