Following this development with interest (but still short on some foundational understanding): if a user has already chunked their data into <4K-token chunks, generated embeddings with the OpenAI embeddings API, and stored these in something like Pinecone for semantic search matching, how does GPT_Index add functionality on top of this?
Yeah, we have a Pinecone reader that lets you load embeddings + corresponding docs into GPT Index. One use case: if you want to retrieve multiple docs instead of just one, GPT Index lets you put all those docs in an index to query later, without you having to worry about whether the docs all fit in the prompt. e.g. https://github.com/jerryjliu/gpt_index/blob/main/examples/data_connectors/PineconeDemo.ipynb
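Roughly, it looks like this (a sketch based on the linked notebook; exact import paths and parameter names may differ by version, and the API key, environment, index name, query embedding, and id_to_text_map below are all placeholders):

```python
from gpt_index import GPTListIndex
from gpt_index.readers import PineconeReader

# Connect to an existing Pinecone index (placeholder credentials).
reader = PineconeReader(api_key="<pinecone-api-key>", environment="us-west1-gcp")

# Map from the IDs you stored in Pinecone to the original chunk text.
id_to_text_map = {
    "id1": "text chunk 1",
    "id2": "text chunk 2",
}

# Placeholder: a precomputed query embedding (1536-d for OpenAI ada).
query_embedding = [0.0] * 1536

# Pull back the top-k matching docs for the query embedding.
documents = reader.load_data(
    index_name="quickstart",  # placeholder index name
    id_to_text_map=id_to_text_map,
    top_k=3,
    vector=query_embedding,
    separate_documents=True,
)

# Put the retrieved docs into a GPT Index structure and query it;
# the index handles feeding the docs to the LLM even when they
# don't all fit in a single prompt.
index = GPTListIndex(documents)
response = index.query("What did the author do growing up?")
```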
Thanks! Add this to the "short on foundational knowledge" section: so will GPT Index automate querying across multiple documents whose combined token count is > 4K? If so, via multiple independent calls to GPT?
Yeah, that's one part of it. You can also use a vector store as the underlying store of an index. I don't have a Pinecone-backed index yet, but I do have a GPTFaissIndex and a GPTWeaviateIndex. Here you can load in your documents from anywhere, and the storage of the index is backed by the underlying vector store: https://gpt-index.readthedocs.io/en/latest/how_to/vector_stores.html
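For example, a rough GPTFaissIndex sketch (following the linked docs page; the 1536 dimension assumes OpenAI ada embeddings, and the "./data" directory is a placeholder):

```python
import faiss
from gpt_index import GPTFaissIndex, SimpleDirectoryReader

# Faiss index over 1536-d vectors (the OpenAI ada embedding size).
d = 1536
faiss_index = faiss.IndexFlatL2(d)

# Load docs from anywhere -- here, a local directory (placeholder path).
documents = SimpleDirectoryReader("./data").load_data()

# Build the index; chunk embeddings are stored in the Faiss index.
index = GPTFaissIndex(documents, faiss_index=faiss_index)

# Querying embeds the question, retrieves the closest chunks from Faiss,
# and synthesizes an answer with the LLM.
response = index.query("What did the author do growing up?")
```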