Following this development with interest (but still short on some foundational understanding): if a user has already chunked their data into <4K-token chunks, generated embeddings with the OpenAI embeddings API, and stored these in something like Pinecone for semantic search matching, how does GPT_Index add functionality on top of this?
Yeah, we have a Pinecone reader that lets you load embeddings + corresponding docs into GPT Index. One use case: if you want to retrieve multiple docs instead of just one, GPT Index lets you put all those docs in an index to query later, without you having to worry about whether the docs all fit in the prompt. e.g. https://github.com/jerryjliu/gpt_index/blob/main/examples/data_connectors/PineconeDemo.ipynb
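Roughly, it looks like this (a sketch based on the linked notebook; exact import paths and parameter names may differ by version, and the API key, environment, index name, query embedding, and id_to_text_map below are all placeholders):

```python
from gpt_index import GPTListIndex
from gpt_index.readers import PineconeReader

# Connect to an existing Pinecone index (placeholder credentials).
reader = PineconeReader(api_key="<pinecone-api-key>", environment="us-west1-gcp")

# Map from the IDs you stored in Pinecone to the original chunk text.
id_to_text_map = {
    "id1": "text chunk 1",
    "id2": "text chunk 2",
}

# Placeholder: a precomputed query embedding (1536-d for OpenAI ada).
query_embedding = [0.0] * 1536

# Pull back the top-k matching docs for the query embedding.
documents = reader.load_data(
    index_name="quickstart",  # placeholder index name
    id_to_text_map=id_to_text_map,
    top_k=3,
    vector=query_embedding,
    separate_documents=True,
)

# Put the retrieved docs into a GPT Index structure and query it;
# the index handles feeding the docs to the LLM even when they
# don't all fit in a single prompt.
index = GPTListIndex(documents)
response = index.query("What did the author do growing up?")
```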
Thanks! Add this to the "short on foundational knowledge" section: so will GPT Index automate querying across multiple documents whose combined token count is > 4K? If so, via multiple independent calls to GPT?
Yeah, that's one part of it. You can also use a vector store as the underlying store of an index. I don't have a Pinecone-backed index yet, but I do have a GPTFaissIndex and a GPTWeaviateIndex. Here you can load in your documents from anywhere, and the storage of the index is backed by the underlying vector store: https://gpt-index.readthedocs.io/en/latest/how_to/vector_stores.html
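For example, a rough GPTFaissIndex sketch (following the linked docs page; the 1536 dimension assumes OpenAI ada embeddings, and the "./data" directory is a placeholder):

```python
import faiss
from gpt_index import GPTFaissIndex, SimpleDirectoryReader

# Faiss index over 1536-d vectors (the OpenAI ada embedding size).
d = 1536
faiss_index = faiss.IndexFlatL2(d)

# Load docs from anywhere -- here, a local directory (placeholder path).
documents = SimpleDirectoryReader("./data").load_data()

# Build the index; chunk embeddings are stored in the Faiss index.
index = GPTFaissIndex(documents, faiss_index=faiss_index)

# Querying embeds the question, retrieves the closest chunks from Faiss,
# and synthesizes an answer with the LLM.
response = index.query("What did the author do growing up?")
```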