Han Liu
Joined September 25, 2024
Han Liu
·

Pinecone

Hi

I wonder if you have some free time and could enlighten me on the inner workings of GPTPineconeIndex...

  1. It is my understanding that every time I call GPTPineconeIndex.from_documents (with a service_context using gpt-3.5-turbo), a new vector embedding from gpt-3.5-turbo will be added to the specified Pinecone database. But OpenAI currently doesn't support gpt-3.5-turbo embeddings, in the sense that you can't call their API, pass in a text string, and get its embedding back from gpt-3.5-turbo. So how is it possible for LlamaIndex to do this?
  2. When I call GPTPineconeIndex.from_documents, where is the original raw text stored afterwards? Is it passed to Pinecone as part of the metadata? Or is it stored somewhere locally, along with some sort of mapping linking the original raw text to its corresponding vector in Pinecone?
  3. Let's say I called GPTPineconeIndex.from_documents on one million documents, but I didn't save the index locally and turned off my desktop. Is there a way for me to still call index.query("blablabla") later, since technically everything is still stored in Pinecone?
Any help and guidance would be greatly appreciated πŸ™‚
Thanks in advance.
3 comments
Han Liu
·

Hi everyone

Hi everyone,
I passed an LLMPredictor() in the service_context when instantiating a ComposableGraph. But no matter how many times I query using that graph, the "last_token_usage" of the LLMPredictor remains 0. Why is this? Is there a better way to keep track of token usage?
8 comments
Han Liu
·

Graph building

Hi Friends,

I have been following this notebook on how to integrate Pinecone.
https://github.com/jerryjliu/llama_index/blob/main/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb
In this tutorial, specifically the section titled "Build Graph: Keyword Table Index on top of vector indices!", each individual document gets its own GPTPineconeIndex structure; then, on top of all these documents, a GPTSimpleKeywordTableIndex is defined as part of a ComposableGraph.

My question is: would it be possible for me to do things the other way around? By that I mean making each individual document a GPTListIndex, then defining a GPTPineconeIndex on top of all these GPTListIndex documents as part of a ComposableGraph.

The reason I want to do this is that I have tons of documents, and most of them are really long. So I don't want GPTPineconeIndex to chop up these documents and store them in Pinecone. Instead, I want to keep each document in a GPTListIndex, so its text doesn't get read until query time, and only if necessary. Then, for the GPTPineconeIndex at the top of the ComposableGraph, I can use the document summaries; this way, Pinecone hopefully only has to look through the synopses of these books before deciding whether or not to query further into them.

Not sure if my question is too wordy, but any help would be greatly appreciated.
Have a good weekend everyone πŸ™‚
24 comments