Graph building

Hi Friends,

I have been following this notebook on how to integrate Pinecone.
https://github.com/jerryjliu/llama_index/blob/main/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb
In this tutorial, specifically section titled "Build Graph: Keyword Table Index on top of vector indices!", each individual document has the GPTPineconeIndex structure; then on top of all these documents, a GPTSimpleKeywordTableIndex is defined as part of a ComposableGraph.

My question is: would it be possible to do things the other way around? By that I mean making each individual document a GPTListIndex, then defining a GPTPineconeIndex on top of all these GPTListIndex documents as part of a ComposableGraph.

The reason I want to do this is that I have tons of documents, and most of them are very long. I don't want GPTPineconeIndex to chop up these documents and store the chunks in Pinecone. Instead, I want to keep each document in a GPTListIndex so it doesn't get read until query time, if at all. Then the top-level GPTPineconeIndex in the ComposableGraph can use the document summaries; that way Pinecone only has to look through the synopses of these books before deciding whether to query further into them.
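The composition described above might be sketched like this, using the legacy llama_index API from the linked notebook. The names `documents`, `summaries`, `index_name`, and `environment` are placeholders, and the `from_indices` call follows the shape worked out later in this thread; treat this as an illustrative sketch, not a verified recipe:

```python
# Hypothetical sketch of the "inverted" composition, assuming the legacy
# llama_index API from the linked notebook.
from llama_index import GPTListIndex, GPTPineconeIndex
from llama_index.indices.composability import ComposableGraph

# One GPTListIndex per long document: nothing is chunked into Pinecone,
# and a full document is only read at query time if the graph routes to it.
doc_indices = [GPTListIndex.from_documents([doc]) for doc in documents]

# A GPTPineconeIndex over the per-document summaries sits on top, so the
# vector search only sees each book's synopsis.
graph = ComposableGraph.from_indices(
    GPTPineconeIndex,            # pass the class, not an instance
    doc_indices,
    index_summaries=summaries,   # one short synopsis per document
    index_name=index_name,       # Pinecone index name
    environment=environment,     # Pinecone environment
)
```

Running this for real requires a live Pinecone account and the same llama_index version as the notebook.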

Not sure if my question is too wordy, but any help would be greatly appreciated.
Have a good weekend everyone πŸ™‚
Conceptually I think it should work!

For example, I've definitely seen the simple vector index used at the top level

Give it a try and see how it goes. πŸ’ͺ

You can pass in the pinecone index client when calling from_indices
Hi @Logan M ,
Thank you for your prompt reply. I just tried it, but got an error; not sure where I'm going wrong.
Error: ValueError Traceback (most recent call last)
<ipython-input-7-821e89020df0> in <cell line: 16>()
15
16 graph_RMS = ComposableGraph.from_indices(
---> 17 GPTPineconeIndex(index_name = index_name, environment = environment),
18 [index_set["doc{}".format(str(num))] for num in range(1, 6)],
19 index_summaries=all_titles[:5],

2 frames
/usr/local/lib/python3.9/dist-packages/llama_index/indices/base.py in __init__(self, nodes, index_struct, docstore, service_context)
48 """Initialize with parameters."""
49 if index_struct is None and nodes is None:
---> 50 raise ValueError("One of documents or index_struct must be provided.")
51 if index_struct is not None and nodes is not None:
52 raise ValueError("Only one of documents or index_struct can be provided.")

ValueError: One of documents or index_struct must be provided.
Oh, in the from_indices function, don't instantiate the GPTPineconeIndex; just pass the class

But in the kwargs, still pass the name and environment that you would use to instantiate the index
from_indices(GPTPineconeIndex, ...., index_name=index_name, environment=environment)
ohhhhh makes sense! thanks so much πŸ™‚
Hope it works! Haha πŸ™
It works like a charm!
Thanks again!
@Logan M If you're still available...
I'm still having trouble loading the graph back in:

graph_RMS = ComposableGraph.load_from_disk(
    'graph_RMS.json',
    service_context=service_context_RMS,
    index_name=index_name,
    environment=pinecone_environment,
)

gives me " Must specify index_name and environment if not directly passing in client."
Hmmm... do you have the full stack trace?
Do you mean this?
"
ValueError Traceback (most recent call last)
<ipython-input-49-d9b8939200b6> in <cell line: 4>()
2
3 # [optional] load from disk, so you don't need to build graph from scratch
----> 4 graph_RMS = ComposableGraph.load_from_disk(
5 'graph_RMS.json',
6 service_context=service_context_RMS, index_name = index_name, environment = pinecone_environment)

5 frames
/usr/local/lib/python3.9/dist-packages/llama_index/vector_stores/pinecone.py in __init__(self, pinecone_index, index_name, environment, namespace, metadata_filters, pinecone_kwargs, insert_kwargs, query_kwargs, delete_kwargs, add_sparse_vector, tokenizer, **kwargs)
166 )
167 if index_name is None or environment is None:
--> 168 raise ValueError(
169 "Must specify index_name and environment "
170 "if not directly passing in client."

ValueError: Must specify index_name and environment if not directly passing in client.
"
yes haha but it seems like it condensed it, no worries

I'll take a peek at the source code... not sure why that's not getting passed down
In the tutorial, https://github.com/jerryjliu/llama_index/blob/main/examples/composable_indices/city_analysis/PineconeDemo-CityAnalysis.ipynb
there is this block,
I suspect this may have something to do with it?
[attachment: image.png — screenshot of the relevant code block from the tutorial]
Ah right right, the index kwargs
I'm still stuck though. I defined the GPTPineconeIndex as a ComposableGraph on top of a list of GPTListIndex objects, so I only have a ComposableGraph object. How can I get index objects from it, like "city_indices.values()" in the tutorial?
I think you can get the index ID and create the same type of dictionary using graph._index_struct.root_id
(Sorry, haven't quite tried this myself yet, so just reading source code haha)
Absolutely no need to apologize, I'm extremely grateful for your help
I tried this, but it's still not working; I'll try the root_id approach you just mentioned.
[attachment: image.png — screenshot of the attempted code]
Now it works! In case anyone comes across this in the future, this is what is needed:

query_context_kwargs = {
    graph._index_struct.root_id: {
        'vector_store': {
            'pinecone_index': pinecone_index
        }
    }
}
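Pulling the pieces of this thread together, the fix is essentially a nested dict keyed by the graph's root index id that routes the Pinecone client down to the root index's vector store. The helper name `pinecone_kwargs` below is made up for this sketch, and the commented-out reload call assumes the legacy llama_index API and kwarg names used in this thread:

```python
# Builds the nested kwargs dict from the message above, keyed by the
# graph's root index id. Helper name is hypothetical, for illustration.
def pinecone_kwargs(root_id, pinecone_index):
    """Route a Pinecone client to the root index's vector store."""
    return {root_id: {"vector_store": {"pinecone_index": pinecone_index}}}

# Assumed usage when reloading (needs a live Pinecone client, so it is
# shown as comments only; root_id must be saved from when the graph was
# built, e.g. via graph._index_struct.root_id):
#
#   pinecone_index = pinecone.Index(index_name)
#   graph = ComposableGraph.load_from_disk(
#       'graph_RMS.json',
#       service_context=service_context_RMS,
#       query_context_kwargs=pinecone_kwargs(root_id, pinecone_index),
#   )
```

The design point: load_from_disk can't reconstruct a live Pinecone client from JSON, so the client has to be injected per index id at load time.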
Glad you got it, nice!