Is there any advice on good ways to

At a glance

The community member is seeking advice on composing indices for their document collection, which consists of about 200 legal cases and 200 scholarly articles, each around 20 pages long on average. They are considering wrapping each document in a list index, then putting the documents into two separate vector or tree indices (one for each document type), and finally adding a list index on top of the two indices.

The comments suggest that this approach makes sense, as long as the community member is aware of potential speed and cost issues due to the increased complexity. One community member notes that the documentation shows an example of a list index on top of a tree index, but cautions that this is mostly for illustrative purposes.

The community members discuss the trade-offs between putting all the documents in a single vector index versus using separate indices for the two document types. They also suggest that wrapping each document in a list index and using the "tree_summarize" response mode could potentially improve performance, although cost is a consideration.

ffoggyeyes

Is there any advice on good ways to compose indices? The documentation shows a list index on top of tree indices, but not sure if that's the best way. In my case, I have two types of documents. I have about 200 documents of each type, about 20 pages each on average. I was thinking of wrapping each document in its own list index to force the model to use the entire document. Then, I'd probably put all the documents into two vector or tree indices (one for each type), and then a list index on top of the two vector/tree indices to force the model to look at both types of documents. Any thoughts on better ways to compose would be greatly appreciated.

13 comments

jjerryjliu0

hey out of curiosity which documentation shows composing a list index on top of tree indices?

jjerryjliu0

for your use case a list index per document makes sense (as long as each doc isn't too long!). and then a vector index on top of the list indices makes sense too

jjerryjliu0

what are the types of documents that you have?

jjerryjliu0

what you're describing makes sense at a high-level - i would just watch out for speed/cost, because the more connections you make, the larger the latency
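To make the speed/cost caveat concrete, here is a rough back-of-the-envelope sketch. It is not library code: the tokens-per-page figure, the chunk size, the number of documents retrieved per type, and the "one LLM call per chunk plus one call per composition layer" model are all assumptions chosen for illustration.

```python
import math


def chunks_per_document(pages, tokens_per_page=500, chunk_size=3000):
    """Number of chunks one document is split into (all figures are assumptions)."""
    return math.ceil(pages * tokens_per_page / chunk_size)


def estimate_llm_calls(pages_per_doc, docs_retrieved_per_type, num_types=2):
    """Rough count of LLM calls for one query through the composed structure.

    Assumed model: one call per chunk inside each retrieved document's list
    index, one call per type-level index to combine its documents' answers,
    and one final call at the top-level list index.
    """
    per_doc = chunks_per_document(pages_per_doc)
    per_type = docs_retrieved_per_type * per_doc + 1
    return num_types * per_type + 1


# 20-page documents, 2 documents retrieved from each of the 2 type-level indices
print(estimate_llm_calls(pages_per_doc=20, docs_retrieved_per_type=2))
```

Under these assumptions the call count grows with every extra layer and every extra document retrieved, which is the latency concern raised above.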

ffoggyeyes

This page shows a list on top of a tree

ffoggyeyes

https://gpt-index.readthedocs.io/en/latest/how_to/composability.html

ffoggyeyes

Attachment

ffoggyeyes

The documents are legal cases and scholarly articles

ffoggyeyes

That's why I want to separate them into two vector indices

jjerryjliu0

ahhh got it i see. yeah that's mostly for illustrative purposes

jjerryjliu0

makes sense! if you don't care about whether we only synthesize information from one source or multiple, put them into one vector index. if you explicitly want to synthesize information across indices, list index on top of two vector indices makes sense
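The difference between the two layouts can be illustrated with a toy retrieval sketch. This is plain Python, not the library's API: the two-dimensional vectors, the document names, and the `top_k` helper are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, docs, k):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)
    return [d["name"] for d in ranked[:k]]


# Hypothetical embeddings: the query happens to sit closer to the legal cases.
cases = [{"name": "case_a", "vec": [0.9, 0.1]}, {"name": "case_b", "vec": [0.8, 0.3]}]
articles = [{"name": "article_a", "vec": [0.4, 0.9]}, {"name": "article_b", "vec": [0.2, 1.0]}]
query = [1.0, 0.0]

# One shared index: the best matches may all come from a single document type.
single_index = top_k(query, cases + articles, k=2)

# One index per type, combined on top: each type always contributes a result.
separate_indices = top_k(query, cases, k=1) + top_k(query, articles, k=1)

print(single_index)
print(separate_indices)
```

In this toy setup the shared pool returns only cases, while the per-type layout returns one case and one article, which is the "synthesize across indices" behavior described above.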

ffoggyeyes

What do you think about wrapping each document in a list index? Cost issues aside, would this improve performance?

jjerryjliu0

i think so! you can try setting response_mode="tree_summarize" when querying the list index - it'll explicitly go through every node
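As a rough illustration of why a tree-style summarization touches every node, here is a minimal sketch of bottom-up combining. It is not the library's implementation: `fake_summarize` is a hypothetical stand-in for an LLM call, and the `fan_in` parameter is an assumption.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Combine chunks bottom-up, fan_in at a time, until one summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = list(chunks)
    while len(level) > 1:
        # Each group of up to fan_in items is merged by one summarize call.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_summarize(texts):
    # Hypothetical stand-in for an LLM call: just joins its inputs.
    return "(" + " + ".join(texts) + ")"


chunks = ["c1", "c2", "c3", "c4", "c5"]
result = tree_summarize(chunks, fake_summarize)
print(result)
```

Every chunk feeds into the final answer, which is the property being relied on here. The cost is one call per merge, so longer documents mean more calls.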
