yep! the index kind of depends on what questions you want to ask. do you want to ask questions about specific facts? then use a vector index. do you want to perform summarization? then try using a list index
what if you wanted to support both? I'm playing with a collection of documents as separate vector indexes, and then wrapping that all together in a list index.
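For anyone following along, here is a minimal sketch of that shape: one vector-style index per document, wrapped in a list-style index that visits every child. It is a toy in plain Python, not the library's API. `ToyVectorIndex`, `ToyListIndex`, and the word-overlap scoring are all made-up stand-ins for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def overlap(a: str, b: str) -> int:
    """Crude relevance score: number of shared lowercase words (not a real embedding)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


@dataclass
class ToyVectorIndex:
    """Hypothetical stand-in for a per-document vector index: returns the best-matching chunk."""
    name: str
    chunks: list[str]

    def query(self, question: str) -> str:
        return max(self.chunks, key=lambda c: overlap(question, c))


@dataclass
class ToyListIndex:
    """Hypothetical stand-in for a list index: visits every child in order and collects the answers."""
    children: list[ToyVectorIndex] = field(default_factory=list)

    def query(self, question: str) -> list[str]:
        return [f"{child.name}: {child.query(question)}" for child in self.children]


docs = ToyListIndex([
    ToyVectorIndex("doc_a", ["cats sleep a lot", "cats eat fish"]),
    ToyVectorIndex("doc_b", ["dogs bark loudly", "dogs eat meat"]),
])
answers = docs.query("what do they eat")
```

The point of the wrapper is that every document gets consulted, while each document only contributes its most relevant chunk.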
You could do some sort of pre-processing on the query using embeddings to detect if the query is asking for a summary or not lol
Another (slower) option is straight-up manually asking the LLM too
once you know whether the user wants a summary or a specific answer, you can call the appropriate indexes/functions
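A rough sketch of the embedding-based pre-processing idea, assuming you keep a few example queries per intent and compare the incoming query against them. The `embed` function here is only a bag-of-words placeholder so the snippet runs on its own; a real setup would swap in an actual embedding model. `INTENT_EXAMPLES`, `detect_intent`, and `route` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Placeholder embedding: a bag-of-words count vector (swap in a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical example queries for each intent.
INTENT_EXAMPLES = {
    "summary": ["summarize this document", "give me an overview of the book"],
    "fact": ["who wrote the book", "what year was it published"],
}


def detect_intent(query: str) -> str:
    """Return the intent whose closest example query is most similar to the input."""
    q = embed(query)
    return max(
        INTENT_EXAMPLES,
        key=lambda intent: max(cosine(q, embed(ex)) for ex in INTENT_EXAMPLES[intent]),
    )


def route(query: str, summary_index, fact_index):
    """Call whichever index (any callable taking the query) matches the detected intent."""
    index = summary_index if detect_intent(query) == "summary" else fact_index
    return index(query)
```

With word counts as the "embedding", common words like "the" can dominate the score, which is one reason real embeddings (or asking the LLM) would likely do better here.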
well that's where I was going. Do we first try to understand intent, and then try to route to the correct index to use?
That's my thinking 🤔
Actually, if you put a vector/keyword index at the top level with two sub-indexes (one for summaries, like a list index, and the other for normal queries, like an existing composable index), and then give each sub-index a summary like "This index will provide a summary", it might route the query properly. That starts to get a little complex though
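To make the summary-based routing idea concrete, here is a small sketch that picks a sub-index by keyword overlap between the query and each sub-index's summary text. This is an assumption about how such routing could work, not how the library does it. The summaries, the stopword list, and `pick_sub_index` are invented for illustration.

```python
from __future__ import annotations

import re

# Small hand-picked stopword list (an assumption; tune for your own summaries).
STOPWORDS = {"a", "an", "the", "of", "this", "will", "to", "about", "index", "provide", "me", "is", "what", "for"}


def keywords(text: str) -> set[str]:
    """Lowercase alphabetic words with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


# Hypothetical sub-index summaries, in the spirit of "This index will provide a summary".
SUB_INDEX_SUMMARIES = {
    "summary_index": "This index will provide a summary or overview of the book",
    "qa_index": "This index will answer specific questions about facts and details in the book",
}


def pick_sub_index(query: str) -> str:
    """Return the name of the sub-index whose summary shares the most keywords with the query."""
    q = keywords(query)
    return max(SUB_INDEX_SUMMARIES, key=lambda name: len(q & keywords(SUB_INDEX_SUMMARIES[name])))
```

Exact keyword matching is brittle: "summarize" does not match "summary", for instance, so how the sub-index summaries are worded matters a lot with this approach.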
I wish I had a better intuition on how to combine indexes together, to support multiple use cases like summarize and ask a specific question. Take a book as an example, with 5 chapters. I could see one wanting to support an answer to a specific question, and perhaps a summary. What index design structure best supports that?
If you need to handle both queries, I think things will get a little complicated
Here's a rough idea off the top of my head lol. I think the response time might be a little slow though, not sure. Probably depends on the number of chapters
Thank you for that! I appreciate you taking the time to draw that up. Lots to play with. For some things, just a simple vector index on the entire book is enough! I also saw a use case where they make each document a tree index. When is it best to use that?
Always easier to communicate with an image haha
Trees are a good replacement for list indexes imo. They provide good summarization abilities.
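A quick sketch of why a tree helps with summarization, assuming the tree is built bottom-up by summarizing small groups of nodes until one root summary is left. `build_tree_summary` and `fake_summarize` are hypothetical; the fake summarizer just joins strings so the structure is visible, where a real one would call an LLM.

```python
from __future__ import annotations

from typing import Callable


def build_tree_summary(chunks: list[str], summarize: Callable[[list[str]], str], fanout: int = 2) -> str:
    """Repeatedly summarize groups of `fanout` nodes until a single root summary remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2, otherwise the tree never shrinks")
    level = chunks
    while len(level) > 1:
        level = [summarize(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


def fake_summarize(texts: list[str]) -> str:
    """Placeholder summarizer: a real one would call an LLM."""
    return "(" + " + ".join(texts) + ")"


# Five chapters, as in the book example earlier in the thread.
root = build_tree_summary(["ch1", "ch2", "ch3", "ch4", "ch5"], fake_summarize)
```

Each summarize call only sees `fanout` inputs at a time, so no single call has to read the whole book, unlike a flat pass over every chunk.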
I've also seen some examples with them at the top level of a graph but I haven't played around with it too much yet
I was actually playing with a tree at the top level as well, ok results. But testing is stalled .. OpenAI not behaving
I guess that's where the magic happens .. how you index the data .. and like any index you create, it should support a specific use case