- Ah, maybe I oversold it a bit. When I said chunks, I meant that each input document is split into chunks, sized either by how much text can be sent to the LLM at once or by a chunk size you pre-define.
So in your example, the query would hopefully match the text chunk containing the information on containers. The key cost saving is that the LLM doesn't have to read all of your documentation, just the part that's relevant.
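Just to make that concrete, here's a rough pure-Python sketch of the idea (this is not llama-index's internals; the chunk size and the word-overlap scoring are placeholders standing in for the real embedding-based lookup):

```python
# Illustrative sketch: split documents into fixed-size chunks, then answer a
# query by sending only the best-matching chunk to the LLM.

def split_into_chunks(text: str, chunk_size: int = 1000) -> list[str]:
    """Split a document into chunks of roughly chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def most_relevant_chunk(query: str, chunks: list[str]) -> str:
    """Pick the chunk with the most query-word overlap
    (a stand-in for the real embedding-similarity search)."""
    query_words = set(query.lower().split())
    return max(chunks, key=lambda c: len(query_words & set(c.lower().split())))

docs = ["...all of your documentation text..."]  # placeholder corpus
chunks = [chunk for doc in docs for chunk in split_into_chunks(doc)]

query = "How do I configure containers?"
context = most_relevant_chunk(query, chunks)
# Only `context` (one chunk), not the whole corpus, gets sent to the LLM.
```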
- Yea, that might work! For this example, it would really depend on how you structure your index
From what you are describing, you might get the best results using a vector index for each page, and then wrapping all those vector indexes with either a keyword index or another vector index. Or maybe even some other combination.
Stacking indexes like this helps ensure that queries get routed only to the relevant information. Check out the docs for this here:
https://gpt-index.readthedocs.io/en/latest/how_to/composability.html (In fact, the llama index docs contain a lot of good information; you might find them helpful!)
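Very roughly, the per-page-vector-indexes-wrapped-in-a-keyword-index setup looks something like this. Class names and signatures are from the older gpt-index/llama-index releases that docs page describes and may have changed in newer versions; the directory paths and summaries are placeholders:

```python
from llama_index import (
    GPTSimpleVectorIndex,
    GPTSimpleKeywordTableIndex,
    SimpleDirectoryReader,
)
from llama_index.indices.composability import ComposableGraph

# One vector index per page / doc folder (paths are placeholders)
page1_docs = SimpleDirectoryReader("docs/page1").load_data()
page2_docs = SimpleDirectoryReader("docs/page2").load_data()
page1_index = GPTSimpleVectorIndex.from_documents(page1_docs)
page2_index = GPTSimpleVectorIndex.from_documents(page2_docs)

# Wrap the per-page vector indexes with an outer keyword index; the
# summaries are what the outer index uses to route a query downward.
graph = ComposableGraph.from_indices(
    GPTSimpleKeywordTableIndex,
    [page1_index, page2_index],
    index_summaries=[
        "Documentation about containers",
        "Documentation about networking",
    ],
)

# Older releases query the graph directly; newer ones go through a query engine.
response = graph.query("How do I configure containers?")
print(response)
```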