Hi guys, could somebody enlighten me about RAG context size? If models have a context length, does the additional context we retrieve have to fit into that context window, or is it different? I guess that'd affect the number and size of the chunks retrieved. Thanks!
Yes, the retrieved chunks have to share the context window with the prompt template, the question, and room for the answer, so they do constrain how many chunks you retrieve and how big they are. LlamaIndex query engines also ensure that if you retrieve more context than fits into the context window, several LLM calls are made to refine the answer, so that the LLM ends up reading all the text.
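For example, here's roughly how you'd control both knobs in LlamaIndex — a minimal sketch assuming the `llama-index` package, some documents in a local `./data` folder, and `OPENAI_API_KEY` set for the default models; the chunk size, overlap, and top-k values are just placeholders you'd tune:

```python
# Minimal sketch: control chunk size at indexing time and top-k at query
# time, and use the "refine" response mode so retrieved text that doesn't
# fit in one prompt is fed to the LLM over several sequential calls.
# Assumes `pip install llama-index` and OPENAI_API_KEY in the environment.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Smaller chunks at indexing time mean more of them fit per LLM call.
documents = SimpleDirectoryReader("./data").load_data()
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])

query_engine = index.as_query_engine(
    similarity_top_k=5,       # how many chunks to retrieve per query
    response_mode="refine",   # iteratively update the answer chunk by chunk
)
response = query_engine.query("What does the report say about context length?")
print(response)
```

With `response_mode="refine"`, the engine answers from the first chunk, then asks the LLM to revise that answer given each subsequent chunk, which is how it reads more retrieved text than a single context window can hold.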