The community members are discussing the impact of using a model with a higher context length (128k) versus a lower context length (16k) for Retrieval Augmented Generation (RAG). They are wondering whether a higher-context-length model calls for a larger chunk size or node size, and are seeking an explanation or links to relevant articles.
The comments suggest that the community members have done some research and found that they can configure the similarity top-k to retrieve more chunks. However, they are unsure whether a higher similarity top-k is always better than a lower one, or whether there are scenarios where a lower similarity top-k is preferable. The comments also note that retrieving too much data can make it harder for the Language Model (LLM) to synthesize an accurate response, and that it also increases latency and cost.
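For reference, the two knobs being discussed are set at different stages: chunk (node) size at indexing time and similarity top-k at query time. Below is a minimal sketch, assuming a LlamaIndex-style setup; the folder name, chunk size, top-k value, and query text are illustrative, and the default embedding/LLM configuration is assumed.

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# Chunk (node) size is set at indexing time. A larger context window makes
# bigger chunks an option, but does not require them.
Settings.chunk_size = 1024

documents = SimpleDirectoryReader("data").load_data()  # hypothetical "data" folder
index = VectorStoreIndex.from_documents(documents)

# similarity_top_k controls how many chunks are retrieved per query.
# A higher value puts more context into the prompt; a lower value keeps the
# context focused and reduces latency and cost.
query_engine = index.as_query_engine(similarity_top_k=10)

response = query_engine.query("What does the report say about revenue?")
print(response)
```

With a 128k-context model, a higher similarity_top_k simply fits without truncation; whether it helps depends on how well the LLM synthesizes across the extra retrieved chunks versus the added latency and cost noted above.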
Let's say I use Claude 3 models, which have a much higher context length. Is using a high similarity top-k always better than using a lower one, or are there scenarios where I would want to use a lower similarity top-k?