I'm using the following parameters to feed an LLM some external factual context and generate new content based on it.
This context is quite long, somewhere between 5,000 and 7,000 words. Could anyone explain whether I gain any advantage from using the 16k model (gpt-3.5-turbo-16k) over standard gpt-3.5-turbo?
E.g. when generating responses, will quality be better with the larger context window, or does it not matter once the context is chunked with LlamaIndex? Here are my current specs
As long as the context fits inside each model's respective window, it shouldn't matter. I haven't seen any evidence or benchmarks showing a noticeable quality difference between the two models.
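One practical check: count the tokens with OpenAI's tiktoken library to see which window your context actually needs. Here's a minimal sketch, assuming your context sits in a local context.txt file (the filename is just a placeholder):

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Return how many tokens `text` occupies under the model's tokenizer."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Placeholder: replace with however you load your context document.
with open("context.txt", encoding="utf-8") as f:
    context = f.read()

n = count_tokens(context)
print(f"context is {n} tokens")

# gpt-3.5-turbo has a 4,096-token window (shared by prompt and completion);
# gpt-3.5-turbo-16k has 16,384. English prose averages roughly 0.75 words
# per token, so 5,000-7,000 words is on the order of 7,000-10,000 tokens.
if n > 4096:
    print("Exceeds the standard 4k window: use gpt-3.5-turbo-16k or chunk the context.")
```

At 5,000-7,000 words your context likely exceeds the 4k window on its own, so the 16k model would mainly save you the chunking step rather than improve the quality of any individual response.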