Hi, is there any way to speed up response generation for a simple query engine?
My query engine looks like this. My QA_PROMPT is quite long; I'm not sure if that's slowing down response generation. What other factors could speed things up?
You could use the retriever directly (index.as_retriever()) and measure the time, to check whether the bottleneck is the LLM generation or the retrieval. My initial hypothesis is that generation is taking much longer than retrieval. If you let your model generate more than 256 tokens, that can take a while as well, so I would limit the output to 256 tokens (max_tokens). You can also add an instruction to your prompt: "Do not answer with more than XYZ words."
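A minimal sketch of timing the two stages separately. The `fake_retrieve` and `fake_generate` functions are stand-ins here so the snippet runs on its own; in your code you would pass `retriever.retrieve` (from `index.as_retriever()`) and `query_engine.query` instead:

```python
import time

def time_stage(label, fn, *args):
    """Run fn(*args), return its result and the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed

# Stand-ins for the real stages; replace with
# retriever.retrieve(query) and query_engine.query(query).
def fake_retrieve(query):
    time.sleep(0.01)
    return ["node1", "node2"]

def fake_generate(query):
    time.sleep(0.05)
    return "answer"

nodes, t_retrieve = time_stage("retrieval", fake_retrieve, "my question")
answer, t_generate = time_stage("generation", fake_generate, "my question")
# Whichever number dominates tells you where to optimize.
```

If generation turns out to be the slow part, capping the output length usually helps; with an OpenAI-backed LlamaIndex setup that would look something like `OpenAI(max_tokens=256)` passed into `as_query_engine(llm=...)` (exact wiring depends on your LlamaIndex version).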