Is there a way to directly create a query engine object/instance from OpenAI without having an index? I want to make sure my APIs for different use cases are the same.
Thanks. Is there a way to make LlamaIndex always use the FULL context of a document when doing a query? Sometimes our documents are so short that we want to put all of it in the prompt to OpenAI (right now we make raw HTTP requests to OpenAI for those).
It would be ideal if we could just use the familiar LlamaIndex API for that too.
I would suggest increasing the top-k and using a node postprocessor (SimilarityPostprocessor). That way only the nodes that meet the similarity threshold get passed on to the LLM query stage.
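Roughly something like this (a sketch, assuming `index` is an already-built VectorStoreIndex; the import path may differ depending on your LlamaIndex version):

```python
from llama_index.core.postprocessor import SimilarityPostprocessor

# "index" is assumed to be an existing VectorStoreIndex
query_engine = index.as_query_engine(
    similarity_top_k=20,  # retrieve more candidates than the default
    node_postprocessors=[
        # drop any retrieved node whose similarity score is below the cutoff
        SimilarityPostprocessor(similarity_cutoff=0.7),
    ],
)
response = query_engine.query("your question here")
```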
The thing about our query is that it is a very general summarization query: we ask it to create PowerPoint slides based on the content of the document. Wouldn't a similarity postprocessor be of little help here, since the query doesn't ask about any specifics of the nodes?
This is something we have been struggling with for a while; so far our only workaround has been to add a keyword extractor, fetch those keywords manually, and add them to the prompt.
If you want summarization queries, you need to route between something that is meant to summarize (a summary index, or a vector index with top-k set to 10000) and something that isn't.
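A rough sketch of that routing with a RouterQueryEngine (assuming `documents` is your already-loaded document list; tool descriptions and import paths are illustrative and may vary by version):

```python
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.tools import QueryEngineTool

# "documents" is assumed to be your loaded Document list
summary_index = SummaryIndex.from_documents(documents)
vector_index = VectorStoreIndex.from_documents(documents)

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Useful for summarizing the entire document",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(similarity_top_k=5),
    description="Useful for questions about specific facts in the document",
)

# the router picks the summary tool for broad queries
# and the vector tool for specific ones
query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[summary_tool, vector_tool],
)
response = query_engine.query("Create PowerPoint slides covering this document")
```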
Ahh, thanks for the advice. We had a problem before where a top-k of 20 exceeded the max context window for our model. How can we make sure this doesn't happen?
A context chat engine, for example, will just retrieve the top-k nodes, apply node postprocessors (if any are given), stuff that into the system prompt, and not do much else beyond that.
A query engine, on the other hand, splits the retrieved text across multiple prompts and can end up making multiple LLM calls, so the context doesn't have to fit in a single window.
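So for the summarization path, the sketch would be roughly (again assuming an existing `index`; the top-k value and query are illustrative):

```python
# A very high top-k effectively retrieves every node; the response synthesizer
# (tree_summarize here) packs the retrieved chunks into prompts that fit the
# model's context window, making multiple LLM calls if needed.
query_engine = index.as_query_engine(
    similarity_top_k=10000,
    response_mode="tree_summarize",
)
response = query_engine.query(
    "Create an outline of PowerPoint slides covering this document"
)
```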
Ahhh okay cool, so for our use case (creating presentations from documents) it could be a good idea to use a very high top-k and let LlamaIndex make multiple calls under the hood?