Is there a way to directly create a query engine object/instance from OpenAI without having an index? I want to make sure my APIs for different use cases are the same.
You could try with this:
index = VectorStoreIndex.from_documents([])
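For reference, a minimal sketch of how that could look end to end (this assumes a recent llama_index Python release with an OPENAI_API_KEY in the environment; older versions import VectorStoreIndex from llama_index directly):

from llama_index.core import VectorStoreIndex

# Build an empty index so every use case can share the same query-engine API
index = VectorStoreIndex.from_documents([])
query_engine = index.as_query_engine()
response = query_engine.query("Hello, world")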
Thanks. Is there a way to make LlamaIndex always use the full context of a document when doing a query? Sometimes our documents are so short that we want to put all of it into the prompt to OpenAI (we currently make raw HTTP OpenAI requests for those).

It would be more ideal if we could simply use the familiar LlamaIndex API.
LlamaIndex will fit as many documents/nodes as it can into the prompt when querying the LLM, so it is fine even if the documents are small.
But if I set similarity_top_k to 2, it will only pick the 2 most similar chunks, if I am correct?
If we set this to None, will it take as much as it can fit into the context window?
I would suggest increasing the number and using a node postprocessor: SimilarityPostprocessor.
This will only pass documents that meet the similarity threshold on to the LLM query stage.
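Roughly along these lines (a sketch; the cutoff value and top-k here are just illustrative):

from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = index.as_query_engine(
    similarity_top_k=20,  # retrieve generously
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],  # keep only nodes above the threshold
)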
The thing about our query is that it is a very general summarization query. We ask it to create PowerPoint slides based on the content of the document. Don't you think a similarity postprocessor won't help here, since the query does not ask about any specifics of the nodes?
This is something we have been struggling with for a while, and for now our only solution has been to add a keyword extractor, manually fetch those keywords, and add them to the prompt.
If you want a summarization query, you need to route between something that is meant to summarize (a summary index, or a vector index with top-k set to 10000) and something that isn't.
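As a rough sketch of that routing setup (docs is a placeholder for your loaded documents, and the tool descriptions are made up):

from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.tools import QueryEngineTool

summary_engine = SummaryIndex.from_documents(docs).as_query_engine()
vector_engine = VectorStoreIndex.from_documents(docs).as_query_engine(similarity_top_k=5)

# The router picks a tool per query, so summarization requests go to the
# summary engine and specific questions go to the vector engine
router = RouterQueryEngine.from_defaults(
    query_engine_tools=[
        QueryEngineTool.from_defaults(summary_engine, description="Summarize the whole document"),
        QueryEngineTool.from_defaults(vector_engine, description="Answer specific questions about the document"),
    ]
)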
Ahh thanks for the advice. We had a problem before with a top-k of 20 where it exceeded the max context window for our model. How can we make sure this does not happen?
Was this in a query engine? Or a chat engine? (I'd be very surprised if it was a query engine, tbh)
Might have been the chat engine, is there a difference there?
Yea, they have different handling.

A context chat engine, for example, will just retrieve the top-k, apply node postprocessors (if any are given), stuff that into the system prompt, and not do much else beyond that.

A query engine, on the other hand, splits the retrieved text and can end up making multiple LLM calls.
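For comparison, a quick sketch of the two (the mode and parameter names are the common ones, but treat this as illustrative):

# Context chat engine: retrieve top-k, stuff it into the system prompt, one LLM call per message
chat_engine = index.as_chat_engine(chat_mode="context", similarity_top_k=5)

# Query engine: may compact/split the retrieved text and make multiple LLM calls
query_engine = index.as_query_engine(similarity_top_k=5, response_mode="compact")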
Ahhh okay cool, so for our use case (creating presentations from documents) it could be a good idea to use a very high top-k and make LlamaIndex do multiple calls under the hood?
Yea exactly -- like I would imagine that being handled by a query engine
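A sketch of that approach (the top-k and response mode here are illustrative, not prescriptive):

# Very high top-k so effectively every node is retrieved; tree_summarize then
# splits the text and makes multiple LLM calls to build up a single summary
query_engine = index.as_query_engine(
    similarity_top_k=10000,
    response_mode="tree_summarize",
)
response = query_engine.query("Create PowerPoint slide content from this document.")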