Hi guys, if anyone can help me with this

Hi guys, if anyone can help me with this it would be great.
query_engine = index.as_query_engine(
    similarity_top_k=1,
    retriever_mode='embedding',
    response_mode='compact',
    text_qa_template=QA_PROMPT,
    service_context=service_context,
    verbose=True,
)
Here I am setting the response mode to compact, but the query_engine is still using the create-and-refine method. Can anyone help, please?
PS: The context is less than 200 tokens, so the context window is not fully used. (I mention this because I read in the documentation that if the chunks can't fit in the context window, it will fall back to the create-and-refine prompt method, but that is not the case here.)
14 comments
compact is just an extension of create+refine. The only difference is that it tries to stuff as much text from the retrieved nodes into each LLM call as possible

Is it still hitting the refine process (i.e. a second LLM call)? How do you know? How did you check the context size?
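For reference, a quick way to measure the retrieved context is to count tokens with tiktoken. A minimal sketch (context_text is just a placeholder for the text of the retrieved node):

import tiktoken

# count tokens with the cl100k_base encoding used by OpenAI chat models;
# swap in the encoding for whatever model you are actually using
enc = tiktoken.get_encoding("cl100k_base")
print(len(enc.encode(context_text)))  # context_text: placeholder for the retrieved node text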
See the prompt helper I am using to chunk the text; as you can see, I am just chunking with 100 tokens and a 0.1 overlap ratio.
(Attachment: image.png)
Also, I am trying to pull just 1 chunk for the context.
I found out that the index was making a refine prompt when I set the logging level to DEBUG; here is a screenshot of that output.
It is using the same context for the refining part
(Attachment: image.png)
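For reference, the logging setup is roughly the standard pattern (a sketch only; the handler details may differ in your setup):

import logging
import sys

# send library debug output, including the full prompts sent to the LLM, to stdout
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))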
I don't want it to use the refine method. It is costing me a lot of LLM calls and token usage.
@Logan M Please help
Set the chunk size in the service context directly, not the prompt helper. You can probably ignore the prompt helper actually

I would set the chunk size to 512 minimum though, otherwise embeddings may not work well
Using all default settings shouldn't invoke the refine template either technically 👀
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=chunk_size_limit,
    num_output=num_outputs,
    context_window=context_window,
    chunk_overlap=chunk_overlap,
)
I tried this, but the embeddings did not get created.
What is the issue with using a prompt helper?
Doesn't it work like LangChain's text splitter?
@Logan M please help my friend
Nothing wrong with using a prompt helper, but there are two chunk sizes in LlamaIndex: one at query time (the prompt helper) and one at data ingestion time (i.e. in the node parser)

Running with default settings should not trigger the refine process (except in some edge cases with non-English languages or data that doesn't use many spaces)

If you want to lower the chunk size though, you can do it in the service context (but only the chunk size can be set this way)

ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size=512)

I'm not sure why your embeddings didn't get created with your attempt though, very weird
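If you want to set the ingestion-time chunking explicitly, something like the sketch below should work (assuming a legacy LlamaIndex version where ServiceContext and SimpleNodeParser exist; exact import paths and keyword names vary by version):

from llama_index import ServiceContext
from llama_index.node_parser import SimpleNodeParser

# ingestion-time chunk size lives in the node parser
node_parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)

# the service context carries it to the index; query-time packing
# is then handled by the default prompt helper settings
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    node_parser=node_parser,
)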