Hi everyone, if anyone can help me with this it would be great. I'm building the query engine like so: query_engine = index.as_query_engine(similarity_top_k=1, retriever_mode='embedding', response_mode='compact', text_qa_template=QA_PROMPT, service_context=service_context, verbose=True). Here I'm setting the response mode to compact, but the query engine is still using the create-and-refine method. Can anyone help, please? PS: The context is less than 200 tokens, so the context window is not fully used. (I mention this because the documentation says that if a chunk can't fit in the context window it falls back to the create-and-refine prompt method, but that isn't the case here.)
compact is just an extension of create+refine. The only difference is that it tries to stuff as much text from the retrieved nodes into each LLM call as possible
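Roughly, the difference looks like this (a conceptual sketch only, not LlamaIndex's actual code; the helper names `llm`, `qa_prompt`, `refine_prompt`, `count_tokens`, and `max_chunk_tokens` are made up for illustration):

```python
# Conceptual sketch of "refine" vs "compact" -- NOT the library's real implementation.

def refine_answer(question, node_texts):
    # create + refine: one LLM call per text chunk
    answer = llm(qa_prompt.format(context=node_texts[0], question=question))
    for text in node_texts[1:]:
        answer = llm(refine_prompt.format(existing=answer, context=text, question=question))
    return answer

def compact_answer(question, node_texts):
    # compact: first pack the retrieved node texts into as few prompts as possible,
    # then run the same create + refine loop over the packed chunks
    packed, current = [], ""
    for text in node_texts:
        if current and count_tokens(current + text) > max_chunk_tokens:
            packed.append(current)
            current = ""
        current += "\n" + text
    if current:
        packed.append(current)
    return refine_answer(question, packed)
```

So with similarity_top_k=1 and ~200 tokens of context, everything should fit into one packed call and the refine prompt shouldn't be needed.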
Is it still hitting the refine process (i.e. a second LLM call)? How do you know? How did you check the context size?
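One way to check is to attach a debug callback and count the LLM calls per query (a sketch assuming a 0.6+-style install where the callbacks module is available; imports may differ in your version):

```python
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

# Attach a debug handler so we can count LLM calls per query
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)

query_engine = index.as_query_engine(
    similarity_top_k=1,
    response_mode="compact",
    service_context=service_context,
)
response = query_engine.query("your question here")

# One event pair per LLM call; more than one means the refine prompt was hit
llm_events = llama_debug.get_event_pairs(CBEventType.LLM)
print(f"LLM calls: {len(llm_events)}")
```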
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size=chunk_size_limit, num_output=num_outputs, context_window=context_window, chunk_overlap=chunk_overlap) — I tried this, but the embeddings did not get created.
Nothing wrong with using a prompt helper, but there are two chunk sizes in LlamaIndex: one at query time (the prompt helper) and one at data ingestion time (i.e. in the node parser).
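For example, both can be configured explicitly and passed into the service context (a sketch for the legacy ServiceContext API; exact parameter names vary by version, and the values here are just placeholders):

```python
from llama_index import PromptHelper, ServiceContext
from llama_index.node_parser import SimpleNodeParser

# Ingestion-time chunking: how documents are split into nodes
node_parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)

# Query-time packing: how much retrieved text is stuffed into each prompt
prompt_helper = PromptHelper(
    context_window=4096,
    num_output=256,
    chunk_overlap_ratio=0.1,
)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,  # your existing predictor
    node_parser=node_parser,
    prompt_helper=prompt_helper,
)
```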
Running with default settings should not trigger the refine process (except in some edge cases with non-English languages or data that doesn't use many spaces).
If you want to lower the chunk size, though, you can do it in the service context (but only the chunk size can be set this way).
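Something like this (again a sketch; 512 is just an example value, and older versions name the argument chunk_size_limit instead of chunk_size):

```python
from llama_index import ServiceContext

# Lower only the ingestion chunk size via the service context
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,  # your existing predictor
    chunk_size=512,
)
```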