Why Refining?

I have a case in which I have set the input size to 3500 and the chunk size to 750. When I query my index with top_k = 4, it always does refining. Any idea why this happens?
Note: the QA prompt is 93 tokens and the query is around 50 tokens.
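(For a rough token budget, assuming all four retrieved chunks are the full 750 tokens: 93 QA-prompt tokens + ~50 query tokens + 4 × 750 chunk tokens ≈ 3143 tokens, and even with the 256 tokens reserved for output that stays under the 3500 input limit, so a single LLM call looks like it should fit.)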
12 comments
If you are using top_k > 1, it uses the create-and-refine synthesis approach to generate the answer/response. So refining is expected, right?
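For illustration, a minimal sketch of such a query (assuming a 0.6.x-era LlamaIndex where as_query_engine is available; the question text is a placeholder):

# Hypothetical query setup: similarity_top_k=4 retrieves four nodes, and with
# the default create-and-refine mode, any node that cannot be packed into the
# initial answer prompt is folded in through a follow-up refine call.
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("...your ~50-token question...")
print(response)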
No, as the total prompt size didn't exceed 3500.
Where did you set the chunk size?

It might help if you are able to package a minimum example πŸ™‚
I set the chunk size in the prompt helper, the service context, and the node parser. All of these have the same chunk size value.
Do you have an example you can share to reproduce this? I would like to step through the code with a debugger to investigate πŸ™‚ Just need some sample docs+code, if possible?
I cannot share the documents as they belong to our clients, but I can share the way I set up the index. Note that the LLM and embedding model are custom:

prompt_helper = PromptHelper(
    max_input_size=3500,
    chunk_size_limit=750,
    num_output=256,
    max_chunk_overlap=75,
)
node_parser = SimpleNodeParser(
    text_splitter=TokenTextSplitter(chunk_size=750, chunk_overlap=75)
)
self.service_context = ServiceContext.from_defaults(
    llm_predictor=self.llm_predictor,
    prompt_helper=prompt_helper,
    embed_model=self.embedding_model,
    chunk_size_limit=750,
    node_parser=node_parser,
)
nodes = node_parser.get_nodes_from_documents(documents)
storage_context = StorageContext.from_defaults(
    vector_store=ChromaVectorStore(chroma_collection=chroma_collection)
)
storage_context.docstore.add_documents(nodes)
self.index = GPTVectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    service_context=self.service_context,
)
I will run with those settings and see if I can reproduce.

I will step over the code line by line to confirm what's happening lol
Okay, looking forward to your response.
@zainab to avoid refine here, try removing chunk_size_limit from the prompt helper πŸ™‚
That setting limits how big each chunk can be when calling the LLM. By setting it to the same value as the chunk size in the node parser, you make at least one LLM call per retrieved node.
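Roughly, the adjusted setup would look like this (a minimal sketch of the suggestion, reusing the values from the snippet above; not the exact final code):

# Sketch: drop chunk_size_limit from the PromptHelper so prompt packing is
# governed by max_input_size, while the node parser still splits documents
# into 750-token chunks with a 75-token overlap.
prompt_helper = PromptHelper(
    max_input_size=3500,
    num_output=256,
    max_chunk_overlap=75,
)
node_parser = SimpleNodeParser(
    text_splitter=TokenTextSplitter(chunk_size=750, chunk_overlap=75)
)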
Many thanks, it's working now.