Why Refining?

I have a case in which I have set the input size to 3500 and the chunk size to 750. When I query my index with top_k = 4, it always does refining. Any idea why this happens?
Note: the QA prompt is 93 tokens and the query is around 50 tokens.
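(For a rough token budget, assuming all four retrieved chunks are the full 750 tokens: 93 QA-prompt tokens + ~50 query tokens + 4 × 750 chunk tokens ≈ 3143 tokens, and even with the 256 tokens reserved for output that stays under the 3500 input limit, so a single LLM call looks like it should fit.)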
12 comments
If you are using top_k > 1, it uses the create-and-refine synthesis approach to generate the answer/response. So refining is expected, right?
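For illustration, a minimal sketch of such a query (assuming a 0.6.x-era LlamaIndex where as_query_engine is available; the question text is a placeholder):

# Hypothetical query setup: similarity_top_k=4 retrieves four nodes, and with
# the default create-and-refine mode, any node that cannot be packed into the
# initial answer prompt is folded in through a follow-up refine call.
query_engine = index.as_query_engine(similarity_top_k=4)
response = query_engine.query("...your ~50-token question...")
print(response)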
No, as the total prompt size didn't exceed 3500.
Where did you set the chunk size?

It might help if you are able to package a minimum example πŸ™‚
I set the chunk size in the prompt helper, the service context, and the node parser. All of these have the same chunk size value.
Do you have an example you can share to reproduce this? I would like to step through the code with a debugger to investigate πŸ™‚ Just need some sample docs+code, if possible?
I cannot share the documents as they belong to our clients, but I can share the way I set up the index. Note that the LLM and embedding model are custom:

prompt_helper = PromptHelper(
    max_input_size=3500,
    chunk_size_limit=750,
    num_output=256,
    max_chunk_overlap=75,
)
node_parser = SimpleNodeParser(
    text_splitter=TokenTextSplitter(chunk_size=750, chunk_overlap=75)
)
self.service_context = ServiceContext.from_defaults(
    llm_predictor=self.llm_predictor,
    prompt_helper=prompt_helper,
    embed_model=self.embedding_model,
    chunk_size_limit=750,
    node_parser=node_parser,
)
nodes = node_parser.get_nodes_from_documents(documents)
storage_context = StorageContext.from_defaults(
    vector_store=ChromaVectorStore(chroma_collection=chroma_collection)
)
storage_context.docstore.add_documents(nodes)
self.index = GPTVectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    service_context=self.service_context,
)
I will run with those settings and see if I can reproduce.

I will step over the code line by line to confirm what's happening lol
Okay, looking forward to your response.
@zainab to avoid refine here, try removing chunk_size_limit from the prompt helper πŸ™‚
That setting limits how big each chunk can be when calling the LLM. By setting it to the same value as the chunk size in the node parser, you make at least one LLM call per retrieved node.
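Roughly, the adjusted setup would look like this (a minimal sketch of the suggestion, reusing the values from the snippet above; not the exact final code):

# Sketch: drop chunk_size_limit from the PromptHelper so prompt packing is
# governed by max_input_size, while the node parser still splits documents
# into 750-token chunks with a 75-token overlap.
prompt_helper = PromptHelper(
    max_input_size=3500,
    num_output=256,
    max_chunk_overlap=75,
)
node_parser = SimpleNodeParser(
    text_splitter=TokenTextSplitter(chunk_size=750, chunk_overlap=75)
)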
Many thanks, it's working now.