Disable refine?

Is there a way to disable the refine prompt?
Not explicitly.

One option: you can set similarity_top_k=1 (the default) and also set chunk_size_limit small enough so that one node always fits in the prompt (likely ~3500 or less I'd guess, but you'd have to double-check that).

The refine prompt is needed in order to handle answer synthesis across multiple nodes. But I know chatGPT struggles with it.
Where is chunk_size_limit defined? At the index when it's created, or in the LLM? I don't quite understand it.
It's on the index constructor:
GPTSimpleVectorIndex(documents, chunk_size_limit=3500)
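Put together, a minimal sketch of that workaround, assuming the 0.5.x-era API used elsewhere in this thread (the directory path and question are placeholders):

Plain Text
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# keep chunks small enough that a single node fits into one LLM call
documents = SimpleDirectoryReader("./docs").load_data()  # placeholder path
index = GPTSimpleVectorIndex(documents, chunk_size_limit=3500)

# retrieve only the single most similar node so no refine pass should be needed
response = index.query("your question here", similarity_top_k=1)
print(response)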
Ok, it makes sense, because it's when the embeddings for each chunk are generated.
Thanks @Logan M , helpful as always
Thank you, but the solution you have sent doesn't work. Do you have any other solution?
That's all I got for now πŸ˜…πŸ« 
You can try making a custom refine prompt as well, I suppose. Or use davinci-003.
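A rough sketch of a custom refine prompt, assuming the RefinePrompt class from that era of the library (the template wording is illustrative; {query_str}, {existing_answer} and {context_msg} are the variables the refine step fills in):

Plain Text
from llama_index.prompts.prompts import RefinePrompt

# illustrative template: keep the existing answer unless the new context
# actually contradicts it, which cuts down on rewritten answers
REFINE_TMPL = (
    "The original question is: {query_str}\n"
    "We have an existing answer: {existing_answer}\n"
    "Here is some new context:\n{context_msg}\n"
    "Only modify the existing answer if the new context makes it wrong; "
    "otherwise repeat the existing answer unchanged."
)
REFINE_PROMPT = RefinePrompt(REFINE_TMPL)

response = index.query("your question here", refine_template=REFINE_PROMPT)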
I ran into this issue too. It keeps generating refined responses across several nodes, even after I set similarity_top_k=1 😢
You'll have to set the chunk size smaller to avoid the refine 😞 hopefully we can figure out the issues with gpt3.5 soon...
I set chunk_size_limit=600 both when creating the index and at query time, and I still get multiple refined results with similarity_top_k=1. I don't know how to tune it now.
Whaaaat, how is refine being used 😅 did you increase the num_output of the model?

With default settings, a chunk size of 600 should mean all the text fits into one llm call 🫠 in fact it could be even higher and still fit...
similarity_top_k=1 was passed to index.query as one parameter, together with the prompt_helper and context parameters. Is that OK?
The code looks like this (parameters come either from constants or from web page form inputs):

Plain Text
def ask_ai():
    llama_logger = LlamaLogger()
    chunk_size_limit = int(request.form["chunk_size_limit"])
    response_mode = request.form["response_mode"]
    temperature = float(request.form["temperature"])
    similarity_top_k = int(request.form["similarity_top_k"])
    prompt_helper = PromptHelper(max_input_size=max_input_size, num_output=num_output, max_chunk_overlap=max_chunk_overlap, chunk_size_limit=chunk_size_limit)
    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=temperature, model_name="gpt-3.5-turbo"))
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, llama_logger=llama_logger, chunk_size_limit=chunk_size_limit)
    query_string = request.form["query"]
    session["query"] = query_string

    response = myindex.query(query_string, text_qa_template=QA_PROMPT, refine_template=REFINE_PROMPT, response_mode=response_mode, service_context=service_context, similarity_top_k=similarity_top_k)
    output = {
        "response": str(response),
        "tokens": str(llm_predictor.last_token_usage),
        "source_nodes": str(response.source_nodes),
        "logs": str(llama_logger.get_logs()),
    }
    llama_logger.reset()
    return json.dumps(output)
What did you set max_input_size and num_output to in the prompt helper? I don't see their values πŸ‘€
max_input_size = 4096
num_output = 3000
max_chunk_overlap = 20
They are global variables.
I upgraded gpt_index to 0.5.13.post1 and langchain to 0.0.138 just now. Index creation is very smooth, but during the query I get this error:

File "d:\User\Documents\study\python3.10\chatgpt-on-internal-doc.venv\lib\site-packages\gpt_index\langchain_helpers\text_splitter.py", line 157, in split_text_with_overlaps
    raise ValueError(
ValueError: A single term is larger than the allowed chunk size.
Term size: 58
Chunk size: 54
Effective chunk size: 54

I think I should switch back to an older version.
Is your text in English? I know our default chunking isn't the best for languages that don't use many spaces 🤔
Not English, it's Chinese.
I should do a word split before creating the index (see the sketch below).
Will try later.
Thanks for your help! You're always there when we llama folks are in need 🥹
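As a reference for the word-split idea above, a minimal sketch that segments Chinese text into space-separated words before indexing, assuming the jieba segmenter (any Chinese word splitter would do; the file name is a placeholder):

Plain Text
import jieba
from llama_index import Document, GPTSimpleVectorIndex

raw_text = open("doc_zh.txt", encoding="utf-8").read()  # placeholder file

# insert spaces between words so the default token splitter has boundaries to cut on
segmented = " ".join(jieba.cut(raw_text))

index = GPTSimpleVectorIndex([Document(segmented)], chunk_size_limit=600)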
Yea you can also pass in your own text splitter, like a character splitter instead of a token splitter πŸ™‚

Always happy to help! πŸ’ͺ
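A sketch of what passing in a character splitter could look like, assuming SimpleNodeParser only needs an object with a split_text method (worth double-checking for your gpt_index / llama_index version) and using LangChain's CharacterTextSplitter:

Plain Text
from langchain.text_splitter import CharacterTextSplitter
from llama_index import ServiceContext
from llama_index.node_parser import SimpleNodeParser

# split on Chinese full stops instead of token counts; chunk_size and
# chunk_overlap here are measured in characters, not tokens
char_splitter = CharacterTextSplitter(separator="。", chunk_size=500, chunk_overlap=20)
node_parser = SimpleNodeParser(text_splitter=char_splitter)
service_context = ServiceContext.from_defaults(node_parser=node_parser)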
Can you add a parameter like max_refined? It could be set from 0 to 20, with a default value of 3 or 5 🥹
I mean, I could but also I'm not sure it makes sense to have an option like that πŸ˜… like, how would a user know what a good amount of refine is?

If I set top k to 5 for example, I expect the LLM to read all 5 nodes and return an answer, but this can also mean 0-10 refines to do, depending on my settings and chunk size πŸ˜…
Whether it's good to enable or disable refined responses depends on the quality and detail of the doc. A traditional non-technical handbook-style doc will usually need some refinement, but an accurate enough doc doesn't need a refined response, as some of us have found in testing. Complex chains of reasoning may help, but they also make the user lose control of the steps. An option to enable or disable refine would let users manage their docs and query length, which I think would be good, especially since in the logs most of the initial responses are better than the multi-pass refined ones.
A similar issue shows up between AgentGPT and AutoGPT too: people report that AutoGPT is better than AgentGPT, and a very important reason is that in AutoGPT the user can choose whether or not to let the bot go on to the next step.
That might be true! But I still think adding a max refine would be a confusing option for users 😅

I think the bigger problem is just the refine prompt needs some prompt engineering to work better, the current behavior with gpt-3.5 is not expected (and it used to work better too!)
I checked the log. Since I set similarity_top_k = 1, the query found 1 node for me, and it is indeed the most related part, that's cool! But it's a large node with over 1600 Chinese characters, and that node was then divided into several chunks: the initial response came from the 1st chunk and was then refined with the following chunks one by one, in order from head to tail, to construct the final refined response. I have a question here: what does the "compact" mode do? What's the difference between "compact" mode and "default" mode? I thought "compact" meant the query would choose only the chunk with the highest similarity score and forget all the other chunks in the same node, but it seems to only take effect between nodes, not chunks.
Progress!

"compact" means as much text as possible will be put into each LLM call (usually only matter if top k is higher than one)
Is there a way to make a "super compact" mode take effect between chunks?
I think the solution here is to use a smaller chunk_size_limit πŸ€”
I mean, reply using only the chunk with the highest score in the node, and don't consider the other chunks in the same node.
Or... I should cut the document into sentences, or at least into subjects, so gpt_index can produce smaller and more accurate nodes.
So, it already retrieved the chunk with the highest similarity, the chunk is just very big πŸ˜…

Yea you could do some pre-processing to your documents too. But I think chunk_size_limit should help a bit here too
So the node is a chunk? ... How should I understand the multiple refine actions within the node, then?
Is it because of the limits of the LLM API?
If I change the LLM call from gpt-3.5-turbo to gpt-4, maybe it would be better, with less refining...
I will try the same document with a small chunk_size_limit now 🫡
Yea, so each call to the LLM needs to fit the query + node text + prompt template (+ optional existing answer for refine)

So if the node text is too big, it gets split into the refine process. Hopefully chunk_size_limit helps limit node text size πŸ˜…
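To make that concrete with the numbers posted earlier in this thread (a rough back-of-the-envelope illustration; the query and prompt template also consume tokens, and Chinese characters often map to more than one token each):

Plain Text
max_input_size = 4096  # model context window
num_output = 3000      # tokens reserved for the model's answer
room_left = max_input_size - num_output  # ~1096 tokens for query + prompt + node text

# a single node of ~1600 Chinese characters can easily exceed that budget, so it
# gets split across several LLM calls, i.e. repeated refine passes; lowering
# num_output and/or chunk_size_limit frees up room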
I set chunk_size_limit = 100 to reindex and query the document, and the multiple refines are gone; it only refines 1 time now. That's good news! But the node in the log is still the same big one as the one I got from the chunk_size_limit = 1000 index. I am using this as the node_parser:

Plain Text
sentence_splitter = SentenceSplitter()
node_parser = SimpleNodeParser(text_splitter=sentence_splitter)
service_context = ServiceContext.from_defaults(node_parser=node_parser, llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model)
Did you set the chunk size in both the splitter and the prompt helper?
Only in the prompt helper.
I see there is a chunk_size setting in SentenceSplitter(); the default value is 4000.
I'll add this, thank you!
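Put together, a sketch of keeping the two settings in sync, reusing the llm_predictor, prompt_helper and embed_model from the snippet above (the import paths are a guess and may differ between gpt_index / llama_index versions):

Plain Text
from llama_index import ServiceContext
from llama_index.langchain_helpers.text_splitter import SentenceSplitter
from llama_index.node_parser import SimpleNodeParser

chunk_size_limit = 600

# give the splitter the same chunk size as the prompt helper / service context,
# and a small overlap (the splitter's own default overlap is 200)
sentence_splitter = SentenceSplitter(chunk_size=chunk_size_limit, chunk_overlap=20)
node_parser = SimpleNodeParser(text_splitter=sentence_splitter)
service_context = ServiceContext.from_defaults(node_parser=node_parser, llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model)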
So many parameters to set πŸ˜΅β€πŸ’« But I hope it works haha
Before I set chunk_size = chunk_size_limit (I set it to 200), index construction only took about 5 seconds. After I set the limit, index construction has lasted over 10 mins with no sign of stopping... I checked the OS processes, and Python keeps about 20% CPU usage, so it doesn't seem to be hanging. I'll wait for it to come to an end... 😂
Ohh this makes sense, because it is embedding many chunks now! And by default, it only sends in batches of 10

I just solved this the other day, I will find the code

(Btw, you might also want to check the chunk_overlap in your splitter, the default is 200... maybe it created an endless loop LOL)
I set chunk_overlap=20 now. If it doesn't stop after 30 mins, I will restart it with chunk_overlap=5 to see if I am lucky.
This also might help speed up! The default is batch sizes of 10

Plain Text
from llama_index.embeddings.openai import OpenAIEmbedding
service_context = ServiceContext.from_defaults(..., embed_model=OpenAIEmbedding(embed_batch_size=50))
many thanks!!!
It's really fast now! 🥹 And the query results are good too. For my doc, the combination of chunk_size_limit=600, mode=compact, similarity_top_k=3, temperature=0 gets the best results! And it's fast enough without any refine! 🥹
Amazing!