Is it because the response + input prompt is getting longer than 4096? Do you have max_tokens set on the llm?
Is there a way to check that? Yes I do have the max_tokens set
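(For reference, a quick way to check it, assuming the langchain ChatOpenAI wrapper that shows up in the snippet further down; 1024 here is just an example value:)
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=1024)
print(llm.max_tokens)  # should echo back whatever you configured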
I tried tweaking the max tokens for input and output and it's still cutting off
How are you setting up the Guardrails parser?
I also tried removing Guardrails and just using the normal LLMPredictor, and the JSON still gets cut off, so perhaps it's not a Guardrails issue?
Should I paste in the input data or my prompt?
I think just paste whatever the output is
like the JSON that's cut off
Ah I see, this is what I get for that:
Tokens 697
Characters 1078
ok nice! So it is going beyond the default of 256, which means max_tokens is working
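(If it's useful, a rough way to get a count like that locally with tiktoken; truncated_json is a placeholder for the cut-off output:)
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
truncated_json = "..."  # paste the cut-off output here
print(len(enc.encode(truncated_json)))  # token count of the truncated response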
What are your prompt helper settings for that?
Here's what I have for that part:
from llama_index import PromptHelper, ServiceContext
from llama_index.llm_predictor import StructuredLLMPredictor
from langchain.chat_models import ChatOpenAI

# prompt helper budget: 4096-token context window, reserve 1024 tokens for output
max_input_size = 4096
num_output = 1024
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

# define LLM
llm_predictor_chatgpt = StructuredLLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_output))
service_context_chatgpt = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt, prompt_helper=prompt_helper, chunk_size_limit=2500)
I have a feeling something is getting too large somewhere and it's running out of room 🤔
I tried lowering chunk size limit to 2000 then 1400, but still getting cut off.
The debug handler throws this error:
Traceback (most recent call last):
print(llama_debug.get_event_time_info(CBEventType.LLM))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/llama_index/callbacks/llama_debug.py", line 140, in get_event_time_info
return self._get_time_stats_from_event_pairs(event_pairs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/llama_index/callbacks/llama_debug.py", line 118, in _get_time_stats_from_event_pairs
average_secs=total_secs / len(event_pairs),
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero
Oh, try calling get_llm_inputs_outputs()
instead (assuming you attached the callback to your run of course)
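(For anyone following along, the usual wiring looks roughly like this, sketched against the 0.6-era llama_index callback API:)
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# attach the debug handler via a callback manager on the service context
llama_debug = LlamaDebugHandler()
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)

# ... build the index / run the query with this service_context, then:
event_pairs = llama_debug.get_llm_inputs_outputs()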
Like this correct? llama_debug.get_llm_inputs_outputs()
This just returned [[CBEvent(event_type=<CBEventType.LLM: 'llm'>, payload=
followed by the index file contents
Hmm. It should be a list of pairs. You'd want the payload of the last element in each pair
That will be a dict with the response and the formatted prompt
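(i.e. something along these lines, assuming the payload keys are 'formatted_prompt' and 'response' as in the paste below:)
event_pairs = llama_debug.get_llm_inputs_outputs()
for start_event, end_event in event_pairs:
    payload = end_event.payload  # the end event of each pair carries the LLM call details
    print(payload.get("formatted_prompt"))
    print(payload.get("response"))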
Hmm I don't seem to see my prompt anywhere there
I just see the contents of my input file
Odd, and also annoying lol
I'm running out of tricks to debug this 🥲
Could it be the same issue I had yesterday? The service context not getting passed to the query?
Since his cut-off output is already very long (over 600 tokens), it's definitely generating past the default, so I'm not sure if that's the issue 🤔
This is the only thing I see related to the prompt, but I don't see my prompt itself:
\n\n\nSUMMARY:"""\n'}
Unless this is what you meant? ^
That might be part of it 🤔 depends on the index you are using I guess haha
Actually I see this now: 'formatted_prompt': 'Write a summary of the following. Try to use only the information provided. Try to include as many key details as possible.\n\n\
It's weird because I never wrote that anywhere in my prompt
Yea that's an internal summary prompt
What kind of index are you using?
Ah yea, so that's part of the tree building process
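(Side note: that text is llama_index's stock summary prompt, used when the tree index summarizes chunks into parent nodes, so seeing it there is expected. If I have the module path right, you can print it directly:)
from llama_index.prompts.default_prompts import DEFAULT_SUMMARY_PROMPT_TMPL

# the built-in prompt the tree index uses while building parent summaries
print(DEFAULT_SUMMARY_PROMPT_TMPL)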
Is it possible it's messing up due to my query being really long?
I'm curious as to what the point of the query is if you're describing each point in the rail <object>?
I think the point is the LLM does its best to output in the proper format, then Guardrails double-checks it
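(For reference, the rough shape of that wiring, following the llama_index Guardrails example; the rail file name here is a placeholder:)
from llama_index.llm_predictor import StructuredLLMPredictor
from llama_index.output_parsers import GuardrailsOutputParser
from llama_index.prompts.prompts import QuestionAnswerPrompt
from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL

llm_predictor = StructuredLLMPredictor()
# fold the rail spec's format instructions into the QA prompt,
# then let Guardrails validate the raw LLM output afterwards
output_parser = GuardrailsOutputParser.from_rail("my_spec.rail", llm=llm_predictor.llm)
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
The resulting qa_prompt then gets passed in as the text_qa_template when querying.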
Definitely possible it's due to the query being very long 🤔
Yup turns out the query was too long
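(For posterity: the same tiktoken trick from earlier makes this easy to spot; long_query is a placeholder for the query string, and max_input_size / num_output are the prompt helper values above.)
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
query_tokens = len(enc.encode(long_query))
# whatever is left over is all the room the retrieved context gets
print(max_input_size - query_tokens - num_output)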