I'm using Guardrails but the response keeps getting cut off

I'm using Guardrails but the response keeps getting cut off (the JSON object doesn't fully close). Does anyone know how to solve this?
Is it because the response + input prompt is getting longer than 4096? Do you have max_tokens set on the llm?
Is there a way to check that? Yes, I do have max_tokens set
I tried tweaking the max tokens for input and output and it's still cutting off
How are you setting up the Guardrails parser?
I followed this exactly: https://gpt-index.readthedocs.io/en/latest/examples/output_parsing/GuardrailsDemo.html

Besides the StructuredLLMPredictor where I customized some params:
llm_predictor_chatgpt = StructuredLLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_output))
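For reference, that demo wires the parser up roughly like this (a sketch of the legacy llama_index API from the linked notebook, so exact module paths and names may differ; rail_spec is the <rail> XML string describing the expected JSON):

Python
from llama_index.output_parsers import GuardrailsOutputParser
from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL
from llama_index.prompts.prompts import QuestionAnswerPrompt

# build the parser from the rail spec (rail_spec is assumed to be defined as in the notebook)
output_parser = GuardrailsOutputParser.from_rail_string(rail_spec, llm=llm_predictor_chatgpt.llm)

# wrap the default QA prompt so it carries the Guardrails format instructions,
# then pass qa_prompt into the query call as the notebook does
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)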
I also tried removing Guardrails and just using the normal LLMPredictor and the JSON also cuts out, so perhaps it's not a Guardrails issue?
Do you have a cut-off response handy? How long is it if you test it here? https://platform.openai.com/tokenizer
Should I paste in the input data or my prompt?
I think paste whatever the output is
like the json that's cut off
Ah I see, this is what I get for that:

Tokens 697
Characters 1078
ok nice! So it is going beyond the default of 256, which means max_tokens is working
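If you want to check the count locally instead of pasting into the web tokenizer, something like this should do it (assuming tiktoken is installed):

Python
import tiktoken

# rough local token count for the truncated output
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
cut_off_json = "..."  # paste the cut-off JSON output here
print(len(enc.encode(cut_off_json)))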
What are your prompt helper settings for that?
Here's what I have for that part:


Python
max_input_size = 4096
num_output = 1024
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)


# define LLM    
llm_predictor_chatgpt = StructuredLLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_output))
service_context_chatgpt = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt, prompt_helper=prompt_helper, chunk_size_limit=2500)
Hmm, maybe try lowering chunk size limit a bit? That's my best guess haha

You can also try using the llama debug callback. If you fetch the LLM events, you should be able to see the full input+response to the model on the event end pairs... except Guardrails calls its own external thing, so that won't be tracked exactly...
https://gpt-index.readthedocs.io/en/latest/examples/callbacks/LlamaDebugHandler.html
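Roughly like this, reusing the predictor and prompt helper from above (a sketch of the legacy llama_index callback API described on that page, so names may differ by version):

Python
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

# attach the debug handler so LLM start/end events get recorded
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

# llm_predictor_chatgpt and prompt_helper are the ones defined earlier in the thread
service_context_chatgpt = ServiceContext.from_defaults(
    llm_predictor=llm_predictor_chatgpt,
    prompt_helper=prompt_helper,
    chunk_size_limit=2500,
    callback_manager=callback_manager,
)

# after running the query, pull the recorded LLM event pairs
llm_events = llama_debug.get_event_pairs(CBEventType.LLM)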
I have a feeling something is getting too large somewhere and it's running out of room 🤔
I tried lowering chunk size limit to 2000 then 1400, but still getting cut off.
The debug handler throws this error:

Plain Text
Traceback (most recent call last):
    print(llama_debug.get_event_time_info(CBEventType.LLM))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/llama_index/callbacks/llama_debug.py", line 140, in get_event_time_info
    return self._get_time_stats_from_event_pairs(event_pairs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/llama_index/callbacks/llama_debug.py", line 118, in _get_time_stats_from_event_pairs
    average_secs=total_secs / len(event_pairs),
                 ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero
Oh, try calling get_llm_inputs_outputs() instead (assuming you attached the callback to your run of course 🙏)
Like this correct? llama_debug.get_llm_inputs_outputs()
This just returned [[CBEvent(event_type=<CBEventType.LLM: 'llm'>, payload= followed by the index file contents
Hmm. It should be a list of pairs. You'd want the payload of the last element in each pair
That will be a dict of response and formatted prompt
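Something like this, continuing from the handler above (the payload keys are the ones mentioned here and may vary by version):

Python
# each entry is a (start, end) pair of events; the end event's payload holds the prompt and response
for start_event, end_event in llama_debug.get_llm_inputs_outputs():
    payload = end_event.payload or {}
    print(payload.get("formatted_prompt"))
    print(payload.get("response"))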
Hmm I don't seem to see my prompt anywhere there
I just see the contents of my input file
Odd 🫠 and also annoying lol
I'm running out of tricks to debug this 🥲
Could it be the same issue I had yesterday? The service context not getting passed to the query?
Since his cut-off output is already very long (over 600 tokens), it's definitely generating past the default, so I'm not sure if that's the issue 🤔
This is the only thing I see related to the prompt, but I don't see my prompt itself:

\n\n\nSUMMARY:"""\n'}
Unless this is what you meant? ^
That might be part of it 🤔 depends on the index you are using I guess haha
Actually I see this now: 'formatted_prompt': 'Write a summary of the following. Try to use only the information provided. Try to include as many key details as possible.\n\n\

It's weird because I never wrote that anywhere in my prompt
Yea that's an internal summary prompt
What kind of index are you using?
I'm using GPTTreeIndex
Ah yea, so that's part of the tree building process
Is it possible it's messing up due to my query being really long?
I'm curious as to what the point of the query is if you're describing each point in the rail <object>?
I think the point is the LLM does its best to output in the proper format, then Guardrails double-checks it
Definitely possible it's due to the query being very long 🤔
Yup turns out the query was too long
:consequences:
Lol good find!
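In hindsight, a quick way to catch this is to check how much room the fully formatted prompt (query + Guardrails format instructions + retrieved text) leaves for the completion. A rough sketch, where formatted_prompt is whatever the debug handler printed and num_output is the value from the settings above:

Python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# tokens used by the full formatted prompt (query + format instructions + context chunk)
prompt_tokens = len(enc.encode(formatted_prompt))

# rough room left for the completion out of the 4096-token context window
room_for_output = 4096 - prompt_tokens
if room_for_output < num_output:
    print(f"Prompt uses {prompt_tokens} tokens; only {room_for_output} are left for output, "
          f"less than num_output={num_output}, so the JSON can get cut off.")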