Is it because the response + input prompt is getting longer than 4096? Do you have max_tokens set on the llm?
Is there a way to check that? Yes I do have the max_tokens set
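(For reference, a quick way to check it, assuming the langchain ChatOpenAI wrapper that shows up in the snippet further down; 1024 here is just an example value:)
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=1024)
print(llm.max_tokens)  # should echo back whatever you configured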
I tried tweaking the max tokens for input and output and it's still cutting off
How are you setting up the Guardrails parser?
I also tried removing Guardrails and just using the normal LLMPredictor, and the JSON still gets cut off, so perhaps it's not a Guardrails issue?
Should I paste in the input data or my prompt?
I think just paste whatever the output is
like the JSON that's cut off
Ah I see, this is what I get for that:
Tokens 697
Characters 1078
ok nice! So it is going beyond the default of 256, which means max_tokens is working
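(If it's useful, a rough way to get a count like that locally with tiktoken; truncated_json is a placeholder for the cut-off output:)
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
truncated_json = "..."  # paste the cut-off output here
print(len(enc.encode(truncated_json)))  # token count of the truncated response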
What are your prompt helper settings for that?
Here's what I have for that part:
from llama_index import PromptHelper, ServiceContext
from llama_index.llm_predictor import StructuredLLMPredictor
from langchain.chat_models import ChatOpenAI

# prompt helper budget: 4096-token context window, reserve 1024 tokens for output
max_input_size = 4096
num_output = 1024
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

# define LLM
llm_predictor_chatgpt = StructuredLLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_output))
service_context_chatgpt = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt, prompt_helper=prompt_helper, chunk_size_limit=2500)
I have a feeling something is getting too large somewhere and it's running out of room 🤔
I tried lowering chunk size limit to 2000 then 1400, but still getting cut off.
The debug handler throws this error:
Traceback (most recent call last):
print(llama_debug.get_event_time_info(CBEventType.LLM))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/llama_index/callbacks/llama_debug.py", line 140, in get_event_time_info
return self._get_time_stats_from_event_pairs(event_pairs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/llama_index/callbacks/llama_debug.py", line 118, in _get_time_stats_from_event_pairs
average_secs=total_secs / len(event_pairs),
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero
Oh, try calling get_llm_inputs_outputs()
instead (assuming you attached the callback to your run of course)
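(For anyone following along, the usual wiring looks roughly like this, sketched against the 0.6-era llama_index callback API:)
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# attach the debug handler via a callback manager on the service context
llama_debug = LlamaDebugHandler()
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(callback_manager=callback_manager)

# ... build the index / run the query with this service_context, then:
event_pairs = llama_debug.get_llm_inputs_outputs()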
Like this correct? llama_debug.get_llm_inputs_outputs()
This just returned [[CBEvent(event_type=<CBEventType.LLM: 'llm'>, payload=
followed by the index file contents
Hmm. It should be a list of pairs. You'd want the payload of the last element in each pair
That will be a dict with the response and the formatted prompt
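(i.e. something along these lines, assuming the payload keys are 'formatted_prompt' and 'response' as in the paste below:)
event_pairs = llama_debug.get_llm_inputs_outputs()
for start_event, end_event in event_pairs:
    payload = end_event.payload  # the end event of each pair carries the LLM call details
    print(payload.get("formatted_prompt"))
    print(payload.get("response"))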
Hmm I don't seem to see my prompt anywhere there
I just see the contents of my input file
Odd, and also annoying lol
I'm running out of tricks to debug this 🥲
Could it be the same issue I had yesterday? The service context not getting passed to the query?
Since his cut-off output is already very long (over 600 tokens), it's definitely generating past the default, so I'm not sure if that's the issue 🤔
This is the only thing I see related to the prompt, but I don't see my prompt itself:
\n\n\nSUMMARY:"""\n'}
Unless this is what you meant? ^
That might be part of it 🤔 depends on the index you are using I guess haha
Actually I see this now: 'formatted_prompt': 'Write a summary of the following. Try to use only the information provided. Try to include as many key details as possible.\n\n\
It's weird because I never wrote that anywhere in my prompt
Yea that's an internal summary prompt
What kind of index are you using?
Ah yea, so that's part of the tree building process
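(Side note: that text is llama_index's stock summary prompt, used when the tree index summarizes chunks into parent nodes, so seeing it there is expected. If I have the module path right, you can print it directly:)
from llama_index.prompts.default_prompts import DEFAULT_SUMMARY_PROMPT_TMPL

# the built-in prompt the tree index uses while building parent summaries
print(DEFAULT_SUMMARY_PROMPT_TMPL)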
Is it possible it's messing up due to my query being really long?
I'm curious as to what the point of the query is if you're describing each point in the rail <object>?
I think the point is the LLM does its best to output in the proper format, then Guardrails double-checks it
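(For reference, the rough shape of that wiring, following the llama_index Guardrails example; the rail file name here is a placeholder:)
from llama_index.llm_predictor import StructuredLLMPredictor
from llama_index.output_parsers import GuardrailsOutputParser
from llama_index.prompts.prompts import QuestionAnswerPrompt
from llama_index.prompts.default_prompts import DEFAULT_TEXT_QA_PROMPT_TMPL

llm_predictor = StructuredLLMPredictor()
# fold the rail spec's format instructions into the QA prompt,
# then let Guardrails validate the raw LLM output afterwards
output_parser = GuardrailsOutputParser.from_rail("my_spec.rail", llm=llm_predictor.llm)
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
The resulting qa_prompt then gets passed in as the text_qa_template when querying.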
Definitely possible it's due to the query being very long 🤔
Yup turns out the query was too long
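(For posterity: the same tiktoken trick from earlier makes this easy to spot; long_query is a placeholder for the query string, and max_input_size / num_output are the prompt helper values above.)
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
query_tokens = len(enc.encode(long_query))
# whatever is left over is all the room the retrieved context gets
print(max_input_size - query_tokens - num_output)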