Any ideas on how to protect against JSON failures when using output_cls

Any ideas on how to protect against JSON failures when using output_cls:
Plain Text
  File "/Users/bmax/src/pursuit/ai/lib/python3.8/site-packages/typing_extensions.py", line 2562, in wrapper
    return __arg(*args, **kwargs)
  File "/Users/bmax/src/pursuit/ai/lib/python3.8/site-packages/pydantic/main.py", line 1026, in parse_raw
    raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
pydantic_core._pydantic_core.ValidationError: 1 validation error for ContactList
__root__
  Unterminated string starting at: line 232 column 15 (char 4738) [type=value_error.jsondecode, input_value='{\n  "contacts": [\n    ... Clogged Weekdays 6 a.m', input_type=str]
Plain Text
            output = self.output_cls.parse_raw(function_call["arguments"])

this is the line that's failing
Looks to me like the LLM ran out of room writing the JSON response 👀 '{\n "contacts": [\n ... Clogged Weekdays 6 a.m' is missing the closing brace
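For reference, a truncated payload like that reproduces the same error with pydantic alone; the field names below are made up, only the contacts key is known from the traceback:
Plain Text
from typing import List

from pydantic import BaseModel, ValidationError


class Contact(BaseModel):
    # hypothetical fields; the real ContactList schema isn't shown in the thread
    name: str
    notes: str


class ContactList(BaseModel):
    contacts: List[Contact]


# JSON cut off mid-string, like the LLM output in the traceback above
truncated = '{\n  "contacts": [\n    {"name": "Acme Drains", "notes": "Clogged Weekdays 6 a.m'

try:
    ContactList.parse_raw(truncated)
except ValidationError as err:
    print(err)  # reports the JSON decode failure (unterminated string)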
oh that makes sense
there is a shit ton of data in this one...
okay, let me rephrase the question 😅
is there a way to have like a dynamic max_tokens?
I think it's already dynamic by default 😅 it might actually just be running out of room
Plain Text
openai_model = 'gpt-3.5-turbo-16k'
# set context window
context_window = 14300
# set number of output tokens
max_tokens = 1500
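For context, those numbers roughly correspond to a setup like this (a sketch against the legacy ServiceContext API; exact kwargs, e.g. num_output, may vary by llama-index version):
Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# the LLM's own cap on generated tokens
llm = OpenAI(model="gpt-3.5-turbo-16k", max_tokens=1500)

service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=14300,  # token budget llama-index assumes for the model
    num_output=1500,       # room llama-index reserves for the response
)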
but like... sometimes I don't need 1500 for max_tokens
set max_tokens=None
sometimes I need way less
sometimes I need more
let me try this None magic
looks like it doesn't work, something errors deep in the OpenAI code
You'll need to set num_outputs in the service context then too
like, llama-index has to leave room for tokens to generate
I do set num_outputs
should I do None?
cool. Might want to like... double that
but then that means my context window for the nodes will be smaller?
less context per call?
yes, 13,500 instead of 14,500
or something like that
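rough arithmetic, using the numbers from the config above:
Plain Text
context_window = 14300                        # what llama-index is told the model can hold
num_output = 1500                             # reserved for the generated answer
prompt_budget = context_window - num_output   # ~12,800 tokens left for the prompt + retrieved nodes
print(prompt_budget)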
like there's two sides to this coin
letting the model generate as much as it can (max_tokens) and configuring llama-index to leave room for that (num_outputs)
got it, that's where max_tokens = None comes in
that worked
max_tokens=None and num_outputs=1500
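So the combination that ended up working, sketched against the same legacy API as above (again, kwarg names are an assumption and may differ by version):
Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# max_tokens=None leaves the output cap to the API, so the model can use
# whatever room remains in the context window
llm = OpenAI(model="gpt-3.5-turbo-16k", max_tokens=None)

service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=14300,
    num_output=1500,  # llama-index still reserves this much room for the response
)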