yea looks like it's incomplete
I wonder if the input ended up being too big somehow 🤔
or the response from openai
ahh probably... maybe ran out of tokens?
openai_model = "gpt-3.5-turbo-1106"
# set context window
context_window = 14385
# set number of output tokens
max_tokens = None
# define LLM
and for service_context
num_output=2000
Uhhh doesn't gpt-3.5 only have a 4k context window?
Oh neat, the new one is 16k
Hmm but in any case, feels like an odd edge case where we somehow didn't leave enough room for 2000 output tokens 🤔
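For reference, the way I read the budget (sketch; as I understand it llama_index reserves num_output out of context_window, and 14385 + 2000 lines up exactly with the 16,385-token window on -1106):
context_window = 14385  # what the service context is told the model can hold
num_output = 2000       # reserved for the completion
prompt_budget = context_window - num_output
print(prompt_budget)    # 12385 tokens left for the prompt, chunks, and function schema
print(context_window + num_output)  # 16385, i.e. the full gpt-3.5-turbo-1106 window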
I tried increasing num output to 3k. Still problematic
This is on a specific document? Happy to try and reproduce and figure out what's up
I'm curious how big (in tokens) the request is to OpenAI
I'll send you the text chunks from the accumulate synthesizer
So just to clarify, these are the input chunks to the synthesizer?
(I know that's basically what you just said hahaha just double checking)
these are the chunks passed into the predictor
print(text_chunks)
return [
predictor(
text_qa_template,
context_str=cur_text_chunk,
output_cls=self._output_cls,
**response_kwargs,
)
for cur_text_chunk in text_chunks
]
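If it helps, a quick sketch for seeing how big each chunk actually is in tokens (assuming tiktoken and the same model string as above):
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo-1106")
for i, chunk in enumerate(text_chunks):
    # token length of each chunk that goes into a predictor call
    print(f"chunk {i}: {len(enc.encode(chunk))} tokens")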
When I'm at my computer, I'll try seeing if I can reproduce
i also pass in a huge prompt as you know
Right, but that prompt isn't that big right?
What happens if you paste your prompt into that token counter app?
i wonder if it's causing issues because I'm passing in arrays instead of normal text.
uhm... I'm having trouble getting that working
i'm also using tiktoken encoder btw
so should be plenty of room right
Ya, so like 6K tokens max in the prompt -- much less than 16k 🤔
why is it even two chunks
shouldn't that just be 1 API call
i guess because I have the text splitter
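Right, with chunk_size=1024 anything longer than ~1024 tokens gets split, and each chunk becomes its own predictor call. Rough sketch of what's happening (doc_text here is a stand-in for your raw contact list):
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1024, chunk_overlap=100
)
chunks = splitter.split_text(doc_text)  # doc_text: hypothetical variable holding the document
print(len(chunks))  # 2+ chunks means 2+ OpenAI calls from the accumulate synthesizer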
here's my base setup fyi
globals_helper._tokenizer = tiktoken.encoding_for_model(openai_model).encode
self._token_counter = TokenCountingHandler(
tokenizer=tiktoken.encoding_for_model(openai_model).encode
)
self.contact_service = contact_service
# self._llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([self._token_counter])
self._text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
chunk_size=1024, chunk_overlap=100
)
self._prompt = get_prompt()
self._llm = OpenAI(temperature=0, model=openai_model, max_tokens=max_tokens)
self._service_context = ServiceContext.from_defaults(
llm=self._llm,
callback_manager=callback_manager,
context_window=context_window,
num_output=3000,
)
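Since you already have the TokenCountingHandler wired in, you could also dump what it recorded after a run, something like this (sketch, going off the handler's llm_token_counts / total_llm_token_count attributes):
# after running the extraction once
for event in self._token_counter.llm_token_counts:
    print(event.prompt_token_count, "prompt tokens /", event.completion_token_count, "completion tokens")
print("total:", self._token_counter.total_llm_token_count)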
Except your output cls also uses tokens
things that get passed into functions use tokens?
702 tokens for all of my output_cls
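If you want to sanity-check that number, something like this should get you in the ballpark (sketch, assuming output_cls is a pydantic v1 model; OpenAI serializes function schemas its own way, so it's approximate):
import json
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo-1106")
schema_str = json.dumps(output_cls.schema())  # the function-call schema sent along with the prompt
print(len(enc.encode(schema_str)), "tokens for the schema")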
Hmm very sus overall. Will see if I can reproduce it at some point today.
Maybe openai is just being a dick and not writing complete json lol
{
"contacts": [
{
"first_name": "Adam",
"last_name": "Young",
"email": "",
"phone": "",
"title": "Policy & External Affairs",
"department": "",
"metadata": ""
},
{
"first_name": "Alan",
"last_name": "Nguyen",
"email": "",
"phone": "(415) 557-4939",
"title": "HR Modernization Project",
"department": "",
"metadata": ""
},
{
"first_name": "Alana",
"last_name": "Washington",
"email": "",
"phone": "(415) 701-5394",
"title": "ES Division: Operations",
"department": "",
"metadata": ""
},
{
"first_name": "Alarice",
"last_name": "Allen",
"email": "",
"phone": "(415) 551-8923",
"title": "Workersβ Compensation Division",
"department": "",
"metadata": ""
},
{
"first_name": "Alejandro",
"last_name": "Cervantes",
"email": "",
"phone": "(415) 701-5869",
yeah just completely cuts off.
print(function_call["arguments"])
Approx 273 tokens, very close to the typical 256 default 🤔
Maybe try also setting max_tokens to 2000 and don't set num_output? Or set both?
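i.e. something like this, so the completion cap is explicit in both places (sketch based on your setup above):
self._llm = OpenAI(temperature=0, model=openai_model, max_tokens=2000)  # hard cap on the completion
self._service_context = ServiceContext.from_defaults(
    llm=self._llm,
    callback_manager=callback_manager,
    context_window=context_window,
    num_output=2000,  # keep in sync with max_tokens
)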
["Alejandro Cervantes Workers\\u2019 Compensation Division (415) 701-5869"], ["Alessandro Queri Workers\\u2019 Compensation Division"]
stops right at that phone
but the one above it works.. ok nvm
k removed those characters just to test, still no bueno
I'm getting it on another piece of text too
wtf did i change to cause this
maybe just to be extra sure your settings are propagating as needed, try also setting a global service context early on?
from llama_index import set_global_service_context
set_global_service_context(service_context)
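In your setup that would go right after the ServiceContext.from_defaults(...) call (sketch):
# right after building self._service_context in __init__, so anything that
# silently falls back to the default service context still picks up yours
set_global_service_context(self._service_context)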
Feels very weird, like your num_output and context_window arguments aren't being respected?
i've had some weird caching happen w/ virtualenv before
set global context, let's see
i will delete virtualenv and reinstall if this doesn't work
something must have been defaulting back to a default service context somehow
so setting the global fixes that 🤔