yea looks like it's incomplete
I wonder if the input ended up being too big somehow 🤔
or the response from openai
ahh probably... maybe ran out of tokens?
openai_model = "gpt-3.5-turbo-1106"
# set context window
context_window = 14385
# set number of output tokens
max_tokens = None
# define LLM
and for service_context
num_output=2000
Uhhh doesn't gpt-3.5 only have a 4k context window?
Oh neat, the new one is 16k
Hmm but in any case, feels like an odd edge case where we somehow didn't leave enough room for 2000 output tokens 🤔
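For reference, the way I read the budget (sketch; as I understand it llama_index reserves num_output out of context_window, and 14385 + 2000 lines up exactly with the 16,385-token window on -1106):
context_window = 14385  # what the service context is told the model can hold
num_output = 2000       # reserved for the completion
prompt_budget = context_window - num_output
print(prompt_budget)    # 12385 tokens left for the prompt, chunks, and function schema
print(context_window + num_output)  # 16385, i.e. the full gpt-3.5-turbo-1106 window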
I tried increasing num output to 3k. Still problematic
This is on a specific document? Happy to try and reproduce and figure out what's up
I'm curious how big (in tokens) the request is to OpenAI
I'll send you the text chunks from the accumulate synthesizer
So just to clarify, these are the input chunks to the synthesizer?
(I know that's basically what you just said hahaha just double checking)
these are the chunks passed into the predictor
print(text_chunks)
return [
predictor(
text_qa_template,
context_str=cur_text_chunk,
output_cls=self._output_cls,
**response_kwargs,
)
for cur_text_chunk in text_chunks
]
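If it helps, a quick sketch for seeing how big each chunk actually is in tokens (assuming tiktoken and the same model string as above):
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo-1106")
for i, chunk in enumerate(text_chunks):
    # token length of each chunk that goes into a predictor call
    print(f"chunk {i}: {len(enc.encode(chunk))} tokens")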
When I'm at my computer, I'll try seeing if I can reproduce
i also pass in a huge prompt as you know
Right, but that prompt isn't that big right?
What happens if you paste your prompt into that token counter app?
i wonder if it's causing issues because I'm passing in arrays instead of normal text.
uhm... I'm having trouble getting that working
i'm also using tiktoken encoder btw
so should be plenty of room right
Ya, so like 6K tokens max in the prompt -- much less than 16k 🤔
why is it even two chunks
shouldn't that just be 1 API call
i guess because I have the text splitter
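Right, with chunk_size=1024 anything longer than ~1024 tokens gets split, and each chunk becomes its own predictor call. Rough sketch of what's happening (doc_text here is a stand-in for your raw contact list):
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1024, chunk_overlap=100
)
chunks = splitter.split_text(doc_text)  # doc_text: hypothetical variable holding the document
print(len(chunks))  # 2+ chunks means 2+ OpenAI calls from the accumulate synthesizer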
here's my base setup fyi
globals_helper._tokenizer = tiktoken.encoding_for_model(openai_model).encode
self._token_counter = TokenCountingHandler(
tokenizer=tiktoken.encoding_for_model(openai_model).encode
)
self.contact_service = contact_service
# self._llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([self._token_counter])
self._text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
chunk_size=1024, chunk_overlap=100
)
self._prompt = get_prompt()
self._llm = OpenAI(temperature=0, model=openai_model, max_tokens=max_tokens)
self._service_context = ServiceContext.from_defaults(
llm=self._llm,
callback_manager=callback_manager,
context_window=context_window,
num_output=3000,
)
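Since you already have the TokenCountingHandler wired in, you could also dump what it recorded after a run, something like this (sketch, going off the handler's llm_token_counts / total_llm_token_count attributes):
# after running the extraction once
for event in self._token_counter.llm_token_counts:
    print(event.prompt_token_count, "prompt tokens /", event.completion_token_count, "completion tokens")
print("total:", self._token_counter.total_llm_token_count)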
Except your output cls also uses tokens
things that get passed into functions use tokens?
702 tokens for all of my output_cls
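If you want to sanity-check that number, something like this should get you in the ballpark (sketch, assuming output_cls is a pydantic v1 model; OpenAI serializes function schemas its own way, so it's approximate):
import json
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo-1106")
schema_str = json.dumps(output_cls.schema())  # the function-call schema sent along with the prompt
print(len(enc.encode(schema_str)), "tokens for the schema")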
Hmm very sus overall. Will see if I can reproduce it at some point today.
Maybe openai is just being a dick and not writing complete json lol
{
"contacts": [
{
"first_name": "Adam",
"last_name": "Young",
"email": "",
"phone": "",
"title": "Policy & External Affairs",
"department": "",
"metadata": ""
},
{
"first_name": "Alan",
"last_name": "Nguyen",
"email": "",
"phone": "(415) 557-4939",
"title": "HR Modernization Project",
"department": "",
"metadata": ""
},
{
"first_name": "Alana",
"last_name": "Washington",
"email": "",
"phone": "(415) 701-5394",
"title": "ES Division: Operations",
"department": "",
"metadata": ""
},
{
"first_name": "Alarice",
"last_name": "Allen",
"email": "",
"phone": "(415) 551-8923",
"title": "Workersβ Compensation Division",
"department": "",
"metadata": ""
},
{
"first_name": "Alejandro",
"last_name": "Cervantes",
"email": "",
"phone": "(415) 701-5869",
yeah just completely cuts off.
print(function_call["arguments"])
Approx 273 tokens, very close to the typical 256 default 🤔
Maybe try also setting max_tokens to 2000 and don't set num_output? Or set both?
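i.e. something like this, so the completion cap is explicit in both places (sketch based on your setup above):
self._llm = OpenAI(temperature=0, model=openai_model, max_tokens=2000)  # hard cap on the completion
self._service_context = ServiceContext.from_defaults(
    llm=self._llm,
    callback_manager=callback_manager,
    context_window=context_window,
    num_output=2000,  # keep in sync with max_tokens
)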
["Alejandro Cervantes Workers\\u2019 Compensation Division (415) 701-5869"], ["Alessandro Queri Workers\\u2019 Compensation Division"]
stops right at that phone
but the one above it works.. ok nvm
k removed those characters just to test, still no bueno
I'm getting it on another piece of text too
wtf did i change to cause this
maybe just to be extra sure your settings are propagating as needed, try also setting a global service context early on?
from llama_index import set_global_service_context
set_global_service_context(service_context)
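In your setup that would go right after the ServiceContext.from_defaults(...) call (sketch):
# right after building self._service_context in __init__, so anything that
# silently falls back to the default service context still picks up yours
set_global_service_context(self._service_context)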
Feels very weird, like your num_output and context_window arguments aren't being respected?
i've had some weird caching happen w/ virtualenv before
set global context, let's see
i will delete virtualenv and reinstall if this doesn't work
something must have been defaulting back to a default service context somehow
so setting the global fixes that 🤔