Hey everyone, I have a question I hope someone can help me with!

I am creating my index with the following lines:
Plain Text
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
Settings.llm = OpenAI(model="gpt-4-0125-preview", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()
index = VectorStoreIndex(nodes=nodes)


So why am I getting the following error:
Plain Text
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 11486 tokens (11486 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

The OpenAI documentation specifies that gpt-4-0125-preview has a context length of 128,000 tokens, not 8192 (which is what vanilla gpt-4 has).
40 comments
It should pick up the GPT-4 model you specified.
Is your OpenAI client updated?
You can try printing the LLM: print(Settings.llm)
This is part of the message I get back when I do this (hiding API key)
Plain Text
initializing settings...
callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x1026cf9d0> system_prompt=None messages_to_prompt=<function messages_to_prompt at 0x105ae47c0> completion_to_prompt=<function default_completion_to_prompt at 0x105b5e660> output_parser=None pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'> query_wrapper_prompt=None model='gpt-4-0125-preview' temperature=0.1 max_tokens=None
Should I perhaps change max_tokens??
It is set to None by itself?
Yeah, try changing it to something, let's say 512, and try again, since it is picking the correct model.
Yeah, by itself. I am trying with this now:
Plain Text
    def run(self):
        Settings.llm = OpenAI(
            model="gpt-4-0125-preview", temperature=0.1, max_tokens=512
        )
        Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
        print(Settings.llm)
        self._create_nodes()
        self._create_index()
This is the output from print:
Plain Text
callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x1043dbb50> system_prompt=None messages_to_prompt=<function messages_to_prompt at 0x107698720> completion_to_prompt=<function default_completion_to_prompt at 0x11900a5c0> output_parser=None pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'> query_wrapper_prompt=None model='gpt-4-0125-preview' temperature=0.1 max_tokens=512 additional_kwargs={} max_retries=3 timeout=60.0 default_headers=None reuse_client=True api_key=[HIDDEN] api_base='https://api.openai.com/v1' api_version=''
And I still get this error:

Plain Text
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 11486 tokens (11486 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}


So it seems like the 512 max_tokens is not going through, since it still says "maximum context length is 8192 tokens"?
Your OpenAI client is updated, right?
8192 is for gpt-4
What do you mean by updated? As you can see in the printout it says ...model='gpt-4-0125-preview', but is there something else I should do?
It is as if the settings are not registering
No, I mean is your openai library updated or not?
Yes, using openai 1.13.3
This error is most likely coming from the embedding model.
If you had the full traceback, it would probably be clearer
That was my thought too, but then again it says "This model's maximum context length is 8192 tokens", which leads me to believe it is falling back to the default gpt-4, which has exactly that maximum context length.
If you read the full trace, it's the embedding model
embed_model.get_text_embedding_batch(...
Is a line in the traceback
This means one of your nodes is quite long.

How did you create the nodes/documents/index?
The model could be anything. In this case, it's the embedding model.
@Logan M Right.. I created them "manually" using verbatim meeting notes where a bunch of different people are talking. Each node is what was said, with the metadata being the name of the speaker...
Cool cool. So in that case, it's very likely the text is just too long in the node
Probably it should be chunked
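For reference, a rough sketch of what that chunking could look like with SentenceSplitter before rebuilding the index (the chunk_size of 1024 tokens and re-splitting the hand-built nodes are assumptions, not what was actually run):
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Re-split the hand-built nodes so no single chunk exceeds the
# embedding model's context window (chunk_size is measured in tokens).
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=50)
nodes = splitter(nodes)  # metadata like the speaker name is carried over to the splits

index = VectorStoreIndex(nodes=nodes)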
I'll try this instead:

Plain Text
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
(I forget if they increased the context window for those new models, but go for it!)
Interesting. Maybe I could summarize them beforehand if they surpass the token limit
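One rough way to find the offending nodes first, as a sketch assuming tiktoken's cl100k_base encoding and the 8192-token limit from the error message:
Plain Text
import tiktoken

# Count tokens per node to see which ones exceed the embedding limit.
enc = tiktoken.get_encoding("cl100k_base")
for node in nodes:
    n_tokens = len(enc.encode(node.get_content()))
    if n_tokens > 8192:
        print(node.node_id, n_tokens)  # candidates for chunking or summarizing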
Yeah, I guess it isn't changed with the larger embedding model. Thanks for the help @Logan M and @WhiteFang_Jr !!
@Logan M What would be the best way to split the documents if I'm using MarkdownNodeParser? It seems I'm hitting the token limit for a markdown doc that, I'm guessing, has a ton of markdown under one header.
Chain it with another parser/splitter

from llama_index.core.node_parser import MarkdownNodeParser, SentenceSplitter

nodes = MarkdownNodeParser()(documents)
nodes = SentenceSplitter()(nodes)
@Logan M I ended up excluding those docs.

I’m running into an issue where the token size is being exceeded because some of my docs are really large. I want each chunk to maintain the markdown formatting & sections. Is there any way for me to limit the tokens that are passed into the context from the docs, instead of chunking the docs?
Oh I think it’s because I set the memory_token_limit
@Logan M would it be possible to use the markdown element node parser and markdown node parser together?
I’m doing this within an ingestion pipeline. If I only use the markdown node parser, would I first use the parser and then the sentence splitter?
I don't think those two are compatible tbh 🤔
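For the simpler MarkdownNodeParser + SentenceSplitter combination inside an IngestionPipeline, the transformations should run in list order, so the markdown parser goes first. A minimal sketch, assuming the documents are already loaded:
Plain Text
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import MarkdownNodeParser, SentenceSplitter

# Transformations are applied in order: split on markdown headers first,
# then break any long sections down into smaller sentence-level chunks.
pipeline = IngestionPipeline(
    transformations=[
        MarkdownNodeParser(),
        SentenceSplitter(chunk_size=1024, chunk_overlap=50),
    ]
)
nodes = pipeline.run(documents=documents)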