Hey everyone, I have a question I hope someone can help me with!

I am creating my index with the following line:
Plain Text
from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.llm = OpenAI(model="gpt-4-0125-preview", temperature=0.1)
Settings.embed_model = OpenAIEmbedding()
index = VectorStoreIndex(nodes=nodes)


So why am I getting the following error:
Plain Text
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 11486 tokens (11486 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

The OpenAI documentation specifies that gpt-4-0125-preview has a context length of 128,000 tokens, not the 8,192 that vanilla gpt-4 has.
It should pick the GPT-4 model as you specified.
Is your OpenAI client updated?
You can try printing the LLM: print(Settings.llm)
This is part of the message I get back when I do this (hiding API key)
Plain Text
initializing settings...
callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x1026cf9d0> system_prompt=None messages_to_prompt=<function messages_to_prompt at 0x105ae47c0> completion_to_prompt=<function default_completion_to_prompt at 0x105b5e660> output_parser=None pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'> query_wrapper_prompt=None model='gpt-4-0125-preview' temperature=0.1 max_tokens=None
Should I perhaps change max_tokens??
It is passed as None by default?
Yeah, try changing it to something, say 512, and try again, since it is picking up the correct model.
Yeah, by itself. I am trying with this now:
Plain Text
    def run(self):
        Settings.llm = OpenAI(
            model="gpt-4-0125-preview", temperature=0.1, max_tokens=512
        )
        Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
        print(Settings.llm)
        self._create_nodes()
        self._create_index()
This is the output from print:
Plain Text
callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x1043dbb50> system_prompt=None messages_to_prompt=<function messages_to_prompt at 0x107698720> completion_to_prompt=<function default_completion_to_prompt at 0x11900a5c0> output_parser=None pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'> query_wrapper_prompt=None model='gpt-4-0125-preview' temperature=0.1 max_tokens=512 additional_kwargs={} max_retries=3 timeout=60.0 default_headers=None reuse_client=True api_key=[HIDDEN] api_base='https://api.openai.com/v1' api_version=''
And I still get this error:

Plain Text
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 11486 tokens (11486 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}


So it seems like the 512 max_tokens is not going through, since it still says "maximum context length is 8192 tokens"?
Your OpenAI client is updated, right?
8192 is for gpt-4
What do you mean by updated? As you can see in the printout it says model='gpt-4-0125-preview', but is there something else I should do?
It is as if the settings are not registering
No, I mean is your openai library updated or not?
Yes, using openai 1.13.3
This error is most likely coming from the embedding model.
If you had the full traceback, it would probably be clearer
That was my thought too, but then again it says "This model's maximum context length is 8192 tokens", which leads me to believe it is falling back to the default gpt-4, which has exactly that maximum.
If you read the full trace, it's the embedding model
embed_model.get_text_embedding_batch(...
Is a line in the traceback
This means one of your nodes is quite long.

How did you create the nodes/documents/index?
The model could be anything. In this case, it's the embedding model.
@Logan M Right.. I created them "manually" using verbatim meeting notes where a bunch of different people are talking. Each node is what one person said, with the speaker's name as metadata...
Cool cool. So in that case, it's very likely the text is just too long in the node
Probably it should be chunked
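A minimal sketch of what that check and re-chunking could look like, assuming the manually built nodes are standard llama_index TextNodes, tiktoken is available, and the 8192 figure is the embedding input limit from the error above (names and chunk sizes here are illustrative, not from the thread):
Python
# Illustrative sketch: find over-long nodes and re-chunk them before building the index.
# Assumes `nodes` is the list of manually built TextNode objects described above.
import tiktoken
from llama_index.core.node_parser import SentenceSplitter

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer family used by OpenAI embeddings

# Report any node whose text alone would blow past the embedding input limit
for i, node in enumerate(nodes):
    n_tokens = len(enc.encode(node.get_content()))
    if n_tokens > 8192:
        print(f"node {i} has {n_tokens} tokens -- too long to embed")

# Split every node into smaller chunks; node metadata (e.g. speaker name) is carried over
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter(nodes)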
I'll try this instead:

Plain Text
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
(I forget if they increased the context window for those new models, but go for it!)
Interesting. Maybe I could summarize them beforehand if they surpass the token limit
Yeah, I guess it isn't changed with the larger embedding model. Thanks for the help @Logan M and @WhiteFang_Jr !!
@Logan M What would be the best way to split the documents if I'm using MarkdownNodeParser? I'm hitting the token limit on a markdown doc that, I'm guessing, has a ton of content under one header.
Chain it with another parser/splitter

nodes = MarkdownNodeParser()(documents)
nodes = SentenceSplitter()(nodes)
@Logan M I ended up excluding those docs.

I’m running into an issue where the token limit is being exceeded because some of my docs are really large. I want each chunk to maintain the markdown formatting & sections. Is there any way for me to limit the tokens that are passed into the context from the docs?
Instead of chunking the docs, that is.
Oh I think it’s because I set the memory_token_limit
@Logan M Would it be possible to use the markdown element node parser and the markdown node parser together?
I’m doing this within an ingestion pipeline. If I only use the markdown node parser, would I first use the parser and then the sentence splitter?
I don't think those two are compatible tbh 🤔
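For the ordering question, a minimal sketch of how the parser-then-splitter chain could sit inside an ingestion pipeline, assuming the chaining approach suggested earlier in the thread (the transformation list, chunk size, and the `documents` name are illustrative, not confirmed here):
Python
# Illustrative sketch: markdown-aware parsing first, then a sentence splitter to cap
# chunk sizes, then the embedding step, all as pipeline transformations.
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import MarkdownNodeParser, SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(
    transformations=[
        MarkdownNodeParser(),              # split on markdown headers first
        SentenceSplitter(chunk_size=512),  # then cap each section at the chunk size
        OpenAIEmbedding(model="text-embedding-3-large"),
    ]
)
nodes = pipeline.run(documents=documents)  # `documents` is a placeholder for your docs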