Nvm nvm it’s not working 😕

it should work in the service context?

What's the issue here?
Depending on the context window size, I would also try reducing the chunk size

Python
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import LangChainLLM

llm = LangChainLLM(lc_llm)
service_context = ServiceContext.from_defaults(llm=llm, context_window=2048, chunk_size=512)

set_global_service_context(service_context)
Maybe it’s me, but I’m using a custom Prompt and the beginning (the system message) is getting cut off. Using the compact and refine synthesizer
If I have a 4096-token context window (llama-2) and max output tokens of 1000, should I set context_window to 4096 or 3096?
Actually hold up, how is llama-index taking the context window into account? I’m using text-generation-inference to host my LLM, so where is it getting the number of tokens to chunk appropriately?
Still use 4096. Llama index should see that you have max tokens set to 1000 and figure it out.

Weird that the start of the prompt is getting cut off though; I would expect it to be the end?

System prompts are a little janky though, still figuring out the best way to integrate them

A simple fix might be to slightly decrease the context window to take into account the system prompt
I don’t have a prompt helper configured, so I’m still confused as to how it’s calculating this?
It picks up the data from the llm itself
The prompt helper isn't really user-facing anymore
And you can also set context_window and num_output directly in the service context
Maybe just to be sure lol
Sorry, can you elaborate on how it is using text-generation-inference to get the number of tokens?
TGI doesn’t have a tokenizer endpoint; they expect chunking to happen client-side
So specifically, for LangChain LLMs, it's defaulting to context_window=3900 (which allows for some wiggle room) and num_output=256

If any of these are incorrect, then they need to be adjusted in the service context to match the LLM

Python
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import LangChainLLM

llm = LangChainLLM(lc_llm)
service_context = ServiceContext.from_defaults(llm=llm, context_window=3900, num_output=1000)

set_global_service_context(service_context)


For other officially supported LLMs, these numbers can be pulled from the LLM class directly. LangChainLLM is a special case.
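
As a minimal sketch of that (assuming one of the built-in LLM classes, e.g. OpenAI; the model name is just an example), the values can be read off the LLM's metadata:

Python
from llama_index.llms import OpenAI

# An officially supported LLM class; the model name is just an example
llm = OpenAI(model="gpt-3.5-turbo")

# LLMMetadata holds the values llama-index uses for chunking and prompt budgeting
print(llm.metadata.context_window)
print(llm.metadata.num_output)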


Then, using these values, llama-index chunks things appropriately. The one snag is the system prompt, which is not accounted for properly when constructing LLM inputs (long story). So if your system prompt is causing issues, try shortening the context window in the service context
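
For example, one rough way to do that (the 500-token budget is a placeholder, size it to your actual system prompt, and lc_llm is your existing LangChain LLM):

Python
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import LangChainLLM

# Hypothetical budget: reserve room for the system prompt plus template overhead
SYSTEM_PROMPT_BUDGET = 500

llm = LangChainLLM(lc_llm)  # lc_llm: your existing LangChain LLM
service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=4096 - SYSTEM_PROMPT_BUDGET,
    num_output=1000,
)
set_global_service_context(service_context)
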
Right, but without access to the correct tokenizer it's impossible to tell how to chunk things appropriately
Sorry if I’m being dense, but how does it know/access the llama-2 tokenizer to count up to 4096 tokens?
Right, it doesn't; it will be an approximation. By default it's using a gpt2 tokenizer to calculate this.

Most tokenizers are fairly close to each other though tbh
Hence, setting a lower context_window is usually advised 🙂
(which is also why the default is 3900)
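
To see how big that approximation gap actually is, here is a quick sketch comparing the default gpt2 encoding against a real llama-2 tokenizer (the HF checkpoint name is just an example and is gated; any llama-2 tokenizer you have access to works):

Python
import tiktoken
from transformers import AutoTokenizer

text = "Some representative chunk of your documents ..."

# Default approximation: gpt2 BPE via tiktoken (what llama-index uses here)
gpt2_count = len(tiktoken.get_encoding("gpt2").encode(text))

# Exact count with a llama-2 tokenizer (example checkpoint; requires access)
llama_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama_count = len(llama_tokenizer.encode(text))

print(gpt2_count, llama_count)  # the slight mismatch is why the 3900 default leaves headroom
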
Ah got it, thanks