How do you switch LLMs with the create-llama stack on LlamaIndex, or fix this max token problem?

At a glance

The community members are discussing how to switch LLMs (Large Language Models) with the create-llama stack on LlamaIndex and how to fix the max token problem. They suggest modifying the LLM in the service context, which may be in the llamaindex-streaming.ts file. One community member provides some code related to creating a parser and stream transformer. Another community member suggests changing the model name in the constants.ts file or modifying a specific line of code. The community members also discuss whether the chat history resets after each run or refresh, and how to automatically reduce the size of the input context to avoid the max token error.

10 comments
which backend did you create with create-llama?

You just have to modify the LLM in the service context usually
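For reference, switching the LLM in the TypeScript library usually just means constructing a different LLM and handing it to the service context (or directly to the chat engine). A minimal sketch, assuming the OpenAI class and serviceContextFromDefaults helper from the llamaindex package; the model name here is only an example:

import { OpenAI, serviceContextFromDefaults } from "llamaindex";

// Pick whichever model you want here; "gpt-4" is just an example.
const llm = new OpenAI({
  model: "gpt-4",
  temperature: 0.1,
});

// Anything built with this service context (indexes, query/chat engines)
// will use the LLM above instead of the default model.
const serviceContext = serviceContextFromDefaults({ llm });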
I used Next.js
I need to modify the service context?
Is that in the llamaindex-streaming.ts file?
import {
  createCallbacksTransformer,
  createStreamDataTransformer,
  trimStartOfStreamHelper,
  type AIStreamCallbacksAndOptions,
} from "ai";

// Wraps the async generator returned by the chat engine in a ReadableStream of text chunks.
function createParser(res: AsyncGenerator<any>) {
  const trimStartOfStream = trimStartOfStreamHelper();
  return new ReadableStream<string>({
    async pull(controller): Promise<void> {
      const { value, done } = await res.next();
      if (done) {
        controller.close();
        return;
      }

      const text = trimStartOfStream(value ?? "");
      if (text) {
        controller.enqueue(text);
      }
    },
  });
}

// Adapts the streaming chat response for the Vercel AI SDK's stream helpers.
export function LlamaIndexStream(
  res: AsyncGenerator<any>,
  callbacks?: AIStreamCallbacksAndOptions,
): ReadableStream {
  return createParser(res)
    .pipeThrough(createCallbacksTransformer(callbacks))
    .pipeThrough(
      createStreamDataTransformer(callbacks?.experimental_streamData),
    );
}
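Note that this file only handles the streaming plumbing; in the Next.js template the model is normally chosen in the chat route handler that calls LlamaIndexStream. Roughly, as a sketch only (file layout, import path, and the chat engine's signature vary between create-llama versions):

// app/api/chat/route.ts -- sketch; adjust names and paths to your project
import { StreamingTextResponse } from "ai";
import { OpenAI, SimpleChatEngine } from "llamaindex";
import { LlamaIndexStream } from "./llamaindex-stream"; // the file shown above

export async function POST(request: Request) {
  const { messages } = await request.json();
  const lastMessage = messages[messages.length - 1];

  // This is where the LLM is configured -- not in the stream helper.
  const llm = new OpenAI({ model: "gpt-4" });
  const chatEngine = new SimpleChatEngine({ llm });

  // Streaming chat call; the async generator is adapted by LlamaIndexStream.
  const response = await chatEngine.chat(lastMessage.content, messages, true);
  const stream = LlamaIndexStream(response);
  return new StreamingTextResponse(stream);
}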
Thanks brother I found it
Just one more question. Does the chat history reset with this app after every run or refresh?
Or do you know a way to automatically reduce the size of the input context so that you don't always get this error of "This model's maximum context length is 8192 tokens. However, your messages resulted in 8209 tokens. Please reduce the length of the messages."
I thiiiiink every refresh should reset it.

I'm waaay less familiar with the TS library, so I'm not immediately sure how to fix that. Something about limiting the chat memory somewhere I'm guessing
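One blunt way to do that, if you just want to stop the 8192-token error, is to truncate the history yourself before handing it to the chat engine. A sketch in plain TypeScript; the message-count cap is a hypothetical number, and a real fix would count tokens with a tokenizer rather than counting messages:

// Hypothetical cap -- tune it for your model's context window.
const MAX_HISTORY_MESSAGES = 20;

// Keep only the most recent messages so the prompt stays under the token limit.
function truncateChatHistory<T>(messages: T[]): T[] {
  return messages.slice(-MAX_HISTORY_MESSAGES);
}

// e.g. in the route handler:
// const response = await chatEngine.chat(lastMessage.content, truncateChatHistory(messages), true);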