MoekaChan
Joined December 5, 2024
Hello, I am using LlamaIndex with Ollama to build a chatbot that leverages our fine-tuned model using RAG and a custom vector database. I use bge_onnx for the embedding model and DuckDB for the database. Previously, the setup used an embedding model (~125MB) and a vector database (~1GB). In that configuration, the FaithfulnessEvaluator typically completed evaluations in about 2 seconds.

Recently, I switched to a new version of the bge_onnx embedding model (~2.2GB) and re-vectorized the database with DuckDB, bringing the database size to 1.75GB. After these updates, the FaithfulnessEvaluator takes more than 25 seconds for the first evaluation, while subsequent evaluations (2nd, 3rd, etc.) take only about 1 second.

Could you help me understand why the first evaluation is significantly slower after the updates and suggest ways to optimize the evaluation process?
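
A likely cause is one-time startup cost: the first evaluation has to load the ~2.2GB ONNX embedding model into memory and open the larger DuckDB file, while later evaluations reuse the already-loaded resources. A minimal warm-up sketch, assuming embed_model and index are the objects from the setup described above (the names are illustrative, not from the post):

# Run once at startup so the first real evaluation doesn't pay load costs.
_ = embed_model.get_text_embedding("warm-up")  # forces the ONNX weights to load
_ = index.as_retriever(similarity_top_k=1).retrieve("warm-up")  # opens DuckDB and touches the index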
1 comment
Happy new year. I recently fine-tuned a model and used Ollama to run it. It shows a welcome message in the Ollama terminal.
In my Python code, I use:

llm = Ollama(model="xxxx", request_timeout=60.0)
chat_engine = index.as_chat_engine(
    chat_mode="context",
    llm=llm,
    memory=memory,
)
How can I get the welcome message the model generates when it starts?
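
The welcome message shown in the Ollama terminal is part of the interactive "ollama run" session; the LlamaIndex Ollama client only talks to the Ollama API and never sees it. A hedged workaround sketch, assuming you simply want the model to greet the user once at startup (the prompt text is illustrative):

# Ask the model for a greeting once, before entering the chat loop.
greeting = llm.complete("Introduce yourself briefly to the user.")
print(greeting.text)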
9 comments
Hi, I have a question about the chat store. I save the chat store as:

{"store": {"chat_history": [{"role": "user", "content": "which company create you?", "additional_kwargs": {}}, {"role": "assistant", "content": "I wasn't created by a specific company, but rather I am a product of Meta AI, a subsidiary of Meta Platforms, Inc.", "additional_kwargs": {}}, {"role": "user", "content": "Repeat the question I asked you", "additional_kwargs": {}}, {"role": "assistant", "content": "You asked: "Which company created you?" \n\n\nLet me know if you have any other questions!", "additional_kwargs": {}}]}, "class_name": "SimpleChatStore"}

What is "additional_kwargs" for? I want the chat store to contain the response time, token info, and source nodes. How can I do that? Is it possible to add that data to "additional_kwargs"?

Currently, I am using

self.memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=self.chat_store,
)

self.chat_engine = self.index.as_chat_engine(
    chat_mode="context",
    llm=self.cur_lm,
    memory=self.memory,
)
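
"additional_kwargs" is a free-form dict carried on each ChatMessage, and SimpleChatStore serializes whatever you put in it, so custom metadata such as timing, token counts, or source node ids can ride along. A minimal sketch, assuming the self.memory from the code above (the key names and values are illustrative, not a built-in schema):

from llama_index.core.llms import ChatMessage, MessageRole

msg = ChatMessage(
    role=MessageRole.ASSISTANT,
    content="...model answer...",
    additional_kwargs={
        "response_time_s": 1.42,           # illustrative custom fields
        "token_count": 87,
        "source_node_ids": ["node-123"],
    },
)
self.memory.put(msg)  # persisted through the chat store when you save it

Note that the chat engine writes its own messages into memory, so to record per-response data you would append or update messages yourself after each chat() call rather than expecting the engine to fill these fields.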
10 comments
Hi, I have a few questions about using Ollama with llama_index.

If I am currently chatting with llama3.2 using:
llm = Ollama(model="llama3.2:latest")
and I want to switch to phi, should I do:
llm = Ollama(model="phi")?

If I want to continue the conversation with the previous llama3.2 instance after switching to phi, should I create two separate instances, one for llama3.2 and one for phi?

If I want to start a completely new chat with llama3.2, is it necessary to create a new instance for it?

If I have 5 different conversations (possibly using the same or different models), should I create 5 separate instances to manage them?

Thanks in advance for your help!
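
One hedged way to think about it, assuming the index object from your earlier setup: an Ollama object is a lightweight, stateless client for one model, while the conversation history lives in the memory attached to each chat engine. So you keep one llm per model and one memory (plus chat engine) per conversation; a brand-new chat needs a fresh memory, not a new llm. A sketch:

from llama_index.llms.ollama import Ollama
from llama_index.core.memory import ChatMemoryBuffer

llama = Ollama(model="llama3.2:latest")
phi = Ollama(model="phi")

# One memory per conversation; the llm objects can be reused freely.
memory_1 = ChatMemoryBuffer.from_defaults(token_limit=3000)
memory_2 = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_1 = index.as_chat_engine(chat_mode="context", llm=llama, memory=memory_1)
chat_2 = index.as_chat_engine(chat_mode="context", llm=phi, memory=memory_2)
# Five conversations would mean five memory objects (and chat engines),
# even if they share the same llm instances.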
11 comments