Find answers from the community

Rob_P
Hi again, just trying to see if anyone can offer advice on running a local Llama 2 LLM on one server and querying it from another. Is there an out-of-the-box way to run llama-cpp-python[server] on one machine and point llama-index at its endpoint, e.g. http://192.168.0.19:8000, for queries? I see that I can run the model on the same machine via LlamaCPP, but I'd like a beefy server that hosts just the LLM, separate from the querying code. I'm also noticing that Llama 2 served through llama-cpp-python[server] returns results much faster than when run through LlamaCPP; has anyone else seen this? Thanks.
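(For reference, a minimal sketch of one way to do this, not an official recipe: llama-cpp-python[server] exposes an OpenAI-compatible API, so the beefy box can serve the model over HTTP and llama-index can be pointed at that base URL. The imports below follow the current package layout and assume the llama-index-llms-openai-like integration is installed; the model path, host, port, and prompt are placeholders.)

```python
# On the model server, serve the GGUF file over HTTP:
#   pip install "llama-cpp-python[server]"
#   python -m llama_cpp.server --model ./llama-2-13b-chat.Q4_K_M.gguf --host 0.0.0.0 --port 8000

# On the querying machine, point llama-index at that endpoint.
# Assumes: pip install llama-index llama-index-llms-openai-like
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

Settings.llm = OpenAILike(
    model="llama-2",                         # name is passed through; the server uses whatever model it was started with
    api_base="http://192.168.0.19:8000/v1",  # llama-cpp-python serves an OpenAI-compatible API under /v1
    api_key="not-needed",                    # the local server does not validate the key by default
    is_chat_model=True,
)

# Sanity check that the remote LLM responds; query engines built with this
# Settings.llm will hit the same remote endpoint.
print(Settings.llm.complete("Say hello in one short sentence."))
```

On the older 0.x releases from around the time of this post, the same remote LLM would be passed via ServiceContext.from_defaults(llm=...) rather than the global Settings object.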
2 comments
Hi all, I am noticing some issues after upgrading from 0.6.26 to the latest version that I could use some advice on.

When I persist my newly created VectorStoreIndex with a new TokenCountingHandler set on the CallbackManager of the index's ServiceContext, the TokenCountingHandler is missing from the list of handlers when I load the index again. Is this expected behaviour?
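(For context, a sketch of how the handler can be re-attached, assuming the ServiceContext/CallbackManager API described in this post: the persisted index stores nodes and embeddings but not callback handlers, so they have to be passed back in at load time. The persist directory, tokenizer model, and query are placeholders.)

```python
import tiktoken
from llama_index import ServiceContext, StorageContext, load_index_from_storage
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Re-create the handler on load: callback managers are not serialized with
# the index, so a freshly loaded index has a default (empty) handler list.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, service_context=service_context)

response = index.as_query_engine().query("What is this document about?")
print("LLM tokens used:", token_counter.total_llm_token_count)
```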

Also, when I now use ResponseEvaluator or QueryResponseEvaluator on the response of the chat engine, I get an error about the missing source_nodes property on AgentChatResponse. The query engine and chat engine no longer appear to share a common response type. Is there going to be a way to evaluate chat engine responses?
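(A possible stopgap until chat engine evaluation is supported directly, sketched under the assumption that the Response schema and the ResponseEvaluator.evaluate API are available as in the 0.8.x line: wrap the chat engine's text in a Response object together with nodes retrieved for the same question. The retrieved nodes are only an approximation of what the chat engine actually used, so treat the verdict accordingly; the data directory, question, and similarity_top_k are placeholders.)

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.evaluation import ResponseEvaluator
from llama_index.response.schema import Response

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./data").load_data())
chat_engine = index.as_chat_engine()

# AgentChatResponse has no source_nodes, so pair the chat text with nodes
# retrieved for the same question and evaluate that instead.
question = "What does the document conclude?"
chat_response = chat_engine.chat(question)
retrieved_nodes = index.as_retriever(similarity_top_k=2).retrieve(question)

wrapped = Response(response=chat_response.response, source_nodes=retrieved_nodes)

evaluator = ResponseEvaluator(service_context=ServiceContext.from_defaults())
print(evaluator.evaluate(wrapped))  # "YES" / "NO": is the answer supported by the nodes?
```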
5 comments