Find answers from the community

Rob_P
Hi again, just trying to see if anyone can offer advice on running a local Llama 2 LLM on one server and querying it from another. Is there an out-of-the-box way to run llama-cpp-python[server] on one machine and point llama-index at its endpoint, e.g. http://192.168.0.19:8000, for queries? I see that I can run the model on the same machine via LlamaCPP, but I'd like a beefy server that hosts just the LLM, separate from the querying code. I'm also noticing that Llama 2 served through llama-cpp-python[server] returns results much faster than when run through LlamaCPP; has anyone else seen this? Thanks.
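(For reference, a minimal sketch of one way to do this, not an official recipe: llama-cpp-python[server] exposes an OpenAI-compatible API, so the beefy box can serve the model over HTTP and llama-index can be pointed at that base URL. The imports below follow the current package layout and assume the llama-index-llms-openai-like integration is installed; the model path, host, port, and prompt are placeholders.)

```python
# On the model server, serve the GGUF file over HTTP:
#   pip install "llama-cpp-python[server]"
#   python -m llama_cpp.server --model ./llama-2-13b-chat.Q4_K_M.gguf --host 0.0.0.0 --port 8000

# On the querying machine, point llama-index at that endpoint.
# Assumes: pip install llama-index llama-index-llms-openai-like
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

Settings.llm = OpenAILike(
    model="llama-2",                         # name is passed through; the server uses whatever model it was started with
    api_base="http://192.168.0.19:8000/v1",  # llama-cpp-python serves an OpenAI-compatible API under /v1
    api_key="not-needed",                    # the local server does not validate the key by default
    is_chat_model=True,
)

# Sanity check that the remote LLM responds; query engines built with this
# Settings.llm will hit the same remote endpoint.
print(Settings.llm.complete("Say hello in one short sentence."))
```

On the older 0.x releases from around the time of this post, the same remote LLM would be passed via ServiceContext.from_defaults(llm=...) rather than the global Settings object.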
2 comments
Hi all, I am noticing some issues after upgrading from 0.6.26 to the latest version that I could use some advice on.

When I persist my newly created VectorStoreIndex with a new TokenCountingHandler set on the CallbackManager of the index's ServiceContext, the TokenCountingHandler is missing from the list of handlers when I load the index again. Is this expected behaviour?
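(For context, a sketch of how the handler can be re-attached, assuming the ServiceContext/CallbackManager API described in this post: the persisted index stores nodes and embeddings but not callback handlers, so they have to be passed back in at load time. The persist directory, tokenizer model, and query are placeholders.)

```python
import tiktoken
from llama_index import ServiceContext, StorageContext, load_index_from_storage
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Re-create the handler on load: callback managers are not serialized with
# the index, so a freshly loaded index has a default (empty) handler list.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, service_context=service_context)

response = index.as_query_engine().query("What is this document about?")
print("LLM tokens used:", token_counter.total_llm_token_count)
```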

Also, when I now use ResponseEvaluator or QueryResponseEvaluator on the response of the chat engine, I get an error about the missing source_nodes property on AgentChatResponse. The query engine and chat engine no longer appear to share a common response type. Is there going to be a way to evaluate chat engine responses?
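(A possible stopgap until chat engine evaluation is supported directly, sketched under the assumption that the Response schema and the ResponseEvaluator.evaluate API are available as in the 0.8.x line: wrap the chat engine's text in a Response object together with nodes retrieved for the same question. The retrieved nodes are only an approximation of what the chat engine actually used, so treat the verdict accordingly; the data directory, question, and similarity_top_k are placeholders.)

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.evaluation import ResponseEvaluator
from llama_index.response.schema import Response

index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./data").load_data())
chat_engine = index.as_chat_engine()

# AgentChatResponse has no source_nodes, so pair the chat text with nodes
# retrieved for the same question and evaluate that instead.
question = "What does the document conclude?"
chat_response = chat_engine.chat(question)
retrieved_nodes = index.as_retriever(similarity_top_k=2).retrieve(question)

wrapped = Response(response=chat_response.response, source_nodes=retrieved_nodes)

evaluator = ResponseEvaluator(service_context=ServiceContext.from_defaults())
print(evaluator.evaluate(wrapped))  # "YES" / "NO": is the answer supported by the nodes?
```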
5 comments