I’m trying to set up a ReAct agent as a chatbot, since it’s capable of using multiple tools. Following the documentation, I’m having trouble setting up context for the agent.
I want the agent to acknowledge that it’s a privacy management assistant, but no matter the prompt, it gives me a generic self-introduction.
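One way to do this, as a minimal sketch: recent llama_index versions let you pass a `context` string to `ReActAgent.from_tools`, which gets injected into the ReAct system header. Import paths vary by release (in 0.10+ these live under `llama_index.core`), and `tools`/`llm` here are assumed to be whatever you already built:

```python
from llama_index.agent import ReActAgent

agent = ReActAgent.from_tools(
    tools,
    llm=llm,
    verbose=True,
    # Injected into the ReAct system prompt, so every reasoning step sees it.
    context=(
        "You are a privacy management assistant. Always introduce yourself "
        "as such and answer questions about privacy policies and settings."
    ),
)

print(agent.chat("Who are you?"))
```

If your version doesn’t accept `context`, the same effect can be had by supplying a customized ReAct chat formatter / system header instead.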
Hi guys! Got a question on using the GPU to accelerate inference. The environment should be all set: I have CUDA and cuBLAS set up for llama-cpp-python. Then I run the following code to create the LLM:
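For reference, a minimal sketch of how GPU offload is typically enabled with llama-cpp-python, assuming the package was built with cuBLAS support (e.g. `CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python`); the model path is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder local GGUF file
    n_gpu_layers=-1,  # offload all layers to the GPU; 0 keeps everything on CPU
    n_ctx=4096,
    verbose=True,     # startup log should report BLAS = 1 if cuBLAS is active
)
```

If you go through LlamaIndex’s `LlamaCPP` wrapper instead, the equivalent is passing `model_kwargs={"n_gpu_layers": -1}`; either way, the startup log should show layers being offloaded to the GPU.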
Hey guys, I feel stupid asking this, but after quantizing the model like this:

```python
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    query_wrapper_prompt=PromptTemplate(
        "<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"
    ),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)
```

how are we supposed to store the quantized model locally?
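One way to approach this, as a sketch: load and quantize the model with transformers directly, save it once, then point `HuggingFaceLLM` at the local directory. This assumes a bitsandbytes `quantization_config` and a transformers/bitsandbytes version recent enough to serialize quantized weights (4-bit serialization is only supported in newer releases); directory names are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-alpha",
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")

# Write the quantized weights and tokenizer to disk once...
model.save_pretrained("./zephyr-7b-alpha-quantized")
tokenizer.save_pretrained("./zephyr-7b-alpha-quantized")

# ...then point HuggingFaceLLM at the local directory on later runs.
llm = HuggingFaceLLM(
    model_name="./zephyr-7b-alpha-quantized",
    tokenizer_name="./zephyr-7b-alpha-quantized",
    # ...same prompt/context/generate settings as before...
)
```

`HuggingFaceLLM` also accepts prebuilt `model=`/`tokenizer=` objects, so you can load with transformers yourself and hand the objects in rather than going through `model_name`.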
Agent support for the Sub Question Query Engine? It seems that the agent class can only take a set of query engine tools, not a sub question query engine directly. Is it not supported?
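It should work via the tool abstraction: since the Sub Question Query Engine is itself a query engine, a common pattern is to wrap it in a `QueryEngineTool` so the agent can call it like any other tool. A sketch, assuming `query_engine_tools` and `llm` already exist (import paths vary by llama_index version):

```python
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.agent import ReActAgent

sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
)

# Wrap the sub question engine so the agent can invoke it as a tool.
sub_question_tool = QueryEngineTool(
    query_engine=sub_question_engine,
    metadata=ToolMetadata(
        name="sub_question_engine",
        description=(
            "Breaks a complex question into sub-questions and answers them "
            "against the underlying query engines."
        ),
    ),
)

agent = ReActAgent.from_tools([sub_question_tool], llm=llm, verbose=True)
```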
Hi people! I was wondering if anyone has tried fine-tuning their own model to distill GPT-4? I followed the documentation to fine-tune it with the finetune engine: ft_llm = finetune_engine.get_finetuned_model(temperature=0.3). Now, this may sound very silly, but I have no idea how to keep the fine-tuned LLM, or how to keep the fine-tuned ReAct agent. I went through the documentation multiple times and could not find how to do it. Appreciate any help!
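A sketch of one way to handle this, assuming the `OpenAIFinetuneEngine` flow from the LlamaIndex fine-tuning docs: the fine-tuned weights live on OpenAI’s servers, so “keeping” the model just means recording its model id and re-instantiating the LLM from that id later. The agent itself isn’t saved either; you rebuild it from your tools plus the fine-tuned LLM:

```python
ft_llm = finetune_engine.get_finetuned_model(temperature=0.3)
print(ft_llm.model)  # e.g. "ft:gpt-3.5-turbo-0613:...", record this id somewhere

# On a later run, rebuild the LLM and the agent from the saved id:
from llama_index.llms import OpenAI
from llama_index.agent import ReActAgent

ft_llm = OpenAI(
    model="ft:gpt-3.5-turbo-0613:your-org::abc123",  # placeholder id
    temperature=0.3,
)
agent = ReActAgent.from_tools(tools, llm=ft_llm, verbose=True)
```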