One more question I have: after the OpenAI LLM models, which is the best free and open-source model we can use?
I'll let you know on this. I'm trying it myself.
Loading the model is a pain.
So far the best luck I've had is with Wizard Vicuna 13B HF, if you have the hardware.
It works well. It's slow to respond, can take around 20 seconds, but the quality is decent. I'm also using it with LlamaIndex against my own documents and it works decently.
Hey @Ichigø, thanks for the response, appreciate it. Can you please share the notebook or a reference for Wizard Vicuna 13B?
I wanted to try that as well.
@dev_blockchain I host my stuff on AWS SageMaker so I can't really share it straightforwardly, but if you just go on Hugging Face and look for Wizard Vicuna by TheBloke or something like that, you should be able to see it.
mname = "TheBloke/wizard-vicuna-13B-HF"
tokenizer = LlamaTokenizer.from_pretrained(mname)
model = LlamaForCausalLM.from_pretrained(mname, load_in_8bit=True, device_map="auto", torch_dtype=torch.float16)
def format_prompt(prompt: str) -> str:
prompt_template=f"### Human: {prompt} \n### Assistant:"
return prompt_template
class customLLM(LLM):
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
pr = format_prompt(prompt)
generation_config = GenerationConfig(
max_new_tokens=5000,
temperature=0.1,
repetition_penalty=1.0,
)
inputs = tokenizer(pr, padding=False, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.inference_mode():
tokens = model.generate(**inputs, generation_config=generation_config)
return tokenizer.decode(tokens[0], skip_special_tokens=True)
@property
def _identifying_params(self) -> Mapping[str, Any]:
return {"name_of_model": model}
@property
def _llm_type(self) -> str:
return "custom"
This should get you started.
Below that it's mainly LlamaIndex and LangChain stuff.
Thanks for the help @Ichigø, will keep you updated on my progress with this.
Really appreciate it.
No worries, I've been dealing with this for the last month.
Thanks to @Logan M lol, he's been dealing with my questions.
Hey @Ichigø, I have implemented the above code. Can you please check my notebook and look into the errors I am having?
Also, about this notebook: I am trying to use it as a query LLM on my custom data.
@Logan M please take a look as well if you have any thoughts or a reference for the same.
Add this line somewhere near the top: from typing import Optional, List, Mapping, Any
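(For reference, the service_context used in the snippet below isn't shown anywhere; here's a minimal sketch of how it could be built from the customLLM above, assuming llama_index 0.5.x and a local Hugging Face embedding model picked purely as an example:)
from llama_index import LLMPredictor, ServiceContext, LangchainEmbedding
from langchain.embeddings import HuggingFaceEmbeddings  # needs sentence-transformers installed

llm_predictor = LLMPredictor(llm=customLLM())
# local embedding model so indexing/querying doesn't fall back to OpenAI; model name is just an example
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model)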
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader('./data').load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
index.save_to_disk('index.json')

query_text = "In Backup and Recovery, what is Customer Data?"
response = index.query(query_text, response_mode="compact", service_context=service_context, similarity_top_k=1)
print(response)
# prints: Answer: Customer Data

query_text = "From where can we access files stored on Smallpdf?"
response = index.query(query_text, response_mode="compact", service_context=service_context, similarity_top_k=1)
print(response)
@Logan M after that, can I use the model like this? Because for now I believe it needs more than 15 GB of GPU memory. So I just wanted to know if these are the next steps or something different.
Great, let me work on this, and I will update you so others can also get help from it.
Perfect! You could also use LangChain to further make it a chatbot with memory
It would be sick if LlamaIndex could just have a memory feature
Llama index is less focused on chat, and more on data retrieval/answering questions using your data 👀 I think memory is a pretty low priority at the moment, but maybe someday
@Logan M oh yeah, wanted to ask: does LangChain memory save the chat somewhere, or is it just stored in RAM and flushed when the session stops?
Because it's odd that whenever I use the OpenAI version, it always just tells me “the new context did not provide blah blah so the original answer stays the same”
It never answers me anything on the first try
It's like it stores the whole conversation
And then I say give the original answer and it gives it to me
So it must be storing it on OpenAI's server side
Are you using GPT-3.5? This is a super common problem with GPT-3.5 and llama index 😦
This response comes from the answer refinement part of llama index. It used to work, but then openai "updated" gpt-3.5
I have a custom refine prompt I can share, if you want. It seemed to help
Depending on your llama index version, there are two ways at the bottom
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from llama_index.prompts.prompts import RefinePrompt

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query. "
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly.",
    ),
]

CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
...
# v0.6.x
query_engine = index.as_query_engine(..., refine_template=CHAT_REFINE_PROMPT)
# v0.5.x
response = index.query(..., refine_template=CHAT_REFINE_PROMPT)
Thank you so much! I'll check this out
It's giving me an error now @Logan M
raise ValueError(f"One input key expected got {prompt_input_keys}")
ValueError: One input key expected got ['refine_template', 'input']
I was adding this to LangChain
Hey @Ichigø @Logan M, getting this error:
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
I was running this command @Ichigø:
print(customLLM()._call("Tell me something about New York City."))
Hey @Ichigø @Logan M, do we also have conversations or anything, not only queries? So that we can have a little chat.
You'll want to integrate with langchain for that 👍
Basically the idea is you use llama index as a custom tool for a langchain agent
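Roughly like this, as an untested sketch; the tool name and description are just placeholders, and you can swap in whichever LLM you want driving the agent (a 13B local model may be flaky as an agent):
from langchain.agents import Tool, initialize_agent
from langchain.memory import ConversationBufferMemory

def query_index(q: str) -> str:
    # wrap the llama_index query so langchain can call it as a tool
    return str(index.query(q, service_context=service_context, similarity_top_k=1))

tools = [
    Tool(
        name="doc-index",  # placeholder name
        func=query_index,
        description="Useful for answering questions about the uploaded documents.",
    )
]

memory = ConversationBufferMemory(memory_key="chat_history")
agent = initialize_agent(
    tools,
    customLLM(),  # or any other LLM for the agent itself
    agent="conversational-react-description",
    memory=memory,
)
print(agent.run("What is Customer Data in Backup and Recovery?"))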
Oh great, thanks. One more important question: how can I print the context while I am querying?
If you are using the approach in the notebook, you can use a wrapper function instead of a lambda. In the wrapper, you can print anything you want
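Something like this, as a rough sketch reusing the index/service_context names from the notebook:
# instead of e.g. func=lambda q: str(index.query(q, ...)), use a wrapper so you can log whatever you want
def query_index(q: str) -> str:
    print(f"Query going to the index: {q}")
    response = index.query(q, response_mode="compact", service_context=service_context, similarity_top_k=1)
    print(f"Response from the index: {response}")
    return str(response)

# then pass func=query_index to the Tool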
Hey @Ichigø, the Vicuna is running now, but I think it's handling one request at a time, or am I doing something wrong??
Nah, that's how it works 😅 you need multiple instances of the model to handle requests in parallel... which requires a lot of hardware
Oh thanks, I thought the issue was with me 😅
Hey @Logan M, I tried to get the context but I'm still not able to. Do you have any reference for this? Like, when I pass a query I want to see which context the response is coming from.
Check response.source_nodes I think
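Something like this, as a rough sketch; the exact attribute names depend on your llama_index version:
response = index.query(query_text, service_context=service_context, similarity_top_k=1)
print(response.get_formatted_sources())  # quick look at the retrieved chunks
for node in response.source_nodes:
    print(node.source_text)  # 0.5.x; on 0.6.x it's node.node.get_text() instead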