Are there any local LLM models that llama-index can run on?

Are there any local LLM models that llama-index can run on? Thinking of using this for work, but the workplace might be skeptical of making API calls to OpenAI. Any local LLMs that exist and can integrate?
Keep in mind a ton of LLMs that are open source are also non-commercial. It's super annoying haha we've been trying to find some for work too
The best one for commercial use I've tried so far is camel
Thanks! Do you have an example with LlamaIndex?
You can try this; I tested it on a few indexes and it seemed to work decently

This entire setup will run locally... I suggest having a pretty beefy GPU lol

Python
# Imports assumed by this snippet (llama_index ~0.5/0.6-era API)
from typing import Any, List, Mapping, Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from langchain.llms.base import LLM
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LLMPredictor, LangchainEmbedding, PromptHelper, ServiceContext

# prompt helper settings
max_input_size = 2048    # maximum input size
num_output = 256         # number of output tokens
max_chunk_overlap = 20   # maximum chunk overlap

model_name = "Writer/camel-5b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16
)

# Camel expects an Alpaca-style instruction/response prompt
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

class CustomLLM(LLM):
    model_name: str = "Writer/camel-5b-hf"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt = prompt.strip()
        text = PROMPT_TEMPLATE.format(instruction=prompt)
        model_inputs = tokenizer(
            text, return_tensors="pt", truncation=True, max_length=max_input_size
        ).to("cuda")
        output_ids = model.generate(**model_inputs, max_new_tokens=num_output)  # , temperature=0, do_sample=True)
        output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
        # the decoded text includes the prompt, so keep only what follows "### Response:"
        clean_output = output_text.split("### Response:")[1].strip()
        return clean_output

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

llm_predictor = LLMPredictor(llm=CustomLLM())
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=512)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
    prompt_helper=prompt_helper,
    chunk_size_limit=512,
)
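For reference, here's a minimal sketch of how that service_context gets used to build and query an index (assuming llama_index ~0.6.x; on 0.5.x it would be GPTSimpleVectorIndex and index.query instead):

Python
# Minimal usage sketch: build and query a fully local index with the service_context above
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()  # "./data" is a placeholder folder of docs
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()
response = query_engine.query("What does the document say about X?")
print(response)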
How do you like GPT4All? Is it useful? I noticed it runs quicker locally, but I'd imagine that might sacrifice quality
Yea, the quality is ok-ish. But it's trained on GPT-4 outputs, which I feel makes it inferior (and is also against OpenAI's TOS lol)
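If you still want to wire GPT4All into LlamaIndex, something along these lines should work via LangChain's wrapper (a rough sketch; the model path is a placeholder for wherever you downloaded the weights, and it needs the gpt4all backend package installed):

Python
# Rough sketch: GPT4All via LangChain, plugged into LlamaIndex (model path is a placeholder)
from langchain.llms import GPT4All
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LLMPredictor, LangchainEmbedding, ServiceContext

local_llm = GPT4All(model="./models/gpt4all-lora-quantized.bin")  # your local model file
llm_predictor = LLMPredictor(llm=local_llm)
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())  # keep embeddings local too

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
)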
This person on GitHub has implemented GPT4All with LlamaIndex, and a few others as well:

https://github.com/autratec?tab=repositories
Can you please post a self-contained example?
Hey everyone, I keep coming back to this thread to see if there's an update on any newer local LLMs that can handle the queries from LlamaIndex. I've tried a number of models so far, but nothing can even fully index documents successfully. Generally they just start returning the few-shot examples in the prompt.
I do think perhaps my embeddings are messed up tho. I ended up using the API for text-generation-webui, and I wasn't sure how to handle this line:
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
Even Camel above doesn't really work well.
Yea, tbh I still haven't found much better. There was a new model released this week, but I haven't had a chance to try it yet: https://huggingface.co/mosaicml/mpt-7b-instruct
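If anyone tries it, MPT should drop into the same CustomLLM pattern above; the main gotcha is that it ships custom modeling code, so it needs trust_remote_code=True when loading (untested sketch):

Python
# Untested sketch: loading MPT-7B-Instruct for the CustomLLM pattern above
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # tokenizer suggested by the MPT model card
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,      # MPT uses custom modeling code
    torch_dtype=torch.bfloat16,
)
model.to("cuda")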


Tbh every LLM will probably take some amount of prompt engineering with the prompt templates though
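In llama_index you can at least swap the model's expected instruct format in for the default QA prompt, something like this (a sketch; the template wording is just an illustration, and index is the index built earlier):

Python
# Sketch: overriding LlamaIndex's default QA prompt with a model-specific instruct format
from llama_index import QuestionAnswerPrompt

QA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nAnswer the question using only the context below.\n"
    "Context:\n{context_str}\n"
    "Question: {query_str}\n\n"
    "### Response:"
)
qa_prompt = QuestionAnswerPrompt(QA_TEMPLATE)

# 0.6.x style; on 0.5.x it would be index.query(..., text_qa_template=qa_prompt)
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
response = query_engine.query("What does the document say about X?")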
You'll probably have to subclass the base embeddings class; support for custom embedding APIs isn't there yet.
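Something along these lines, if the backend exposes an embeddings endpoint (the URL and response shape here are placeholders, not a real text-generation-webui API):

Python
# Sketch: custom LangChain Embeddings class calling a hypothetical local HTTP endpoint
from typing import List

import requests
from langchain.embeddings.base import Embeddings
from llama_index import LangchainEmbedding

class LocalAPIEmbeddings(Embeddings):
    endpoint = "http://localhost:5000/api/v1/embeddings"  # placeholder URL

    def _embed(self, text: str) -> List[float]:
        resp = requests.post(self.endpoint, json={"input": text})
        resp.raise_for_status()
        return resp.json()["embedding"]  # assumed response shape

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)

embed_model = LangchainEmbedding(LocalAPIEmbeddings())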
I dig what MPT have done, but I'm not in love with the Dolly instruct format
I think they also built a chat variant, if that's more your style 👀