Are there any local LLM models that llama-index can run on?

Are there any local LLM models that llama-index can run on? Thinking of using this for work, but the workplace might be skeptical of making API calls to OpenAI. Any local LLMs that exist and can integrate?
Keep in mind a ton of LLMs that are open source are also non-commercial. It's super annoying haha we've been trying to find some for work too
The best one for commercial use I've tried so far is camel
Thanks! Do you have an example with LlamaIndex?
You can try this; I tested it on a few indexes and it seemed to work decently

This entire setup will run locally... I suggest having a pretty beefy GPU lol

Python
# Imports assumed by this snippet (llama_index ~0.5/0.6-era API)
from typing import Any, List, Mapping, Optional

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from langchain.llms.base import LLM
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LLMPredictor, LangchainEmbedding, PromptHelper, ServiceContext

# prompt helper settings
max_input_size = 2048    # maximum input size
num_output = 256         # number of output tokens
max_chunk_overlap = 20   # maximum chunk overlap

model_name = "Writer/camel-5b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16
)

# Camel expects an Alpaca-style instruction/response prompt
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

class CustomLLM(LLM):
    model_name: str = "Writer/camel-5b-hf"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt = prompt.strip()
        text = PROMPT_TEMPLATE.format(instruction=prompt)
        model_inputs = tokenizer(
            text, return_tensors="pt", truncation=True, max_length=max_input_size
        ).to("cuda")
        output_ids = model.generate(**model_inputs, max_new_tokens=num_output)  # , temperature=0, do_sample=True)
        output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
        # the decoded text includes the prompt, so keep only what follows "### Response:"
        clean_output = output_text.split("### Response:")[1].strip()
        return clean_output

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

llm_predictor = LLMPredictor(llm=CustomLLM())
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=512)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
    prompt_helper=prompt_helper,
    chunk_size_limit=512,
)
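For reference, here's a minimal sketch of how that service_context gets used to build and query an index (assuming llama_index ~0.6.x; on 0.5.x it would be GPTSimpleVectorIndex and index.query instead):

Python
# Minimal usage sketch: build and query a fully local index with the service_context above
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()  # "./data" is a placeholder folder of docs
index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()
response = query_engine.query("What does the document say about X?")
print(response)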
How do you like GPT4All? Is it useful? I noticed it runs quicker locally, but I'd imagine that might sacrifice quality
Yea, the quality is ok-ish. But it's trained on GPT-4 outputs, which I feel makes it inferior (and is also against OpenAI's TOS lol)
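If you still want to wire GPT4All into LlamaIndex, something along these lines should work via LangChain's wrapper (a rough sketch; the model path is a placeholder for wherever you downloaded the weights, and it needs the gpt4all backend package installed):

Python
# Rough sketch: GPT4All via LangChain, plugged into LlamaIndex (model path is a placeholder)
from langchain.llms import GPT4All
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LLMPredictor, LangchainEmbedding, ServiceContext

local_llm = GPT4All(model="./models/gpt4all-lora-quantized.bin")  # your local model file
llm_predictor = LLMPredictor(llm=local_llm)
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())  # keep embeddings local too

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
)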
This person on GitHub has implemented GPT4All with LlamaIndex, and a few others as well:

https://github.com/autratec?tab=repositories
Can you please post a self-contained example?
Hey everyone, I keep coming back to this thread to see if there's an update on any newer local LLMs that can handle the queries from LlamaIndex. I've tried a number of models so far, but nothing can even fully index documents successfully. Generally they just start returning the few-shot examples in the prompt.
I do think perhaps my embeddings are messed up tho. I ended up using the API for text-generation-webui, and I wasn't sure how to handle this line:
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
Even Camel above doesn't really work well.
Yea, tbh I still haven't found much better. There was a new model released this week, but I haven't had a chance to try it yet: https://huggingface.co/mosaicml/mpt-7b-instruct
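If anyone tries it, MPT should drop into the same CustomLLM pattern above; the main gotcha is that it ships custom modeling code, so it needs trust_remote_code=True when loading (untested sketch):

Python
# Untested sketch: loading MPT-7B-Instruct for the CustomLLM pattern above
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")  # tokenizer suggested by the MPT model card
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,      # MPT uses custom modeling code
    torch_dtype=torch.bfloat16,
)
model.to("cuda")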


Tbh every LLM will probably take some amount of prompt engineering with the prompt templates though
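In llama_index you can at least swap the model's expected instruct format in for the default QA prompt, something like this (a sketch; the template wording is just an illustration, and index is the index built earlier):

Python
# Sketch: overriding LlamaIndex's default QA prompt with a model-specific instruct format
from llama_index import QuestionAnswerPrompt

QA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nAnswer the question using only the context below.\n"
    "Context:\n{context_str}\n"
    "Question: {query_str}\n\n"
    "### Response:"
)
qa_prompt = QuestionAnswerPrompt(QA_TEMPLATE)

# 0.6.x style; on 0.5.x it would be index.query(..., text_qa_template=qa_prompt)
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
response = query_engine.query("What does the document say about X?")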
You'll probably have to subclass the base embeddings class; support for custom embedding APIs isn't there yet.
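Something along these lines, if the backend exposes an embeddings endpoint (the URL and response shape here are placeholders, not a real text-generation-webui API):

Python
# Sketch: custom LangChain Embeddings class calling a hypothetical local HTTP endpoint
from typing import List

import requests
from langchain.embeddings.base import Embeddings
from llama_index import LangchainEmbedding

class LocalAPIEmbeddings(Embeddings):
    endpoint = "http://localhost:5000/api/v1/embeddings"  # placeholder URL

    def _embed(self, text: str) -> List[float]:
        resp = requests.post(self.endpoint, json={"input": text})
        resp.raise_for_status()
        return resp.json()["embedding"]  # assumed response shape

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)

embed_model = LangchainEmbedding(LocalAPIEmbeddings())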
I dig what MPT have done, but I'm not in love with the Dolly instruct format
I think they also built a chat variant, if that's more your style 👀