Llama cpp

Curious if anyone has tried using Llama.cpp with LlamaIndex so they can easily access quantized models with a CPU-only setup. Will using CustomLLM do the trick?
It's on our todo list to add native support for llama.cpp, BUT

you can instantiate the llama.cpp class from LangChain and wrap it in our LangChain wrapper:

Plain Text
from langchain.llms import LlamaCpp
from llama_index.llms import LangChainLLM

lc_llm = LlamaCpp(model_path="path/to/your/quantized-model.bin")  # any LangChain LLM works here
llm = LangChainLLM(llm=lc_llm)
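From there you pass that llm into a service context like any other LLM. A rough sketch, assuming the ServiceContext API from the current docs; the "local" embed model is only there so the whole pipeline stays CPU-only:

Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# "local" pulls a small HuggingFace embedding model, so no API calls are needed
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What is this document about?"))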
Oh great workaround!
Sorry for the late reply. I was able to load it with LangChain, but Llama 2 is very picky about prompt format: it needs things like [INST]. Do I just use the LlamaIndex Prompt function to create my own system and prompt formats? I know the HuggingFaceLLM functions take both as parameters.
Tbh it might be easiest to wrap llama.cpp with the CustomLLM class.

Then you can ensure every prompt is formatted the way you want?

You can customize both the completion and chat endpoints. (If you don't implement chat(), any call to chat() will fall back to complete() under the hood.)

https://gpt-index.readthedocs.io/en/stable/core_modules/model_modules/llms/usage_custom.html#example-using-a-custom-llm-model-advanced
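Something along these lines should work. This is only a sketch that assumes the llama-cpp-python package; the model path, context window, and prompt template are placeholders you'd adjust:

Plain Text
from llama_cpp import Llama
from llama_index.llms import CompletionResponse, CompletionResponseGen, CustomLLM, LLMMetadata
from llama_index.llms.base import llm_completion_callback

# Load the quantized model once with llama-cpp-python (CPU only)
llama = Llama(model_path="path/to/your/quantized-model.bin", n_ctx=3900)

def format_prompt(prompt: str) -> str:
    # Llama 2 chat models expect the [INST] / <<SYS>> wrapper
    return f"<s>[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n{prompt} [/INST]"

class LlamaCppLLM(CustomLLM):
    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(context_window=3900, num_output=256, model_name="llama-2-7b-chat")

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs) -> CompletionResponse:
        output = llama(format_prompt(prompt), max_tokens=256)
        return CompletionResponse(text=output["choices"][0]["text"])

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs) -> CompletionResponseGen:
        # Minimal streaming: yield the full completion as a single chunk
        yield self.complete(prompt, **kwargs)

llm = LlamaCppLLM()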

We do have util functions too that you could import and use:
https://github.com/jerryjliu/llama_index/blob/main/llama_index/llms/llama_utils.py
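completion_to_prompt and messages_to_prompt in that file already build the Llama 2 [INST] / <<SYS>> wrapping, so (assuming those helper names are still current) you could call them inside your custom complete() instead of hand-rolling the template:

Plain Text
from llama_index.llms.llama_utils import completion_to_prompt

# Wraps a plain completion prompt in Llama 2's [INST] / <<SYS>> template
formatted = completion_to_prompt("Summarize the document.", system_prompt="You are a concise assistant.")
print(formatted)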