Curious if anyone has tried using llama.cpp with LlamaIndex so they can easily run quantized models on a CPU-only setup. Will using CustomLLM do the trick? Something like the sketch below is what I had in mind.
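Untested sketch, loosely following the CustomLLM pattern in the LlamaIndex docs; the model path is hypothetical and I'm assuming the llama-cpp-python call signatures here:

```python
from typing import Any

from llama_cpp import Llama  # pip install llama-cpp-python
from llama_index.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.llms.base import llm_completion_callback

# Hypothetical local path to a quantized Llama 2 model.
MODEL_PATH = "./models/llama-2-7b-chat.Q4_K_M.gguf"

# CPU-only: no n_gpu_layers argument, so inference stays on the CPU.
backend = Llama(model_path=MODEL_PATH, n_ctx=4096)


class QuantizedLlama(CustomLLM):
    context_window: int = 4096
    num_output: int = 256
    model_name: str = "llama-2-7b-chat-q4"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # llama-cpp-python returns an OpenAI-style dict of choices.
        out = backend(prompt, max_tokens=self.num_output)
        return CompletionResponse(text=out["choices"][0]["text"])

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # Accumulate streamed chunks and yield the running text plus delta.
        text = ""
        for chunk in backend(prompt, max_tokens=self.num_output, stream=True):
            delta = chunk["choices"][0]["text"]
            text += delta
            yield CompletionResponse(text=text, delta=delta)
```

That said, I believe newer LlamaIndex versions ship a built-in llama.cpp wrapper (`LlamaCPP` under `llama_index.llms`), which might save writing this by hand.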
Sorry for the late reply. I was able to load the model with LangChain, but Llama 2 is very picky about prompt format: it needs markers like [INST]. Do I just use LlamaIndex's prompt classes to define my own system and query prompt formats? I know HuggingFaceLLM takes both as parameters (system_prompt and query_wrapper_prompt); something like the sketch below is what I'm picturing.
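Untested sketch, assuming the standard Llama 2 chat template and that HuggingFaceLLM applies query_wrapper_prompt around each query as documented (on older versions the prompt class may be SimpleInputPrompt instead of PromptTemplate):

```python
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate

# Llama 2 chat expects the user turn wrapped in [INST] ... [/INST],
# with an optional <<SYS>> block for the system message.
SYSTEM = "You are a helpful assistant. Answer using only the given context."

query_wrapper_prompt = PromptTemplate(
    "<s>[INST] <<SYS>>\n" + SYSTEM + "\n<</SYS>>\n\n{query_str} [/INST]"
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=4096,
    max_new_tokens=256,
    query_wrapper_prompt=query_wrapper_prompt,
)
```

I think there are also ready-made helpers for this exact format (messages_to_prompt / completion_to_prompt in llama_index.llms.llama_utils, if I'm remembering the module right), so it may not be necessary to hand-roll the template at all.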