
Quantized Llama 2

Hello Experts,
How do I use llama_index with quantized Llama 2 models?
Almost the same thing, but you would use a Hugging Face quantized model

https://huggingface.co/TheBloke/Llama-2-7B-GGML
maybe @Logan M can validate this
Yea, you'll want to use llama.cpp for GGML or GGUF files

https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html
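A minimal sketch of what the linked LlamaCPP setup looks like. The model path and parameter values here are placeholder assumptions; the class and argument names follow the linked llama_index docs for the API of that era:

```python
# Sketch: parameters for llama_index's LlamaCPP wrapper.
# The model path below is a placeholder assumption -- point it at a
# quantized GGML/GGUF file you have downloaded (e.g. from TheBloke's repos).
llama_cpp_kwargs = {
    "model_path": "./models/llama-2-7b.Q4_K_M.gguf",  # local quantized model
    "temperature": 0.1,
    "max_new_tokens": 256,
    "context_window": 3900,               # Llama 2 supports up to 4096 tokens
    "model_kwargs": {"n_gpu_layers": 0},  # set > 0 to offload layers to GPU
}

# Actual usage (requires `pip install llama-index llama-cpp-python`
# and a real model file; not executed here):
# from llama_index.llms import LlamaCPP
# from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
# llm = LlamaCPP(**llama_cpp_kwargs)
# service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
# index = VectorStoreIndex.from_documents(
#     SimpleDirectoryReader("./data").load_data(),
#     service_context=service_context,
# )
```

With `n_gpu_layers` at 0 everything runs on CPU, which is the typical reason for picking a 4-bit quantized file in the first place.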

Hugging Face also supports standard quantization using bitsandbytes or GPTQ
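For the bitsandbytes route, a hedged sketch of loading a 4-bit-quantized Hugging Face model and handing it to llama_index. The model name and config values are assumptions, not from the thread:

```python
# Sketch: 4-bit quantization settings for transformers' BitsAndBytesConfig.
# Values are illustrative assumptions (NF4 with fp16 compute is a common choice).
quant_config_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_compute_dtype": "float16",
}

# Actual usage (requires `pip install transformers accelerate bitsandbytes
# llama-index` plus a CUDA GPU; not executed here). The model name is an
# example, not specified in the thread:
# from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
# from llama_index.llms import HuggingFaceLLM
# quant_config = BitsAndBytesConfig(**quant_config_kwargs)
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-2-7b-chat-hf",
#     quantization_config=quant_config,
#     device_map="auto",
# )
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# llm = HuggingFaceLLM(model=model, tokenizer=tokenizer,
#                      context_window=3900, max_new_tokens=256)
```

Unlike the GGML/GGUF path, bitsandbytes quantizes the full-precision Hugging Face weights at load time, so you download the original model rather than a pre-quantized file.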