
NER
Joined September 25, 2024

Cpu

Good day folks, I am trying to build a RAG pipeline with a quantized Zephyr 7B model and a local embedding model. When I load Zephyr, I don't have enough GPU memory left to load the embedding model on my dev setup. I looked at https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html and through some code, but could you please help me understand how I can run a local embedding model on CPU instead of GPU? Appreciate any help! Thank you
1 comment