
NER
Joined September 25, 2024

Cpu

Good day folks, I am trying to build a RAG pipeline with a quantized Zephyr 7B model and a local embedding model. When I load Zephyr, I don't have enough GPU memory left to load the embedding model on my dev setup. I looked at https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html and through some code, but could you please help me understand how I can run a local embedding model on CPU instead of GPU? Appreciate any help! Thank you
1 comment