llamaindex/vdr-2b-multi-v1
On a Mac M2, inference takes about 3 minutes for 10 PNGs. Is that expected? This is the setup:

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

model = HuggingFaceEmbedding(
    model_name="llamaindex/vdr-2b-multi-v1",
    device="cpu",  # "mps" for mac, "cuda" for nvidia GPUs
    trust_remote_code=True,
    cache_folder="cache",
)
```
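For context, the embedding loop looks roughly like this (a minimal sketch; the `pages` folder name is a placeholder, and it assumes the `get_image_embedding` method shown in the model card's usage example):

```python
from pathlib import Path

# Hypothetical folder containing the 10 PNGs.
image_paths = sorted(Path("pages").glob("*.png"))

embeddings = []
for path in image_paths:
    # Each call runs a full forward pass of the 2B model,
    # which is what makes CPU inference so slow.
    embeddings.append(model.get_image_embedding(str(path)))
```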
With the MPS backend I get an out-of-memory error instead:

```
MPS backend out of memory (MPS allocated: 18.12 GB, other allocations: 928.00 KB, max allowed: 18.13 GB). Tried to allocate 25.00 MB on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).
```
Setting `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7` could work. The error message itself suggests `0.0` to disable the upper limit entirely, though it warns that may cause system failure.
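A minimal sketch of how to apply it: the variable has to be in the environment before PyTorch initializes the MPS backend, so set it in the shell when launching (e.g. `PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.7 python embed.py`, where `embed.py` is a placeholder script name) or at the very top of the script, before any torch-related imports:

```python
import os

# Set before torch/llama_index are imported so the MPS allocator
# picks it up; "0.0" disables the upper limit entirely.
os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.7"

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

model = HuggingFaceEmbedding(
    model_name="llamaindex/vdr-2b-multi-v1",
    device="mps",
    trust_remote_code=True,
    cache_folder="cache",
)
```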