When you use Hugging Face embeddings and download them locally, how does that work with the attention mechanisms inside the model? Is there any over-the-wire transaction when I vectorize a document if I use HF?
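To make the question concrete, here's a minimal sketch of what I mean (assuming the sentence-transformers package and the all-MiniLM-L6-v2 model, both just placeholder choices on my part):

```python
from sentence_transformers import SentenceTransformer

# The first call downloads the model weights over the network and caches them
# locally (typically under ~/.cache/huggingface). The attention layers are
# just part of those weights; nothing about attention stays "in the cloud".
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encoding runs the full forward pass (tokenization, self-attention, pooling)
# on your own CPU/GPU. No document text is sent over the wire.
chunks = ["First chunk of the document...", "Second chunk..."]
vectors = model.encode(chunks)
print(vectors.shape)  # (2, 384) for this particular model
```

My understanding is that the only network traffic is the one-time model download, but I'd like to confirm that.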
What would be the pros and cons of, for example, building a LlamaIndex + Chroma DB setup to query against a 70-page .pdf vs. just sending that .pdf to the Claude API and leveraging its large context window?
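Roughly, the RAG side would look like this minimal sketch (assuming the llama-index and chromadb packages, a placeholder file report.pdf, and that an embedding model and LLM are already configured for LlamaIndex):

```python
import chromadb
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load and chunk the 70-page PDF (path is a placeholder).
documents = SimpleDirectoryReader(input_files=["report.pdf"]).load_data()

# Persist vectors in a local Chroma collection.
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("pdf_collection")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Index once up front; each query then embeds only the question and sends
# just the top-k retrieved chunks to the LLM, not the whole document.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=4)
print(query_engine.query("What does the report conclude about X?"))
```

versus the other option, which is a single API call that puts the entire PDF into Claude's context window on every request.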
For sure, I appreciate the feedback. I'm just wondering whether speed would actually increase if it's a new PDF each time, and the same question applies to runtime, citability, and cost.