Hey everyone, I'm in a bit of trouble right now and hope you can help me. I'm running LlamaIndex with Qdrant in hybrid mode, which requires uploading both dense and sparse vectors. According to the docs, LlamaIndex generates the sparse vectors locally, and as a result the upload is very slow. What solutions do we currently have to load our data efficiently? Is there a sparse model that's preferred over others? Also, if anyone can share a notebook with some related code, we'd really appreciate it. Thanks a lot guys!
Generating sparse vectors is definitely the bottleneck. We deployed our own model behind the Hugging Face Inference API, and then used the customization hooks to call it when generating sparse vectors.
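To sketch what those hooks look like: LlamaIndex's Qdrant integration has accepted custom sparse-encoding callables (e.g. `sparse_doc_fn` / `sparse_query_fn` on `QdrantVectorStore` with `enable_hybrid=True`) that return batched `(indices, values)` pairs — the exact hook names and signatures vary across llama-index versions, so check yours. Below is a minimal, dependency-free adapter in that shape; the `toy_encode` stand-in is purely illustrative and is where you'd call your hosted model instead:

```python
from typing import Callable, Dict, List, Tuple

# Batched sparse format: one indices list + one values list per input text.
SparseBatch = Tuple[List[List[int]], List[List[float]]]

def make_sparse_fn(
    encode: Callable[[List[str]], List[Dict[int, float]]]
) -> Callable[[List[str]], SparseBatch]:
    """Adapt any encoder returning {token_id: weight} dicts (e.g. an HTTP call
    to a remote SPLADE endpoint) into the batched (indices, values) shape the
    Qdrant hybrid hooks expect. Hook signature may differ by version."""
    def sparse_fn(texts: List[str]) -> SparseBatch:
        rows = encode(texts)
        indices = [sorted(row) for row in rows]
        values = [[row[i] for i in idx] for row, idx in zip(rows, indices)]
        return indices, values
    return sparse_fn

# Toy hash-based stand-in just to show the shapes; replace with a real
# sparse-model call (local GPU or hosted inference endpoint).
def toy_encode(texts: List[str]) -> List[Dict[int, float]]:
    return [{hash(w) % 30000: 1.0 for w in t.lower().split()} for t in texts]

sparse_doc_fn = make_sparse_fn(toy_encode)
idx, vals = sparse_doc_fn(["hybrid search with qdrant"])
```

You'd then pass the real version as `sparse_doc_fn=...` / `sparse_query_fn=...` when constructing the vector store, so ingestion batches go to your deployed model instead of a local CPU encoder.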
Basically, if you can run the sparse embeddings on a GPU (either locally or behind an API), that's the way to do it.
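For context on why GPU helps: SPLADE-family sparse encoders take the MLM-head logits and max-pool `log(1 + relu(logit))` over the input tokens, keeping only the nonzero vocabulary dimensions — one batched tensor op on GPU (roughly `torch.log1p(torch.relu(logits)).amax(dim=0)`), but slow per-document on CPU. A dependency-free sketch of just the pooling step, on toy logits rather than a real model:

```python
import math
from typing import Dict, List

def splade_pool(logits: List[List[float]]) -> Dict[int, float]:
    """SPLADE pooling: weight[j] = max over tokens i of log(1 + relu(logits[i][j])).
    `logits` is a (num_tokens x vocab_size) matrix from the MLM head."""
    vocab = len(logits[0])
    weights: Dict[int, float] = {}
    for j in range(vocab):
        w = max(math.log1p(max(logits[i][j], 0.0)) for i in range(len(logits)))
        if w > 0.0:
            weights[j] = w  # only nonzero dims survive -> a sparse vector
    return weights

# Toy example: 2 tokens, 3-word vocabulary.
vec = splade_pool([[1.0, -2.0, 0.0],
                   [0.0, 3.0, -1.0]])
# dim 2 never activates, so it drops out of the sparse vector
```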