By default, qdrant hybrid uses a local model to generate sparse embeddings
If you don't have a GPU, this will be pretty slow
You can customize the function that generates sparse embeddings if you have a better option
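Roughly, the hook is just a callable that takes a batch of texts and returns (indices, values) lists. A sketch, with the hook names (sparse_doc_fn / sparse_query_fn) assumed from the qdrant vector store integration:

from typing import List, Tuple

def my_sparse_encoder(texts: List[str]) -> Tuple[List[List[int]], List[List[float]]]:
    # plug in whatever sparse model you prefer; one (indices, values) pair per text
    ...

# then pass it when building the store, e.g.
# QdrantVectorStore(..., enable_hybrid=True, sparse_doc_fn=my_sparse_encoder, sparse_query_fn=my_sparse_encoder)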
ah, I do have a 2080ti that I'm using for my normal embed model
the actual "Generating embeddings" part with my gpu takes around 15 sec, like in the post above. So is it generating the sparse embeddings after the normal embeddings?
and that's just not being shown?
Yea the sparse embeddings get generated after, once vector_store.add() is called
ah, is there a way to display that progress? and you said above I should look into customizing the function that generates sparse embeddings? If so, does hybrid not use gpu by default?
I'm actually not sure if fastembed supports gpu or not
But you can customize how this function is running
I don't think there's a progress bar here
I'm using huggingface for my default embed model
Yea, this is completely unrelated/separate
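The dense model is configured on its own. A minimal sketch of running that part on the GPU (assuming the HuggingFaceEmbedding integration accepts a device argument; the model name here is just a placeholder):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# dense embeddings on the GPU; the sparse encoder is configured separately
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",  # placeholder, use whatever model you already have
    device="cuda",
)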
ah okay, I'll look into doing sparse indexing a different way to speed it up
Also, I did see a fastembed-gpu version
I'll see if I can install that manually, will it use that instead of fastembed?
testing now, will report back when I see results
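In the meantime, a quick sanity check that the GPU build actually got picked up (fastembed-gpu rides on onnxruntime-gpu, so the CUDA provider should show up here):

import onnxruntime as ort

# "CUDAExecutionProvider" should appear in this list if the GPU build is installed correctly
print(ort.get_available_providers())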
well, actually, is parsing nodes using fastembed? or any embedding model?
Mmm parsing nodes is just splitting text into chunks
damn, I just looked and "Parsing nodes" hammers a single core and nothing else
Yea it's just string operations
gotta get that multi threaded
It's just hitting that single core really hard
I wonder if that's causing the bottleneck, the lack of multi-threaded processing for these steps
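One thing that might help with that part, a rough sketch assuming IngestionPipeline.run still accepts num_workers to fan the parsing out across processes (documents being whatever you loaded earlier):

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=200)],
)

# num_workers > 1 spreads node parsing across processes instead of one core
nodes = pipeline.run(documents=documents, num_workers=4)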
Finished the first embed (that spike in my gpu), and now it's doing the sparse nodes, I'm guessing, from what you said, and that still doesn't seem to be running on the gpu
it's causing the cpu speed to fluctuate, which means it's processing slower, right?
I'm gonna cancel that and do it again, this time with no hybrid, just a straight vector store, and see how that goes
yah, the sparse node creation is extremely time consuming
Gotta find a way to do it faster
would it be possible to put sparse embedding on the gpu too, if the gpu is available?
because fastembed-gpu is a package that uses the gpu
Actually, reading up a little bit, on L23 it seems like there are two options for generating sparse embeds: default_sparse_encoder and fastembed_sparse_encoder
is there a way to choose which we use?
or print which is being used?
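As a sketch of what choosing one explicitly might look like (assuming the helpers live in llama_index.vector_stores.qdrant.utils and the same sparse_doc_fn / sparse_query_fn hooks as above):

from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.vector_stores.qdrant.utils import fastembed_sparse_encoder

sparse_fn = fastembed_sparse_encoder()  # or default_sparse_encoder(<hf model id>)
print(sparse_fn)  # crude way to confirm which encoder callable is wired in

vector_store = QdrantVectorStore(
    collection_name="my_collection",
    client=client,  # your existing QdrantClient
    enable_hybrid=True,
    sparse_doc_fn=sparse_fn,
    sparse_query_fn=sparse_fn,
)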
Looking at the class SparseTextEmbedding on L46, you could pull this into your current code and then change to fastembed-gpu as the default install rather than fastembed. This might help with the speed at which these sparse nodes are generated @Logan M
Using GPT to spitball a code idea:
from typing import Callable, List, Optional, Tuple

# same aliases the qdrant utils use for the sparse encoder interface
BatchSparseEncoding = Tuple[List[List[int]], List[List[float]]]
SparseEncoderCallable = Callable[..., BatchSparseEncoding]


def fastembed_sparse_encoder(
    model_name: str = "prithvida/Splade_PP_en_v1",
    batch_size: int = 256,
    cache_dir: Optional[str] = None,
    threads: Optional[int] = None,
    device: Optional[str] = None,
) -> SparseEncoderCallable:
    try:
        from fastembed import SparseTextEmbedding
        import torch
    except ImportError as e:
        raise ImportError(
            "Could not import FastEmbed. "
            "Please install it with `pip install fastembed` "
            "(or `pip install fastembed-gpu` for CUDA support)"
        ) from e

    # pick the GPU automatically when one is visible
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    # onnxruntime execution providers are plain strings
    providers = ["CUDAExecutionProvider"] if device == "cuda" else ["CPUExecutionProvider"]

    model = SparseTextEmbedding(
        model_name, cache_dir=cache_dir, threads=threads, providers=providers
    )

    def compute_vectors(texts: List[str]) -> BatchSparseEncoding:
        # embed() yields sparse embeddings with .indices / .values arrays
        embeddings = model.embed(texts, batch_size=batch_size)
        indices, values = zip(
            *[
                (embedding.indices.tolist(), embedding.values.tolist())
                for embedding in embeddings
            ]
        )
        return list(indices), list(values)

    return compute_vectors
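And then, untested, the idea would be to wire that into the vector store through the same hooks as in the earlier sketch:

# hypothetical usage of the function above, reusing the client/collection from before
sparse_fn = fastembed_sparse_encoder(device="cuda")

vector_store = QdrantVectorStore(
    collection_name="my_collection",
    client=client,
    enable_hybrid=True,
    sparse_doc_fn=sparse_fn,
    sparse_query_fn=sparse_fn,
)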