hey I've been using the llamaindex

At a glance

The community member is using the LlamaIndex postprocessor with a Hugging Face cross-encoder to rerank their results, which takes around 3 seconds. Running the same model via a Hugging Face inference endpoint on an Nvidia T4 GPU takes less than 1 second. They would prefer the LlamaIndex version because it includes a score with each reranked result (NodeWithScore), and are wondering whether the postprocessing can use an available GPU for speed.

In the comments, another community member suggests that if CUDA is installed, the LlamaIndex postprocessor should use the GPU automatically.

hey I've been using the LlamaIndex postprocessor with a Hugging Face cross-encoder to rerank my results. I've noticed it usually takes around 3 seconds, whereas if I use the same model via API on a Hugging Face inference endpoint it takes <1 s (with an Nvidia T4). I'd love to use LlamaIndex's version instead though, because it includes a score in the rerank as well (NodeWithScore), so I was wondering if there's a way to have the postprocessing utilize an available GPU for speed?
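Not the thread's exact setup, but a minimal sketch of how this is typically pinned to a GPU, assuming LlamaIndex's SentenceTransformerRerank postprocessor (which wraps sentence-transformers' CrossEncoder and, in recent versions, accepts a device argument; the import path and model name here are illustrative):

```python
# Sketch only: assumes a recent llama-index where SentenceTransformerRerank
# accepts a `device` argument (older releases import it from
# llama_index.postprocessor instead of llama_index.core.postprocessor).
import torch
from llama_index.core.postprocessor import SentenceTransformerRerank

device = "cuda" if torch.cuda.is_available() else "cpu"

reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",  # illustrative model choice
    top_n=5,
    device=device,  # pin the underlying CrossEncoder to the GPU explicitly
)

# `nodes` is the NodeWithScore list returned by your retriever:
# reranked = reranker.postprocess_nodes(nodes, query_str="your query")
```

If your installed version doesn't expose a device argument, upgrading to one that does, or constructing the sentence-transformers CrossEncoder yourself, are the usual workarounds.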
1 comment
If you have CUDA installed, it should be using your GPU automatically πŸ€”
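As a quick sanity check (plain PyTorch, nothing LlamaIndex-specific), you can confirm whether CUDA is actually visible; if it isn't, the automatic device inference falls back to CPU:

```python
import torch

# If this prints False, PyTorch can't see a GPU and the rerank
# will silently run on CPU (the likely cause of the ~3 s latency).
print(torch.cuda.is_available())

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"
```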