How to Deploy a Reranker on a GPU and Call That as a Service?

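One common way to do this (a sketch, assuming the Hugging Face text-embeddings-inference container and an NVIDIA GPU; the port, volume path, and image tag here are examples, not from the thread):

```shell
# Serve BAAI/bge-reranker-large with text-embeddings-inference (TEI) on a GPU.
# --auto-truncate clips inputs that exceed the model's 512-token limit.
docker run --gpus all -p 8081:80 \
  -v "$PWD/tei-data:/data" \
  ghcr.io/huggingface/text-embeddings-inference:latest \
  --model-id BAAI/bge-reranker-large --auto-truncate
```

The container then serves HTTP endpoints (including a rerank route) on port 8081, which is what the LlamaIndex snippets below point at.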
How do I call this from LlamaIndex?
2025-02-06 15:03:34.834 | ERROR | core.inference_retriver:query_index:338 - Error in query_index UUID: None token: None - 1 validation error for TextEmbeddingInference
top_n
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='BAAI/bge-reranker-large', input_type=str]
For further information visit https://errors.pydantic.dev/2.9/v/int_parsing
>>> from llama_index.postprocessor.tei_rerank import TextEmbeddingInference as TEIR      
>>> from llama_index.core.schema import TextNode, NodeWithScore
>>> nodes = [NodeWithScore(score=1.0, node=TextNode(text="dog")), NodeWithScore(score=1.0, node=TextNode(text="cat")), NodeWithScore(score=1.0, node=TextNode(text="cow"))]
>>> reranker = TEIR(top_n=2, base_url="http://127.0.0.1:8081")
>>> reranker.postprocess_nodes(nodes, query_str="dog dog")[0].text
'dog'
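For reference, the same service can also be called over plain HTTP without LlamaIndex. A minimal sketch, assuming TEI's rerank endpoint shape and the base URL from the session above (the helper names are mine, not from the library):

```python
import json
from urllib import request

def build_rerank_payload(query: str, texts: list[str]) -> bytes:
    """Build the JSON body TEI's /rerank endpoint expects."""
    return json.dumps({"query": query, "texts": texts}).encode()

def rerank(query: str, texts: list[str],
           base_url: str = "http://127.0.0.1:8081") -> list[dict]:
    """POST to /rerank; TEI returns a list of {"index": ..., "score": ...}."""
    req = request.Request(
        f"{base_url}/rerank",
        data=build_rerank_payload(query, texts),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

This is what the LlamaIndex postprocessor does under the hood: it sends the query plus each node's text to the service and reorders nodes by the returned scores.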
try:
    logger.debug("Setting up reranker")
    logger.debug(config.RERANKER_BASE_URL)
    reranker = TEIR(top_n=10, base_url=config.RERANKER_BASE_URL)
except Exception as e:
    logger.error(f"Reranker error: {e}")

query_engine = CitationQueryEngine.from_args(
    index,
    node_postprocessors=[reranker],
    similarity_top_k=similarity_top_k,
    llm=llm,
)
Something is wrong. I tried the above; I am using version 0.2.1 of the TEI rerank integration (TEIR).
2025-02-07T05:52:55.236885Z ERROR rerank:predict{truncate=false truncation_direction=Right raw_scores=false}: text_embeddings_core::infer: core/src/infer.rs:398: Input validation error: inputs must have less than 512 tokens. Given: 866
2025-02-07T07:04:30.293402Z ERROR rerank:predict{truncate=false truncation_direction=Right raw_scores=false}: text_embeddings_core::infer: core/src/infer.rs:398: Input validation error: inputs must have less than 512 tokens. Given: 694
2025-02-07T07:04:30.303219Z ERROR rerank:predict{truncate=false truncation_direction=Right raw_scores=false}: text_embeddings_core::infer: core/src/infer.rs:398: Input validation error: inputs must have less than 512 tokens. Given: 866
2025-02-07T07:04:30.306802Z ERROR rerank:predict{truncate=false truncation_direction=Right raw_scores=false}: text_embeddings_core::infer: core/src/infer.rs:398: Input validation error: inputs must have less than 512 tokens. Given: 866
2025-02-07T07:08:11.878829Z ERROR rerank:predict{truncate=false truncation_direction=Right raw_scores=false}: text_embeddings_core::infer: core/src/infer.rs:398: Input validation error: inputs must have less than 512 tokens. Given: 866
2025-02-07T07:08:11.891838Z ERROR rerank:predict{truncate=false truncation_direction=Right raw_scores=false}: text_embeddings_core::infer: core/src/infer.rs:398: Input validation error: inputs must have less than 512 tokens. Given: 694
2025-02-07T07:08:11.892099Z ERROR rerank:predict{truncate=false truncation_direction=Right raw_scores=false}: text_embeddings_core::infer: core/src/infer.rs:398: Input validation error: inputs must have less than 512 tokens. Given: 866
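These errors mean the concatenated query + document sent to the reranker exceeds the model's 512-token limit. Besides launching TEI with `--auto-truncate`, you can pre-trim node text on the client side. A crude sketch (character-count heuristic, not a real tokenizer; the 4-chars-per-token ratio and the query reserve are assumptions):

```python
MAX_TOKENS = 512       # reranker's sequence limit (query + document together)
CHARS_PER_TOKEN = 4    # rough heuristic; a real tokenizer would be exact
QUERY_RESERVE = 64     # tokens reserved for the query and special tokens

def truncate_for_rerank(text: str) -> str:
    """Trim text so it is likely to fit the reranker's token budget."""
    budget = (MAX_TOKENS - QUERY_RESERVE) * CHARS_PER_TOKEN
    return text if len(text) <= budget else text[:budget]
```

This loses tail content from long nodes, so server-side `--auto-truncate` (or chunking documents smaller at index time) is usually the cleaner fix.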

I guess this is the error.
Fixed, thanks though.
I think when you launch TEI there's an option to set auto-truncate
--auto-truncate