hello there, while setting up my RAG + reranker pipeline I noticed it is taking quite a while to instantiate query engines. Currently it is taking around 20 to 30 secs. Just curious whether this is normal, or if it's because of how the reranker is being created within the query engine?

Python
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

node_parser = MarkdownElementNodeParser(num_workers=8, show_progress=False)
nodes = node_parser.get_nodes_from_documents([document])
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
index = VectorStoreIndex(nodes=base_nodes + objects)
recursive_query_engine = index.as_query_engine(similarity_top_k=3, node_postprocessors=[FlagEmbeddingReranker(top_n=2, model=RERANKER_MODEL)], verbose=False)
Yes, since you are loading the model every time, that is why it is taking so long.

You can load the model once and pass a reference to it into the query engine; that should speed up the process.
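A minimal sketch of that idea, reusing one loaded reranker instance across engines (this assumes the FlagEmbeddingReranker import path of recent llama-index releases, plus the RERANKER_MODEL name and index from the snippet above):

Python
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

# Load the reranker once at start-up; the model weights are only
# downloaded and moved to the device here.
reranker = FlagEmbeddingReranker(top_n=2, model=RERANKER_MODEL)

# Every query engine created afterwards reuses the same instance,
# so construction is cheap.
query_engine = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[reranker],
)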
Hmm, alright! Unfortunately my index changes each time I execute my pipeline, and I have no good reason to persist it, because my application processes the first document, saves the result, and moves on to the next document. Each document has its own top_k and top_n as well.
@galvangjx I had something similar. I just used metadata filtering and created new query engines. I can filter what I'm querying on instantly.

https://gist.github.com/inchoate/fb0e6a2300180afc095da8415c625e9e
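For context, that approach keeps a single persistent index and scopes each query with metadata filters; a rough sketch, assuming the nodes carry a doc_id metadata key (the key name and value here are just examples):

Python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# One long-lived index holds nodes from many documents, each node
# tagged with the document it came from.
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="doc_id", value="contract-001")]
)

# Creating a per-document query engine is now near-instant: no model
# loading or re-indexing happens here.
query_engine = index.as_query_engine(similarity_top_k=3, filters=filters)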
Hey @JasonV, thanks for sharing. My pipeline indexes documents one at a time instead of "batch" indexing them. The pipeline (start) reads and parses a document from Azure Blob Storage -> index -> create query engine -> query -> store result (end of pipeline). This cycle repeats depending on how many documents it receives on each trigger.
I will give this a try. Just wondering, do you have more examples you can share?
In your case, is this the step that's taking 20-30s: recursive_query_engine = index.as_query_engine(similarity_top_k=3, node_postprocessors=[FlagEmbeddingReranker(top_n=2, model=RERANKER_MODEL)], verbose=False)?
You can recreate the other parts; I don't think they take much time. Can your pipeline keep the re-ranker model in memory, if possible?

See if you can change the top_n value of your re-ranker model at run time. That will help with your changing-value case.
I feel like your re-ranker model is the one that is increasing the time.
Can you try it once without the re-ranker model and see how much time it takes then?

That will give more clarity on whether it is the model or not 😅
Sorry for the late update.

Can you try it once without the re-ranker model and see how much time it takes then?
I created 2 different query engines, one with the reranker (took 15 secs to run) and another without (almost instant). So the reranker is definitely increasing the run time.
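For anyone wanting to reproduce that measurement, a quick sketch that brackets only the engine construction (names as in the snippet at the top of the thread):

Python
import time

start = time.perf_counter()
with_reranker = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[FlagEmbeddingReranker(top_n=2, model=RERANKER_MODEL)],
)
print(f"with reranker: {time.perf_counter() - start:.1f}s")    # dominated by model load

start = time.perf_counter()
without_reranker = index.as_query_engine(similarity_top_k=3)
print(f"without reranker: {time.perf_counter() - start:.1f}s")  # near-instant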

You can recreate the other parts; I don't think they take much time. Can your pipeline keep the re-ranker model in memory, if possible?

See if you can change the top_n value of your re-ranker model at run time. That will help with your changing-value case.
I think it might be possible to change top_n conditionally at run time. See the screenshot attached.
Attachment: image.png
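Something like the sketch below could work for that. It assumes top_n is a plain mutable attribute on the reranker instance (worth verifying on your llama-index version), and is_long_document is a hypothetical stand-in for whatever per-document condition applies:

Python
# Reuse the one loaded reranker, adjusting top_n per document.
reranker.top_n = 3 if is_long_document else 2  # is_long_document: hypothetical condition
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[reranker],
)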
Yeah, so you can keep the reranker model in memory and not load it again and again. That will help your query engine load fast.
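Putting the thread together, the per-document pipeline could look like this sketch (imports as in the snippet at the top; blobs, load_document, top_k_for, top_n_for, store_result, and QUESTION are all hypothetical stand-ins for the Azure Blob reading, per-document settings, and persistence steps):

Python
# Load the expensive model once, outside the per-document loop.
reranker = FlagEmbeddingReranker(top_n=2, model=RERANKER_MODEL)
node_parser = MarkdownElementNodeParser(num_workers=8, show_progress=False)

for blob in blobs:
    document = load_document(blob)             # hypothetical Azure Blob read + parse
    nodes = node_parser.get_nodes_from_documents([document])
    base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
    index = VectorStoreIndex(nodes=base_nodes + objects)

    reranker.top_n = top_n_for(document)       # hypothetical per-document setting
    engine = index.as_query_engine(
        similarity_top_k=top_k_for(document),  # hypothetical per-document setting
        node_postprocessors=[reranker],
    )
    store_result(engine.query(QUESTION))       # hypothetical persistence step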