Converting RAG app to async for production


Hi everyone. I'm working on taking my RAG app to production and want to convert it to async. My pipeline is two query engine tools called by a RouterQueryEngine.
One tool wraps summary_index.as_query_engine, and I'm passing use_async=True to it. The other is a QueryEngineTool whose query_engine is vector_query_engine, a RetrieverQueryEngine built on a VectorIndexRetriever.
Both query engines include a node_postprocessor that uses CohereRerank.

I'm unsure whether the QueryEngineTool part supports async. I'm also not sure whether the RouterQueryEngine itself needs to be made async, or whether only the tools within it do. It's really difficult to find clear async instructions for all of the different functions and query engines.

Any tips or info would be appreciated!

6 comments

SeaBerg

I fed the LlamaIndex source files into Claude Sonnet and got some help, though I'm not sure it's fully accurate.

VectorIndexRetriever:
The aretrieve method is actually part of the VectorIndexRetriever class, which is defined in the retriever.py file you provided.
Usage in RetrieverQueryEngine:
The RetrieverQueryEngine class (which uses the VectorIndexRetriever) has both _retrieve and _aretrieve methods. The asynchronous version (_aretrieve) would be called when using the async query methods of the RetrieverQueryEngine.
RouterQueryEngine:
The RouterQueryEngine is what's actually used in your app.py. This class has both synchronous (_query) and asynchronous (_aquery) methods.

Given this clarification, here's how the asynchronous flow would actually work in your application:

In app.py, you would use the aquery method of the RouterQueryEngine instead of query.
This would internally call the _aquery method of RouterQueryEngine.
The _aquery method would then use the async versions of its components, including the aquery method of any QueryEngineTool it's using.
If one of these tools is using a RetrieverQueryEngine, it would then use the _aretrieve method, which in turn would call the aretrieve method of the VectorIndexRetriever.

So you don't need to explicitly change anything related to VectorIndexRetriever in your app.py. Instead, you need to focus on using the async version of the RouterQueryEngine.
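A minimal sketch of the delegation chain described above. It uses hypothetical stub classes (StubRetriever, StubQueryEngine, StubRouter), not the real LlamaIndex classes, and a keyword rule in place of the LLM-based selector a real router uses. What it shows: the application awaits only the top-level aquery, and each layer awaits the async method of the layer below it.

```python
import asyncio


class StubRetriever:
    """Hypothetical stand-in for VectorIndexRetriever (not the real class)."""

    def __init__(self, docs):
        self.docs = docs

    async def aretrieve(self, query):
        await asyncio.sleep(0)  # yield to the event loop, as real async I/O would
        return [d for d in self.docs if query.lower() in d.lower()]


class StubQueryEngine:
    """Hypothetical stand-in for RetrieverQueryEngine."""

    def __init__(self, retriever):
        self.retriever = retriever

    async def aquery(self, query):
        nodes = await self.retriever.aretrieve(query)  # async all the way down
        return f"{len(nodes)} node(s) for {query!r}"


class StubRouter:
    """Hypothetical stand-in for RouterQueryEngine: pick an engine, then await it."""

    def __init__(self, engines, select):
        self.engines = engines
        self.select = select  # query -> engine name; a real router asks an LLM here

    async def aquery(self, query):
        engine = self.engines[self.select(query)]
        return await engine.aquery(query)


async def main():
    vector_engine = StubQueryEngine(StubRetriever(["async guide", "sync guide"]))
    summary_engine = StubQueryEngine(StubRetriever(["project overview"]))
    router = StubRouter(
        {"vector": vector_engine, "summary": summary_engine},
        select=lambda q: "summary" if "summar" in q.lower() else "vector",
    )
    # The only await the application itself writes: the router's aquery.
    return await router.aquery("async")


if __name__ == "__main__":
    print(asyncio.run(main()))  # → 1 node(s) for 'async'
```

If the real classes follow this pattern, the app-level change is mostly swapping query(...) for await ...aquery(...) inside an async handler.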

Logan M

Any query engine supports async, because you can use await query_engine.aquery("query"). (This assumes you're using an LLM and embedding model that support async; most API-based ones do.)

Not every vector store supports async, though; some might only be "fake" async.
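A minimal sketch of the difference between real and "fake" async, using hypothetical stand-in classes rather than any actual vector store. Both expose an aquery coroutine you can await, but FakeAsyncStore just calls a blocking method inside async def, so concurrent requests run one after another. RealAsyncStore yields to the event loop while it waits, so the requests overlap. The 0.1 s sleeps stand in for network latency.

```python
import asyncio
import time


class RealAsyncStore:
    """Hypothetical store whose aquery genuinely awaits non-blocking I/O."""

    async def aquery(self, q):
        await asyncio.sleep(0.1)  # yields to the event loop while "waiting on the network"
        return q


class FakeAsyncStore:
    """Hypothetical store that only wraps a blocking call in an async method."""

    def query(self, q):
        time.sleep(0.1)  # blocking call, e.g. a sync-only client library
        return q

    async def aquery(self, q):
        # "Fake" async: awaitable, but the blocking call freezes the whole event loop.
        return self.query(q)


async def timed(store, n=3):
    """Run n aquery calls concurrently and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(store.aquery(i) for i in range(n)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for store in (RealAsyncStore(), FakeAsyncStore()):
        results, elapsed = asyncio.run(timed(store))
        print(type(store).__name__, results, f"{elapsed:.2f}s")
```

With three requests, the real store should finish in roughly one sleep interval and the fake one in roughly three. Awaiting a fake-async call is harmless on its own, but it gives no concurrency under load.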

SeaBerg

Thanks. My number of nodes is quite small, so I'm actually using the default in-memory LlamaIndex vector store.

SeaBerg

I plugged in the node postprocessor and Cohere Python files, and it looks like CohereRerank as a node postprocessor is synchronous only. I'm not sure that's very critical, since it shouldn't block the pipeline for long. Claude did provide a custom async CohereRerank node postprocessor class for me to try.

Logan M

Yeah, we need to add async methods to the postprocessors 😅 I think any custom implementation probably won't work unless you're manually doing the retrieve -> rerank -> synthesis steps.

Logan M

Otherwise, the query engine will always call the sync postprocess_nodes method
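A minimal sketch of the manual retrieve -> rerank -> synthesis approach, assuming the reranker only has a synchronous method. The three functions are hypothetical stand-ins. In LlamaIndex the equivalents would presumably be the retriever's aretrieve, the CohereRerank postprocessor's postprocess_nodes, and a response synthesizer's async method, but check your installed version. The blocking rerank is pushed to a worker thread with asyncio.to_thread (Python 3.9+) so it doesn't stall other requests on the event loop. The keyword-overlap scoring is only a placeholder for a real rerank call.

```python
import asyncio
import time


async def retrieve(query):
    """Hypothetical async retriever stand-in (e.g. a retriever's aretrieve)."""
    await asyncio.sleep(0.05)  # simulated non-blocking lookup
    return ["doc about async", "doc about sync", "doc about rerank"]


def rerank(nodes, query, top_n=2):
    """Hypothetical sync-only reranker stand-in (placeholder for a Cohere call)."""
    time.sleep(0.05)  # blocking network call in a sync client
    words = query.lower().split()
    # Score by how many query words appear in each node; sort is stable on ties.
    scored = sorted(nodes, key=lambda n: sum(w in n for w in words), reverse=True)
    return scored[:top_n]


async def synthesize(query, nodes):
    """Hypothetical async LLM synthesis stand-in."""
    await asyncio.sleep(0.05)  # simulated non-blocking LLM call
    return f"{query}: " + " | ".join(nodes)


async def rag_query(query):
    nodes = await retrieve(query)
    # Run the blocking rerank in a worker thread so the event loop stays free.
    top = await asyncio.to_thread(rerank, nodes, query)
    return await synthesize(query, top)


if __name__ == "__main__":
    print(asyncio.run(rag_query("async rerank")))
```

asyncio.to_thread doesn't make the rerank itself faster. It only keeps the event loop free to serve other requests while that call is in flight, which matters more once many queries run concurrently.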
