Hi there,
I’m using a local embedding model and Azure OpenAI for response synthesis. I’m getting response times of about 10 s, measured like this:
import time

start_time = time.time()
response = query_engine.query(question)
response_time = time.time() - start_time
Is it possible to get execution times for the individual steps inside query_engine.query(), which is a black box to me? I need to know whether the 10 s comes mostly from Azure OpenAI (in which case I can’t do anything about it) or from the local embedding on my machine.
For instance, does OpenAI return the time it took to create a response on its server?
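To make clearer what kind of breakdown I’m after, here is a rough sketch. The functions embed_query and call_llm are hypothetical stand-ins I made up for the two stages (they are not LlamaIndex or Azure OpenAI APIs, and the sleeps are fake delays), and timed is just a small helper of my own. What I’d like is the same kind of per-stage numbers for the real query_engine.query() call.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage label.
timings = {}

@contextmanager
def timed(label):
    """Add the elapsed wall-clock time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start

def embed_query(text):
    # Hypothetical stand-in for the local embedding model.
    time.sleep(0.01)
    return [float(len(text))]

def call_llm(prompt):
    # Hypothetical stand-in for the remote LLM call.
    time.sleep(0.02)
    return "answer to: " + prompt

def answer(question):
    # Time each stage separately so the slow one stands out.
    with timed("embedding"):
        embed_query(question)
    with timed("llm"):
        reply = call_llm(question)
    return reply

with timed("total"):
    response = answer("What is X?")

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

With numbers like these I could tell at a glance whether the embedding step or the LLM call dominates the total.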