Hey, I'm also using Azure OpenAI. There is asyncio support for many parts of the library, but a couple of other parts don't have it
You will have to go through and find the parts of the library that are sync code. Namely, anything that calls run_async_tasks
Hi, I am using ChatCompletion
I'm using the chatbot feature. If my company lets me open-source what I've done (50/50), I'll share that code. What I can say is you should wrap the sync parts of the code with threading
I am sending my context and instructions using it, and I have to send a lot of context.
That's where LlamaIndex is super important to reduce tokens
In terms of making things async, you should wrap the sync calls in to_thread
(py 3.9+) from asyncio. It will run the call in a different thread, preventing your main loop from being blocked and speeding up the program through concurrency
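For example, a minimal sketch of that wrapping (sync_llm_call is a hypothetical stand-in for whatever blocking call you have, not a real library function):

import asyncio

def sync_llm_call(prompt):
    # hypothetical blocking call into the sync parts of the library
    return "response for: " + prompt

async def main():
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread,
    # so the event loop stays free to do other work while waiting for the result
    result = await asyncio.to_thread(sync_llm_call, "your prompt here")
    print(result)

asyncio.run(main())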
So I have my own prompt. But I am doing some analysis of a whole document, which is quite big, so I am sending it part by part and then merging the analyses. It's a very inefficient method. Can I decrease it?
How can I use LlamaIndex to reduce tokens? Can you give me a hint?
So, LlamaIndex basically works by taking your document and splitting it up into chunks, which they call nodes. You can choose between 'stores' which are different ways of organizing and using these chunks.
The popular one is vector storage. Each chunk is turned into embeddings (vector representation), and then when you actually want to analyze a document
It embeds the keywords from your prompt and calculates the distance (across many dimensions) between your prompt and the chunks
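Roughly, the flow looks like this (a quick sketch assuming the older top-level llama_index imports; newer releases move these under llama_index.core, and the "data" folder is just an example path):

from llama_index import VectorStoreIndex, SimpleDirectoryReader

# load documents, split them into nodes (chunks), and embed each node
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# at query time, only the nodes closest to the question are sent to the LLM
query_engine = index.as_query_engine()
response = query_engine.query("What does the document say about X?")
print(response)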
But for my analysis I will have to process every node.
If you need to process every single node, LlamaIndex likely won't be the tool for you (Logan may know more). It excels specifically because it only pulls the nodes relevant to your query, not the entire context.
This is to reduce the tokens.
If you are saying you need to put every single character into the LLM, tokens are proportional to characters, so you will not be able to reduce that.
Yes, that's the thing, I want to use all of the context. So I have to send it in parts, then do the analysis again, and it takes minutes for that.
Right right, so then LlamaIndex might not be the tool
Yes, I don't want to reduce tokens, I just want to send multiple requests so that it takes less time.
Asyncio is the tool for you
Yes, but I don't know how I can use it; I am quite new to asyncio.
If you can guide me, it would be a great help.
import asyncio

chunks = ["This is a", "sentence split", "into chunks"]
results = {}

async def call_to_llm(idx, chunk):
    result = ...  # your code to call the LLM goes here
    results[idx] = result

# grab the event loop and schedule one task per chunk
loop = asyncio.get_event_loop()
tasks = []
for idx, chunk in enumerate(chunks):
    tasks.append(loop.create_task(call_to_llm(idx, chunk)))
# run the loop until every task has finished before using `results`
loop.run_until_complete(asyncio.gather(*tasks))
Something like that is a quick pseudocode to accomplish what you are suggesting
When it comes to the asyncio part of things
You will want to grab the loop, via asyncio.get_event_loop()
You can then use the loop to create asynchronous tasks
These tasks run concurrently, which will speed up your calls the way you are hoping for
Tasks can be launched with loop.create_task(your_async_method(your_param)), and you can wait for all of them to finish with loop.run_until_complete(asyncio.gather(*tasks))
Thanks for the help let me try.
Hi @isaackogan, I wrote the code but I am getting a Timeout error when I process the original text:
Can you kindly check it? Thanks