Hello friends. On my project, the version of llama-index is 0.8.66, and processing a single request takes about 26 seconds. Is it faster on the newer versions of LlamaIndex? 26 seconds is just too long...
Are you using open-source LLM?
Yeah. I just installed it from pip (llama-index).
It's not the architecture, actually; it's how long the LLM is taking to generate the final answer.
I tried the last version - the speed is the same...
Are there any ways to speed it up?
A more powerful GPU may help, but as I mentioned, it is the LLM that is taking the time to generate the response.

You can try streaming; that way it will start giving back the response right away, one token at a time, and it will look like the response is coming very fast 😅
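To illustrate why streaming feels faster, here is a pure-Python sketch (not the LlamaIndex API; in LlamaIndex you would enable streaming on the query engine and iterate the response generator). The fake model and its per-token delay are made-up stand-ins; the point is that time-to-first-token is a small fraction of total generation time:

```python
import time

def fake_llm_stream(answer: str, delay_per_token: float = 0.01):
    """Simulated streaming LLM that yields one token at a time.
    A stand-in for a real streaming endpoint; the delay is arbitrary."""
    for token in answer.split():
        time.sleep(delay_per_token)
        yield token + " "

start = time.perf_counter()
first_token_at = None
chunks = []
for chunk in fake_llm_stream("streaming shows tokens as they arrive"):
    if first_token_at is None:
        # The user starts reading here, long before generation finishes.
        first_token_at = time.perf_counter() - start
    chunks.append(chunk)
total = time.perf_counter() - start

print(f"first token after {first_token_at:.3f}s, full answer after {total:.3f}s")
```

The total latency is unchanged; only the perceived latency improves, because the user starts reading after the first token instead of after the last one.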
I got it)) thx
does LlamaParse work faster? (I test it on m2 processor)
Another question is whether LlamaIndex can work with PyTorch instead of OpenAI, so that the speed depends on my machine rather than on waiting for a response from OpenAI.
  1. LlamaParse interacts with LlamaCloud, so nothing actually happens on your machine. You give it the file, it sends the file to LlamaCloud, the file gets processed, and the final response is returned.

There is an async way for multiple-file cases, but the process remains the same.

  2. Yeah, you can use the custom LLM class, wrap it around your LLM, and use it.
  3. Are you using OpenAI and still getting a response in 26 seconds?
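A minimal sketch of the wrapper pattern from point 2, in plain Python. LocalModel and LocalLLMWrapper are hypothetical names invented for illustration; the real integration would subclass LlamaIndex's CustomLLM and implement its completion methods around your local PyTorch model:

```python
class LocalModel:
    """Hypothetical local model (e.g. something loaded with PyTorch)."""
    def generate(self, text: str) -> str:
        return f"echo: {text}"  # placeholder for real inference

class LocalLLMWrapper:
    """Adapter exposing a complete()-style interface over a local model,
    so the pipeline calls your machine instead of a remote API."""
    def __init__(self, model: LocalModel):
        self.model = model

    def complete(self, prompt: str) -> str:
        return self.model.generate(prompt)

llm = LocalLLMWrapper(LocalModel())
print(llm.complete("hello"))  # -> "echo: hello"
```

Whether this is actually faster then depends entirely on your hardware: a small model on a good GPU can beat the round trip to OpenAI, a large model on CPU usually will not.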
Yes, I use GPT-4 Turbo. And it takes about 20-30 seconds.
I think GPT-4 is slow in comparison to GPT-3.5.

Try testing it directly
print(llm.complete("big Bang theory song starts like "))

This way you can check how much time it takes to generate the final answer.
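For a self-contained way to time that call, here is a sketch with a stub in place of the real client. StubLLM and its 0.05 s delay are made up for illustration; swap in your actual OpenAI-backed llm object to measure real latency:

```python
import time

class StubLLM:
    """Hypothetical stand-in for the real llm object."""
    def complete(self, prompt: str) -> str:
        time.sleep(0.05)  # simulate generation delay
        return "(generated answer)"

llm = StubLLM()  # replace with your real LLM
start = time.perf_counter()
response = llm.complete("big Bang theory song starts like ")
elapsed = time.perf_counter() - start
print(f"llm.complete took {elapsed:.2f}s")
```

Timing the bare complete() call isolates the LLM's share of the 26 seconds from retrieval and the rest of the pipeline.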
10 sec.
(Attachment: image.png)
Yeah, so if you add up the response-generation time for each node passed to the LLM, it adds up.
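As a back-of-envelope check, assuming roughly one LLM call per retrieved node (as in refine-style synthesis; the numbers below are from this thread, not guarantees):

```python
# Rough latency model: total time ~ calls * time per call.
seconds_per_llm_call = 10   # measured llm.complete time with GPT-4 above
nodes_passed_to_llm = 2     # average nodes retrieved per query

estimated_total = seconds_per_llm_call * nodes_passed_to_llm
print(estimated_total)  # -> 20, matching the observed 20-30 s range
```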
Well, yes, I have an average of 2 nodes. And it takes 20-30 seconds.
But my client is not happy, because there is "flowiseai", which also works with llamaindex/langchain, and its speed is many times better. The difference is 10-16 seconds; that is, on average it answers in 6-9 seconds.

Although it also uses llamaindex, I cannot understand how this is possible... The database is the same.
Try with GPT3.5 once
lol, llm.complete: 1.612645149230957
wtf 😄
Haha, as I said, GPT-4 is much better but slower to generate in comparison 😆
flowiseai also uses GPT-4 =((( And it's faster 😮‍💨
It could be they are using a fine-tuned one. Those are specific LLMs used only by the party that fine-tuned them. So that could be the reason for the speed.
Yeah, make sense
BTW, does the kapa.ai bot work on GPT-3.5?
Not much idea on this; it's a third-party integration. You can find more here: https://www.kapa.ai/