Hello friends. On my project, the version of llama-index is 0.8.66, and processing a single request takes about 26 seconds. Is it faster on the newer versions of LlamaIndex? 26 seconds is just too long...
Are you using open-source LLM?
Yeah. I just installed it from pip (llama-index).
It's not the architecture, actually; it's how long the LLM is taking to generate the final answer.
I tried the last version - the speed is the same...
Are there any ways to speed it up?
A more powerful GPU may help, but as I mentioned, it is the LLM that is taking the time to generate the response.

You can try streaming; that way it will start giving back the response right away, one token at a time, and it will look like the response is coming very fast 😅
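To illustrate why streaming feels faster, here is a pure-Python sketch (not the LlamaIndex API; in LlamaIndex you would enable streaming on the query engine and iterate the response generator). The fake model and its per-token delay are made-up stand-ins; the point is that time-to-first-token is a small fraction of total generation time:

```python
import time

def fake_llm_stream(answer: str, delay_per_token: float = 0.01):
    """Simulated streaming LLM that yields one token at a time.
    A stand-in for a real streaming endpoint; the delay is arbitrary."""
    for token in answer.split():
        time.sleep(delay_per_token)
        yield token + " "

start = time.perf_counter()
first_token_at = None
chunks = []
for chunk in fake_llm_stream("streaming shows tokens as they arrive"):
    if first_token_at is None:
        # The user starts reading here, long before generation finishes.
        first_token_at = time.perf_counter() - start
    chunks.append(chunk)
total = time.perf_counter() - start

print(f"first token after {first_token_at:.3f}s, full answer after {total:.3f}s")
```

The total latency is unchanged; only the perceived latency improves, because the user starts reading after the first token instead of after the last one.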
I got it)) thx
does LlamaParse work faster? (I test it on m2 processor)
Another question is whether LlamaIndex can work with PyTorch instead of OpenAI, so that the speed depends on my machine rather than on waiting for a response from OpenAI.
  1. LlamaParse interacts with LlamaCloud, so nothing actually happens on your machine. You give it the file, it sends the file to LlamaCloud, the file gets processed, and the final response is returned.

There is an async way for multiple-file cases, but the process remains the same.

  2. Yeah, you can use the custom LLM class, wrap it around your LLM, and use it.
  3. Are you using OpenAI and still getting a response in 26 seconds?
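A minimal sketch of the wrapper pattern from point 2, in plain Python. LocalModel and LocalLLMWrapper are hypothetical names invented for illustration; the real integration would subclass LlamaIndex's CustomLLM and implement its completion methods around your local PyTorch model:

```python
class LocalModel:
    """Hypothetical local model (e.g. something loaded with PyTorch)."""
    def generate(self, text: str) -> str:
        return f"echo: {text}"  # placeholder for real inference

class LocalLLMWrapper:
    """Adapter exposing a complete()-style interface over a local model,
    so the pipeline calls your machine instead of a remote API."""
    def __init__(self, model: LocalModel):
        self.model = model

    def complete(self, prompt: str) -> str:
        return self.model.generate(prompt)

llm = LocalLLMWrapper(LocalModel())
print(llm.complete("hello"))  # -> "echo: hello"
```

Whether this is actually faster then depends entirely on your hardware: a small model on a good GPU can beat the round trip to OpenAI, a large model on CPU usually will not.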
Yes, I use GPT-4 Turbo. And it takes about 20-30 seconds.
I think GPT-4 is slow in comparison to GPT-3.5.

Try testing it directly
print(llm.complete("big Bang theory song starts like "))

This way you can check how much time it takes to generate the final answer.
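For a self-contained way to time that call, here is a sketch with a stub in place of the real client. StubLLM and its 0.05 s delay are made up for illustration; swap in your actual OpenAI-backed llm object to measure real latency:

```python
import time

class StubLLM:
    """Hypothetical stand-in for the real llm object."""
    def complete(self, prompt: str) -> str:
        time.sleep(0.05)  # simulate generation delay
        return "(generated answer)"

llm = StubLLM()  # replace with your real LLM
start = time.perf_counter()
response = llm.complete("big Bang theory song starts like ")
elapsed = time.perf_counter() - start
print(f"llm.complete took {elapsed:.2f}s")
```

Timing the bare complete() call isolates the LLM's share of the 26 seconds from retrieval and the rest of the pipeline.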
10 sec.
(Attachment: image.png)
Yeah, so if you add up the response-generation time for each node passed to the LLM, it adds up.
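As a back-of-envelope check, assuming roughly one LLM call per retrieved node (as in refine-style synthesis; the numbers below are from this thread, not guarantees):

```python
# Rough latency model: total time ~ calls * time per call.
seconds_per_llm_call = 10   # measured llm.complete time with GPT-4 above
nodes_passed_to_llm = 2     # average nodes retrieved per query

estimated_total = seconds_per_llm_call * nodes_passed_to_llm
print(estimated_total)  # -> 20, matching the observed 20-30 s range
```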
Well, yes, I have an average of 2 nodes. And it takes 20-30 seconds.
But my client is not happy, because there is "flowiseai", which also works with llamaindex/langchain, and its speed is many times better. The difference is 10-16 seconds; that is, on average it answers in 6-9 seconds.

Although it also uses llamaindex, I cannot understand how this is possible... The database is the same.
Try with GPT3.5 once
lol, llm.complete: 1.612645149230957
wtf 😄
Haha, as I said, GPT-4 is much better but slower to generate in comparison 😆
flowiseai also uses GPT-4 =((( And it's faster 😮‍💨
It could be they are using a fine-tuned one. Those are specific LLMs used only by the party that fine-tuned them. So that could be the reason for the speed.
Yeah, make sense
BTW, does the kapa.ai bot work on GPT-3.5?
Not much idea on this; it's a third-party integration. You can find more here: https://www.kapa.ai/