
toniuyt
Joined September 25, 2024
It seems like llama_index shares the chat memory across the different chat engines created in my application. How can I use my index to build a chatbot application that can serve multiple users at the same time?
6 comments
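One common answer to the question above is to keep a separate chat engine, and therefore separate memory, per user. Below is a minimal sketch of that pattern; `StubChatEngine` and `get_engine` are hypothetical names standing in for a real llama_index chat engine (which you would typically build from your index inside the factory), so treat this as an illustration of per-user isolation, not the library's API.

```python
class StubChatEngine:
    """Stand-in for a real chat engine; each instance keeps its own history."""
    def __init__(self):
        self.history = []

    def chat(self, message):
        self.history.append(message)
        return f"reply to: {message}"

# One engine per user id, created lazily on first use.
_engines = {}

def get_engine(user_id, make_engine=StubChatEngine):
    """Return the chat engine for this user, creating it on first use."""
    if user_id not in _engines:
        _engines[user_id] = make_engine()
    return _engines[user_id]

# Each user now has isolated chat memory:
alice = get_engine("alice")
bob = get_engine("bob")
alice.chat("hello from alice")
bob.chat("hello from bob")
```

The key point is that the index itself can be shared read-only, while the conversational state lives in a per-user object keyed by session or user id.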
How do I get past rate limit errors? I have a chat application that uses llama_index and I often get this:
Plain Text
WARNING:llama_index.llms.openai_utils:Retrying llama_index.llms.openai_utils.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised RateLimitError: Rate limit reached for gpt-3.5-turbo in organization org-rfaUzt0VkU7EjbD5nJEEz7yh on tokens per min. Limit: 90000 / min. Current: 86359 / min. Contact us through our help center at help.openai.com if you continue to have issues.

Is there a way to load balance?
3 comments
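Since the limit in the log above is per organization, one approach to the load-balancing question is to rotate requests across several API keys (assuming you have keys in separate organizations) and back off exponentially on failures. This is a generic sketch: the key names and `call_with_backoff` helper are made up for illustration, not part of llama_index or the OpenAI SDK.

```python
import itertools
import time

# Hypothetical pool of keys belonging to different organizations.
API_KEYS = ["key-a", "key-b", "key-c"]
_key_cycle = itertools.cycle(API_KEYS)

def next_key():
    """Pick the next key in round-robin order."""
    return next(_key_cycle)

def call_with_backoff(fn, retries=3, base_delay=1.0):
    """Call fn(key), retrying with exponential backoff on errors."""
    for attempt in range(retries):
        try:
            return fn(next_key())
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Note that llama_index already retries on `RateLimitError` (as the log shows); rotating keys only helps if the extra keys have separate quotas.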
Hello, I'm a bit new to LlamaIndex and I'm wondering whether I should choose the Python or the TypeScript version. I'm more familiar with TypeScript, so if they both offer the same functionality I'd choose that one, but I'm not sure that is the case. The Python one seems better documented and has more support. Could someone tell me whether there are significant differences between the two?
1 comment
Hello, I tried using the llama_index OpenAI chat engine, but I am encountering one problem. If the conversation develops in a way that makes the responses long, the context string passed to the LLM grows too large and I hit a token limit error.
Plain Text
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4121 tokens (4070 in the messages, 51 in the functions). Please reduce the length of the messages or functions.

Has this happened to anyone else and what could I do to fix it?
5 comments
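A common fix for the context-length error above is to cap the chat history to a token budget, dropping the oldest turns first. The sketch below approximates token counts as characters divided by four; a real application would use an actual tokenizer (e.g. tiktoken) or a memory class with a built-in token limit, so `approx_tokens` and `trim_history` are illustrative helpers, not library functions.

```python
def approx_tokens(text):
    """Very rough token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Drop the oldest messages until the total fits the token budget."""
    kept = list(messages)
    while kept and sum(approx_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # oldest message goes first
    return kept
```

Trimming before each request keeps the prompt (history plus retrieved context plus functions) safely under the model's 4097-token limit.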
I am using a vector index as an OpenAI chat engine, but it is very slow; sometimes a response takes more than 15 seconds. My index is less than 15 MB, so shouldn't it be very fast at that size? I know the GPT response itself is slow and there's nothing I can really do to speed that up, but from my timing it accounts for at most half the time on average, and the rest comes from the querying.
24 comments
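To pin down where the 15 seconds go, it helps to time the retrieval stage and the LLM stage separately, as the poster began to do. A minimal sketch of that measurement is below; `retrieve` and `ask_llm` are placeholders for the real retriever and GPT calls, so only the timing pattern is the point here.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def retrieve(query):          # placeholder for vector-index retrieval
    return ["chunk about " + query]

def ask_llm(query, chunks):   # placeholder for the GPT call
    return f"answer to {query} using {len(chunks)} chunks"

nodes, t_retrieval = timed(retrieve, "speed")
answer, t_llm = timed(ask_llm, "speed", nodes)
```

If `t_retrieval` really dominates for a 15 MB index, the usual suspects are embedding the query on every call, re-loading the index from disk per request, or a slow storage backend, rather than the vector search itself.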
Is llama-index GPU accelerated? Should I rent a GPU for the VM instance on which I plan to run a llama-index application?
6 comments
What is the difference between a system prompt and a context prompt?
17 comments
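The distinction asked about above can be shown concretely: the system prompt is a fixed instruction that applies to the whole conversation, while the context prompt is a template that gets filled with freshly retrieved text on every turn. The sketch below assembles OpenAI-style messages to illustrate this; the template wording and `build_messages` helper are made up for the example, not llama_index's actual defaults.

```python
# Fixed, conversation-wide instruction.
SYSTEM_PROMPT = "You are a helpful assistant for our documentation."

# Re-filled with retrieved chunks on every user turn.
CONTEXT_TEMPLATE = "Relevant context:\n{context}\n\nAnswer using only the context above."

def build_messages(user_question, retrieved_context):
    """Combine system prompt, per-turn context, and the user question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": CONTEXT_TEMPLATE.format(context=retrieved_context)},
        {"role": "user", "content": user_question},
    ]
```

So the system prompt shapes the assistant's persona and rules, while the context prompt controls how retrieved passages are injected each turn.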
I'm having an issue with llama_index response times in context chat mode. I am using a database reader to read a row that contains very long text, which is not very well formatted either; honestly, it should have been a regular PDF/text document. Does that make performance much worse? It seems like it does. Sometimes I just wait indefinitely for a response. Is there a way to set a time limit for the retrieval so it just gives me some answer in the chat?
6 comments