
toniuyt
Joined September 25, 2024
It seems like llama_index shares the chat memory across the different chat engines created in my application. How can I use my index to build a chatbot application that can serve multiple users at the same time?
6 comments
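One common answer to the question above is to keep a separate chat engine, and therefore separate memory, per user. Below is a minimal sketch of that pattern; `StubChatEngine` and `get_engine` are hypothetical names standing in for a real llama_index chat engine (which you would typically build from your index inside the factory), so treat this as an illustration of per-user isolation, not the library's API.

```python
class StubChatEngine:
    """Stand-in for a real chat engine; each instance keeps its own history."""
    def __init__(self):
        self.history = []

    def chat(self, message):
        self.history.append(message)
        return f"reply to: {message}"

# One engine per user id, created lazily on first use.
_engines = {}

def get_engine(user_id, make_engine=StubChatEngine):
    """Return the chat engine for this user, creating it on first use."""
    if user_id not in _engines:
        _engines[user_id] = make_engine()
    return _engines[user_id]

# Each user now has isolated chat memory:
alice = get_engine("alice")
bob = get_engine("bob")
alice.chat("hello from alice")
bob.chat("hello from bob")
```

The key point is that the index itself can be shared read-only, while the conversational state lives in a per-user object keyed by session or user id.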
How do I get past rate limit errors? I have a chat application that uses llama_index and I often get this:
Plain Text
WARNING:llama_index.llms.openai_utils:Retrying llama_index.llms.openai_utils.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised RateLimitError: Rate limit reached for gpt-3.5-turbo in organization org-rfaUzt0VkU7EjbD5nJEEz7yh on tokens per min. Limit: 90000 / min. Current: 86359 / min. Contact us through our help center at help.openai.com if you continue to have issues.

Is there a way to load balance?
3 comments
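Since the limit in the log above is per organization, one approach to the load-balancing question is to rotate requests across several API keys (assuming you have keys in separate organizations) and back off exponentially on failures. This is a generic sketch: the key names and `call_with_backoff` helper are made up for illustration, not part of llama_index or the OpenAI SDK.

```python
import itertools
import time

# Hypothetical pool of keys belonging to different organizations.
API_KEYS = ["key-a", "key-b", "key-c"]
_key_cycle = itertools.cycle(API_KEYS)

def next_key():
    """Pick the next key in round-robin order."""
    return next(_key_cycle)

def call_with_backoff(fn, retries=3, base_delay=1.0):
    """Call fn(key), retrying with exponential backoff on errors."""
    for attempt in range(retries):
        try:
            return fn(next_key())
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Note that llama_index already retries on `RateLimitError` (as the log shows); rotating keys only helps if the extra keys have separate quotas.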
Hello, I'm a bit new to LlamaIndex and I'm wondering whether I should choose the Python or the TypeScript version. I'm more familiar with TypeScript, so if they both offer the same functionality I'd choose that one, but I'm not sure that is the case. The Python one seems better documented and has more support. Could someone tell me whether there are significant differences between the two?
1 comment
Hello, I tried using the llama_index OpenAI chat engine, but I am encountering one problem. If the conversation develops in a way that makes the responses long, the context string passed to the LLM grows too large and I hit a token limit error.
Plain Text
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4121 tokens (4070 in the messages, 51 in the functions). Please reduce the length of the messages or functions.

Has this happened to anyone else and what could I do to fix it?
5 comments
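A common fix for the context-length error above is to cap the chat history to a token budget, dropping the oldest turns first. The sketch below approximates token counts as characters divided by four; a real application would use an actual tokenizer (e.g. tiktoken) or a memory class with a built-in token limit, so `approx_tokens` and `trim_history` are illustrative helpers, not library functions.

```python
def approx_tokens(text):
    """Very rough token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens):
    """Drop the oldest messages until the total fits the token budget."""
    kept = list(messages)
    while kept and sum(approx_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # oldest message goes first
    return kept
```

Trimming before each request keeps the prompt (history plus retrieved context plus functions) safely under the model's 4097-token limit.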
I am using a vector index as an OpenAI chat engine, but it is very slow; sometimes a response takes more than 15 seconds. My index is less than 15 MB, so shouldn't it be very fast at that size? I know the GPT response itself is slow and there's nothing I can really do to speed that up, but from my timing it accounts for at most half the time on average, and the rest comes from the querying.
24 comments
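To pin down where the 15 seconds go, it helps to time the retrieval stage and the LLM stage separately, as the poster began to do. A minimal sketch of that measurement is below; `retrieve` and `ask_llm` are placeholders for the real retriever and GPT calls, so only the timing pattern is the point here.

```python
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def retrieve(query):          # placeholder for vector-index retrieval
    return ["chunk about " + query]

def ask_llm(query, chunks):   # placeholder for the GPT call
    return f"answer to {query} using {len(chunks)} chunks"

nodes, t_retrieval = timed(retrieve, "speed")
answer, t_llm = timed(ask_llm, "speed", nodes)
```

If `t_retrieval` really dominates for a 15 MB index, the usual suspects are embedding the query on every call, re-loading the index from disk per request, or a slow storage backend, rather than the vector search itself.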
Is llama-index GPU accelerated? Should I rent a GPU for the VM instance on which I plan to run a llama-index application?
6 comments
What is the difference between a system prompt and a context prompt?
17 comments
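The distinction asked about above can be shown concretely: the system prompt is a fixed instruction that applies to the whole conversation, while the context prompt is a template that gets filled with freshly retrieved text on every turn. The sketch below assembles OpenAI-style messages to illustrate this; the template wording and `build_messages` helper are made up for the example, not llama_index's actual defaults.

```python
# Fixed, conversation-wide instruction.
SYSTEM_PROMPT = "You are a helpful assistant for our documentation."

# Re-filled with retrieved chunks on every user turn.
CONTEXT_TEMPLATE = "Relevant context:\n{context}\n\nAnswer using only the context above."

def build_messages(user_question, retrieved_context):
    """Combine system prompt, per-turn context, and the user question."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": CONTEXT_TEMPLATE.format(context=retrieved_context)},
        {"role": "user", "content": user_question},
    ]
```

So the system prompt shapes the assistant's persona and rules, while the context prompt controls how retrieved passages are injected each turn.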
I'm having an issue with llama_index response times in context chat mode. I am using a database reader to read a row that contains very long text, which is not very well formatted either; honestly, it should have been a regular PDF/text document. Does that make performance much worse? It seems like it does. Sometimes I just wait indefinitely for a response. Is there a way to set a time limit for the retrieval so it just gives me some answer in the chat?
6 comments