Hi! I'm getting 'Rate limit reached for 10KTPM-200RPM' errors when using gpt-4. Should LlamaIndex take the limits into account and sleep between calls, or is that something I'll need to do on the app side?
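In case it helps: one common app-side workaround is to wrap the calls in a retry loop with exponential backoff and sleep when a rate-limit error comes back. A minimal sketch (the `RateLimitError` class here is a stand-in for whatever exception your client raises; names and delays are assumptions, not LlamaIndex API):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit exception your LLM client raises."""

def retry_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(); on a rate-limit error, sleep with exponential backoff and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # exponential backoff plus a little jitter to avoid thundering herds
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

You would then wrap each query call, e.g. `retry_with_backoff(lambda: index.query("..."))`.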
I was trying to follow the text2sql example and noticed that when you ask something, it builds a SQL query and executes it (nice!). However, I went to read the prompt used, which is this one:
Hi all! I've got a Qdrant db where I stored some docs using LangChain. I'm trying to load and use them with LlamaIndex, without much luck. Each seems to save content in the db in a different way: LlamaIndex uses the 'text' payload property to store the text, and LangChain uses 'page_content'. How could one query docs stored LangChain's way with LlamaIndex?
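One pragmatic option is to migrate the payloads: copy LangChain's 'page_content' value into the 'text' key that LlamaIndex reads. A minimal sketch of that key mapping, assuming the two key names from the question above (the surrounding loop over `qdrant_client`'s `scroll`/`set_payload` is shown only as comments, since exact signatures vary by client version):

```python
def to_llama_payload(payload: dict) -> dict:
    """Copy LangChain's 'page_content' into the 'text' key LlamaIndex reads."""
    migrated = dict(payload)
    if "page_content" in migrated and "text" not in migrated:
        migrated["text"] = migrated["page_content"]
    return migrated

# Rough migration loop (pseudocode-ish, check your qdrant_client version):
# points, _ = client.scroll("my_collection", with_payload=True, limit=100)
# for point in points:
#     client.set_payload("my_collection",
#                        payload=to_llama_payload(point.payload),
#                        points=[point.id])
```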
Hi all. It looks like LlamaIndex's create-and-refine over a vector index is pretty similar to LangChain's load_summarize_chain with chain_type='refine'. Are they really similar, or am I crazy?
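For what it's worth, both seem to follow the same general refine pattern: answer from the first chunk, then repeatedly ask the LLM to improve that answer with each next chunk. A library-free sketch of that loop (prompt wording and function names here are assumptions, not either library's actual prompts):

```python
def refine_summarize(chunks, llm):
    """Create-and-refine: draft from the first chunk, refine with each later one."""
    answer = llm(f"Summarize:\n{chunks[0]}")
    for chunk in chunks[1:]:
        # each pass sees the running answer plus one new chunk of context
        answer = llm(
            f"Existing answer:\n{answer}\n\n"
            f"Refine it with this new context:\n{chunk}"
        )
    return answer
```

The libraries differ mainly in where the chunks come from (retrieved nodes vs. documents) and the exact prompt templates, not in the loop itself.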
Hi all, new to LlamaIndex here. I'm trying to figure out how to add 'short memory', like adding the query and response text of the conversation into the next prompt. Is that possible? I know I would hit the max token limit quite fast, but it would be useful anyway.
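Until there's a built-in way, this can be done by hand: keep the last few (query, response) pairs and prepend them to each new prompt, capping the window so you don't blow the token limit. A minimal sketch (class and method names are made up for illustration):

```python
class ShortMemory:
    """Keep the last k exchanges and prepend them to the next prompt."""

    def __init__(self, k=3):
        self.k = k
        self.turns = []  # list of (query, response) pairs

    def build_prompt(self, query):
        # only the k most recent turns go in, to bound prompt size
        history = "\n".join(
            f"User: {q}\nAssistant: {r}" for q, r in self.turns[-self.k:]
        )
        prefix = f"{history}\n" if history else ""
        return f"{prefix}User: {query}\nAssistant:"

    def record(self, query, response):
        self.turns.append((query, response))
```

Usage: build the prompt, send it to your index/LLM, then `record()` the exchange so the next prompt includes it.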