There are also a few other approaches
Like
documents = SimpleDirectoryReader("./data").load_data()
index = ListIndex.from_documents(documents)
response = index.as_query_engine(response_mode="tree_summarize").query("Summarize this text")
For a chat though, leaving the response mode as the default might work better, not sure
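Something like this if you want to try the default mode (just a sketch, same setup as above, no response_mode passed):
from llama_index import ListIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
index = ListIndex.from_documents(documents)

# leaving response_mode unset uses the query engine's default synthesizer
query_engine = index.as_query_engine()
print(query_engine.query("Summarize this text"))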
the example is working fine for a small document. however, I encounter this error with a big one
WARNING:llama_index.llms.openai_utils:Retrying llama_index.llms.openai_utils.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: Error communicating with OpenAI.
@Logan M does the example you gave use OpenAI also?
Yea it uses openai
That's a bit of an odd error, I'm going to assume openai is just having some server issues?
not sure. maybe it's a rate limit issue i dunno
do you mind sending the full code for the example you gave?
i tried running it without OpenAI and it downloaded llama
Yea that's the local fallback
I believe you have it set up right. Normally if it's a rate limit issue it will say RateLimitError 🤔
Are you just using openai free credits right now or nah?
i just ran the entire code you gave
import os
import logging
import sys
from llama_index import (
    SimpleDirectoryReader,
    ListIndex,
)
documents = SimpleDirectoryReader("./data").load_data()
index = ListIndex.from_documents(documents)
response = index.as_query_engine(response_mode="tree_summarize").query("Summarize this text")
Yea, like for me that always works 🤔 maybe there's something weird about the text you are reading/sending?
maybe, could it be the emojis 😄
i am encountering this error with the example you gave, any ideas what the issue is? it is using a local llama model
llama_print_timings: load time = 92952.89 ms
llama_print_timings: sample time = 254.52 ms / 252 runs ( 1.01 ms per token, 990.10 tokens per second)
llama_print_timings: prompt eval time = 224784.60 ms / 1031 tokens ( 218.03 ms per token, 4.59 tokens per second)
llama_print_timings: eval time = 118937.99 ms / 251 runs ( 473.86 ms per token, 2.11 tokens per second)
llama_print_timings: total time = 345072.56 ms
Exception ignored in: <function Llama.__del__ at 0x124d45d00>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/llama_cpp/llama.py", line 1558, in __del__
TypeError: 'NoneType' object is not callable
looks like a max token issue with llama
i was able to get a result but still have this error
Spanish rule ended in 1898 with Spain's defeat in the Spanish–American
Exception ignored in: <function Llama.__del__ at 0x11e884fe0>
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/llama_cpp/llama.py", line 1558, in __del__
TypeError: 'NoneType' object is not callable
isn't chunking supported for llama?
i didn't encounter the max token issue using OpenAI
tried this but it didn't work:
service_context = ServiceContext.from_defaults(llm='local', chunk_size_limit=3000)
Is this really an error? I think this is getting printed once your script has finished running.
I've seen this too, just a harmless llamacpp bug when the program is shutting down
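For the chunk size though, I think the service context also needs to be passed into the index when you build it, otherwise it never gets applied. Something like this might work (just a sketch — assumes the local llama fallback and a chunk size small enough to fit in its ~4k context window):
from llama_index import ListIndex, ServiceContext, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()

# llm="local" uses the downloaded llama.cpp model; keep chunks well under its context window
service_context = ServiceContext.from_defaults(llm="local", chunk_size=1024)

index = ListIndex.from_documents(documents, service_context=service_context)
response = index.as_query_engine(response_mode="tree_summarize").query("Summarize this text")
print(response)
If it still overflows, you can probably also build the LlamaCPP LLM yourself and bump context_window / lower max_new_tokens, but I'd try the smaller chunk size first.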