Have you tried
It might give some more details if it's related to API issues
I was having tons of API issues yesterday, the API was down / flaky
Oh good point, will try that!
And that'll work for llama index calls as well?
I'm still having issues today where llama index queries just won't return
I see that the openai library increased its timeout to 10 minutes; with tons of retries I can see how that would compound to hours
hmmm that's pretty brutal. We should lower that and make it configurable
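Back-of-envelope on the compounding (client defaults from memory, so treat the numbers as rough):
# hedged numbers: the new openai client's default timeout is ~10 minutes,
# and each failed request gets retried a couple of times
timeout_s = 600
retries = 2
calls_per_query = 5  # agent step + tool calls + synthesis, rough guess
worst_case_s = timeout_s * (1 + retries) * calls_per_query
print(worst_case_s / 3600)  # => 2.5 hours for a single stalled query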
It's been working for me so far
Yea it will work for any call made using the openai client
If it ends up not being rate limit issues etc., can I show u my implementation to see if u can catch what I'm doing wrong? It was working perfectly at 0.8.62, and when I went to 0.8.65 with no change in the code it just times out
yea for sure. The change between those two versions is really just this new openai client
does llm.complete("Hello!") work? Would be a quick sanity test
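i.e. something like this, assuming the llama_index OpenAI wrapper (swap in your own llm object):
import time
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
start = time.time()
print(llm.complete("Hello!"))
print(f"round trip: {time.time() - start:.1f}s")  # if this hangs, it's the client/API, not your code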
@Logan M My implementation looks like this
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory
from langchain.prompts import MessagesPlaceholder
from langchain.schema import SystemMessage

llm = ChatOpenAI(model=model, temperature=0)
memory = ConversationSummaryBufferMemory(
memory_key="memory",
return_messages=True,
llm=llm,
max_token_limit=100000 if "preview" in model else max_token_limit,
)
agent_kwargs = {
"extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
"system_message": SystemMessage(
content="You are a superpowered version of GPT that is able to answer questions about the data you're "
"connected to. Each different tool you have represents a different dataset to interact with. "
"If you are asked to perform a task that spreads across multiple datasets, use multiple tools "
"for the same prompt. When the user types links in chat, you will have already been connected "
"to the data at the link by the time you respond. When using tools, the input should be "
"clearly created based on the request of the user. For example, if a user uploads an invoice "
"and asks how many usage hours of X was present in the invoice, a good query is 'X hours'. "
"Avoid using single word queries unless the request is very simple. You can query multiple times to break down complex requests and retrieve more information."
),
}
agent_chain = initialize_agent(
tools=tools,
llm=llm,
agent=AgentType.OPENAI_FUNCTIONS,
verbose=True,
agent_kwargs=agent_kwargs,
memory=memory,
handle_parsing_errors="Check your output and make sure it conforms!",
)
My index tools are defined like
from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool

tool_config = IndexToolConfig(
query_engine=engine,
name=f"{filename}-index",
description=f"Use this tool if the query seems related to this summary: {summary}",
tool_kwargs={
"return_direct": False,
},
max_iterations=5,
)
tool = LlamaIndexTool.from_tool_config(tool_config)
Where engine is defined as
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import ResponseMode, get_response_synthesizer
from llama_index.retrievers import VectorIndexRetriever
# TEXT_QA_SYSTEM_PROMPT is assumed to be imported elsewhere

retriever = VectorIndexRetriever(
index=index, similarity_top_k=2, service_context=service_context
)
response_synthesizer = get_response_synthesizer(
response_mode=ResponseMode.COMPACT_ACCUMULATE,
use_async=True,
refine_template=TEXT_QA_SYSTEM_PROMPT,
service_context=service_context,
verbose=True,
)
engine = RetrieverQueryEngine(
retriever=retriever, response_synthesizer=response_synthesizer
)
Do you see anything inherently wrong here?
what does your service context look like?
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
embed_model=embedding_model,
callback_manager=callback_manager,
node_parser=node_parser,
)
wait.. I just realized I don't have the llm in my service context
what happens when I don't have the llm= param in there?
That shouldn't TECHNICALLY be an issue, it defaults to
from llama_index.llms import OpenAI
llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo")
you can try service_context.llm.complete("Hello!")
to check if it works
The issue is it works sometimes, sometimes it doesn't
sometimes the querying of the index will be perfect, sometimes it'll just stall forever
so sometimes llm.complete works, sometimes it stalls
ok I think this is just issues with OpenAI's API these last few days, combined with the fact that the retry mechanism settings are hot garbage right now
You should be able to do something like
llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_retries=2, additional_kwargs={"timeout": 30})
to try and improve
ok I can't remember if the timeout works under additional_kwargs or as the kwarg directly lol, one or the other should help
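i.e. one of these two, depending on which kwarg the wrapper actually forwards:
from llama_index.llms import OpenAI

# option 1: timeout as a direct kwarg (may not be accepted on every version)
llm = OpenAI(model="gpt-3.5-turbo", max_retries=2, timeout=30)

# option 2: timeout forwarded through additional_kwargs
llm = OpenAI(model="gpt-3.5-turbo", max_retries=2, additional_kwargs={"timeout": 30})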
I'm still defining my LLM as llm = ChatOpenAI(model=model, temperature=0)
Technically that shouldn't matter, we are compatible with that. But you'll have to change the retries and timeout from there (however that works in langchain; maybe the IDE autocomplete hints can help lol)
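on the langchain side it should be something like this (kwarg names from memory, so double-check against your version):
from langchain.chat_models import ChatOpenAI

# request_timeout / max_retries are the langchain-side knobs
llm = ChatOpenAI(model=model, temperature=0, request_timeout=30, max_retries=2)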
or you could start by at least passing that LLM into the service context
Passing the LLM seems to have helped a lot. It was defaulting to gpt-3.5-turbo, which seems even more unstable than what it should've been using (gpt-4-32k)
timeout seems to be like
import httpx
timeout = httpx.Timeout(10.0, read=5.0, write=10.0, connect=2.0)
service_context = ServiceContext.from_defaults(
llm=OpenAI(max_retries=2, additional_kwargs={"timeout": timeout}),
embed_model=embedding_model,
callback_manager=callback_manager,
node_parser=node_parser,
)
nvm, that doesn't work because langchain's broken
and passing it into llama_index doesn't work because it's wrapped by langchain
gonna have to wait for langchain to fix their timeouts I think
@Logan M I am running into a weird condition
when I index a PPTX and then query over that PPTX, the query doesn't return
other file types return normally now..
@Logan M can u try with this when u get some time and see if u can get a summary of the PPTX from llamaindex?
lol ok let's try it right now
Failed to load file Capitalism_vs_Communism.ppt with error: File is not a zip file. Skipping...
seems like the pptx package is failing to open/read the file
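you can reproduce it outside llama_index, since the pptx reader goes through python-pptx under the hood:
from pptx import Presentation

# legacy binary .ppt isn't a zip archive (only the newer .pptx format is),
# so this raises the same "File is not a zip file" error
Presentation("Capitalism_vs_Communism.ppt")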
but llama_index isn't popping up an error, no?
My instance just seems to silently stall and not throw any sort of error when that loader is used
hmm weird. Yea like I installed fresh in colab there and it errored out pretty quick trying to open the file
Are you using simple directory reader? Maybe it didn't actually load any documents?
That one failed a little more silently
(well, it printed an issue, but kept trucking)
yeah I am using simple directory reader, and I think I confirmed that there is a file being fed to it // the file path is correct
My implementation is basically
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.readers.file.epub_reader import EpubReader
from llama_index.readers.file.markdown_reader import MarkdownReader

def index_file(
self, file_path, service_context, suffix=None
) -> GPTVectorStoreIndex:
if suffix and suffix == ".md":
loader = MarkdownReader()
document = loader.load_data(file_path)
elif suffix and suffix == ".epub":
epub_loader = EpubReader()
document = epub_loader.load_data(file_path)
else:
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
index = GPTVectorStoreIndex.from_documents(
document, service_context=service_context, use_async=True
)
return index
but once it hits from_documents it just goes silent
but does the loader return documents?
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
print(len(document))
Maybe also try without async?
it might also be borking on embeddings for whatever reason
another thing to try
service_context = ServiceContext.from_defaults(..., embed_model="local:BAAI/bge-small-en-v1.5")
Just to see if it's an openai issue or not
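Putting those together, something like this (reusing the names from your code above):
# 1. confirm the loader actually parsed the PPTX
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
print(len(document))  # 0 means nothing was parsed, so there's nothing to index

# 2. rule out OpenAI embeddings with a local model, and drop use_async
debug_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
)
index = GPTVectorStoreIndex.from_documents(document, service_context=debug_context)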
sorry, kind of throwing everything I would do to debug at you lol
I appreciate it! Will try these out in a bit and let you know what I see
wait, I'm a bit confused actually
because on my install it indexes correctly it seems?
when that error you got popped up, would it have allowed the from_documents call to proceed without raising the exception up?
It would have, but there would have been nothing to index (since documents would be an empty list)
So the index would create, but there would be nothing to query
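So a cheap guard on your side would be something like:
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
if not document:
    # surfaces the silent-failure case instead of building an empty index
    raise ValueError(f"No documents were parsed from {file_path}")
index = GPTVectorStoreIndex.from_documents(document, service_context=service_context)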