
Updated 4 months ago


At a glance
The community members are experiencing issues with the OpenAI API, including instability, rate limits, and timeouts. They have tried enabling debug logging, adjusting timeout and retry settings, and passing the LLM instance to the service context. Passing the LLM helped, but queries over an indexed PowerPoint file still fail to return. The timeout settings cannot be applied while the LLM is wrapped by LangChain, so the community members are waiting on a LangChain fix and are still investigating the PowerPoint problem.
I'm assuming it's api instability but some anecdotal backups would have me more at ease
80 comments
Have you tried

import openai
openai.log = "debug"


It might give some more details if it's related to API issues
I was having tons of api issues yesterday, the api was down / flakey
Oh good point, will try that!
And that'll work for llama index calls as well?
I'm still having issues today where llama index queries just won't return
I see that the openai library increased their timeout to 10 mins, with tons of retries I can see how that 10 mins would compound to hours
no issues this morning?
hmmm thats pretty brutal. We should lower that and make it configurable
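To put a number on how the timeout compounds with retries, here is a small arithmetic sketch. The function name and the retry counts are hypothetical; only the 10 minute timeout comes from the thread. It ignores any backoff sleep unless one is passed in.

```python
def worst_case_wait_seconds(
    timeout_s: float, max_retries: int, backoff_s: float = 0.0
) -> float:
    """Upper bound on how long a single call can block.

    Every attempt (the first try plus each retry) can run for the full
    timeout, and each retry may be preceded by a backoff sleep.
    """
    attempts = max_retries + 1
    return attempts * timeout_s + max_retries * backoff_s


# 10 minute timeout with a hypothetical 10 retries: 11 attempts of 600 s each.
print(worst_case_wait_seconds(600, 10) / 60, "minutes")
# A 30 s timeout with 2 retries caps a single call at 90 s.
print(worst_case_wait_seconds(30, 2), "seconds")
```

With those hypothetical settings, one stalled call can block for 110 minutes, and a query that makes several LLM calls in sequence multiplies that again.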
It's been working for me so far
Yea it will work for any call made using the openai client
If it ends up not being rate limit issues and etc, can I show u my implementation to see if u can catch what I'm doing wrong? It was working perfectly at 0.8.62 and when I went to 0.8.65 with no change in the code it just times out
yea for sure 🤔 Like the change between those two versions is this new openai client really
does llm.complete("Hello!") work? Would be a quick sanity test
@Logan M My implementation looks like this
llm = ChatOpenAI(model=model, temperature=0)
memory = ConversationSummaryBufferMemory(
    memory_key="memory",
    return_messages=True,
    llm=llm,
    max_token_limit=100000 if "preview" in model else max_token_limit,
)
agent_kwargs = {
    "extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
    "system_message": SystemMessage(
        content="You are a superpowered version of GPT that is able to answer questions about the data you're "
        "connected to. Each different tool you have represents a different dataset to interact with. "
        "If you are asked to perform a task that spreads across multiple datasets, use multiple tools "
        "for the same prompt. When the user types links in chat, you will have already been connected "
        "to the data at the link by the time you respond. When using tools, the input should be "
        "clearly created based on the request of the user. For example, if a user uploads an invoice "
        "and asks how many usage hours of X was present in the invoice, a good query is 'X hours'. "
        "Avoid using single word queries unless the request is very simple. You can query multiple times to break down complex requests and retrieve more information."
    ),
}
agent_chain = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    verbose=True,
    agent_kwargs=agent_kwargs,
    memory=memory,
    handle_parsing_errors="Check your output and make sure it conforms!",
)


My index tools are defined like
tool_config = IndexToolConfig(
    query_engine=engine,
    name=f"{filename}-index",
    description=f"Use this tool if the query seems related to this summary: {summary}",
    tool_kwargs={
        "return_direct": False,
    },
    max_iterations=5,
)
tool = LlamaIndexTool.from_tool_config(tool_config)

Where engine is defined as
retriever = VectorIndexRetriever(
    index=index, similarity_top_k=2, service_context=service_context
)

response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.COMPACT_ACCUMULATE,
    use_async=True,
    refine_template=TEXT_QA_SYSTEM_PROMPT,
    service_context=service_context,
    verbose=True,
)

engine = RetrieverQueryEngine(
    retriever=retriever, response_synthesizer=response_synthesizer
)
Do you see anything inherently wrong here?
what does your service context look like?
service_context = ServiceContext.from_defaults(
    embed_model=embedding_model,
    callback_manager=callback_manager,
    node_parser=node_parser,
)
wait.. I just realized I don't have the llm in my service context
what does it default to?
when I dont have the llm= param in there
if u recall
That shouldn't TECHNICALLY be an issue, it defaults to

from llama_index.llms import OpenAI

llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo")
you can try service_context.llm.complete("Hello!") to check if it works
The issue is it works sometimes sometimes doesn't
sometimes the querying of the index will be perfect sometimes it'll just stall forever
so sometimes llm.complete works sometimes stalls
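Since the sanity check itself can stall, one option is to wrap it in a hard deadline so it fails fast instead of hanging. This is a minimal sketch using only the standard library; `call_with_deadline` is a hypothetical helper, not part of llama index or openai. The stalled call keeps running in a background thread after the deadline, so this only helps with diagnosis and does not cancel the request.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_deadline(fn: Callable[[], T], deadline_s: float) -> T:
    """Run fn in a worker thread and give up waiting after deadline_s seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"call did not finish within {deadline_s} s") from None
    finally:
        # Do not block on the worker; a stalled call would hang us here too.
        executor.shutdown(wait=False)


# Usage with the check from the thread (needs a configured service_context):
# call_with_deadline(lambda: service_context.llm.complete("Hello!"), 30)
```

Running this a handful of times and counting the timeouts gives a rough idea of how often the call stalls versus how often it returns normally.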
ok I think this is just issues with OpenAI's API these last few days, combined with the fact that the retry mechanism settings are hot garbage right now
You should be able to do something like

llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_retries=2, additional_kwargs={"timeout": 30}) to try and improve things
ok I can't remember if the timeout works under additional_kwargs or the kwarg directly lol one or the other should help
I'm still defining my LLM as llm = ChatOpenAI(model=model, temperature=0)
is that an issue?
Technically no, we are compatible with that. But you'll have to change the retries and timeout from there (however it works for langchain, maybe the IDE autocomplete hints can help lol)
or you could start by at least passing that LLM into the service context
Passing the LLM seems to have helped a lot, gpt-3.5 turbo seems even more unstable compared to what it should've been using (gpt-4-32k)
timeout seems to be like
timeout = httpx.Timeout(10.0, read=5.0, write=10.0, connect=2.0)
service_context = ServiceContext.from_defaults(
    llm=OpenAI(max_retries=2, additional_kwargs={"timeout": timeout}),
    embed_model=embedding_model,
    callback_manager=callback_manager,
    node_parser=node_parser,
)
nvm that doesn't work bc langchain's broken
and passing it into llamaindex doesn't work because it's wrapped by langchain
gonna have to wait for langchain to fix their timeouts i think
damn you langchaaain
@Logan M I am running into a weird condition
when I index a PPTX and then query over that PPTX, the query doesn't return
other file types return normally now..
Let me confirm this
@Logan M can u try with this when u get some time and see if u can get a summary of the PPTX from llamaindex?
⚒️
lol ok lets try it right now
Failed to load file Capitalism_vs_Communism.ppt with error: File is not a zip file. Skipping...
seems like the pptx package is failing to open/read the file
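"File is not a zip file" fits the file being a legacy binary `.ppt` rather than a `.pptx`, since `.pptx` files are zip packages and `.ppt` files are not. A quick standard-library check before handing the file to a loader might look like the sketch below; `looks_like_pptx` is a hypothetical helper name, and passing the check only means the file is a zip archive, not that it is a valid presentation.

```python
import zipfile
from pathlib import Path


def looks_like_pptx(path) -> bool:
    """Return True if path exists and is a zip archive, as every .pptx is.

    A legacy binary .ppt file, or a file with the wrong extension,
    returns False and can be skipped or converted before indexing.
    """
    p = Path(path)
    return p.is_file() and zipfile.is_zipfile(p)


# Usage, with the file name from the error message above:
# if not looks_like_pptx("Capitalism_vs_Communism.ppt"):
#     print("Not a zip-based .pptx; convert it or use a different loader")
```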
but llama_index isn't popping up an error, no?
My instance just seems to silently stall and not throw any sort of error when that loader is used
hmm weird. Yea like I installed fresh in colab there and it errored out pretty quick trying to open the file
Are you using simple directory reader? Maybe it didn't actually load any documents?
That one failed a little more silently
(well, it printed an issue, but kept trucking)
yeah I am using simple directory reader, and I think I confirmed that there is a file being fed to it // the file path is correct
My implementation is basically
    def index_file(
        self, file_path, service_context, suffix=None
    ) -> GPTVectorStoreIndex:
        if suffix and suffix == ".md":
            loader = MarkdownReader()
            document = loader.load_data(file_path)
        elif suffix and suffix == ".epub":
            epub_loader = EpubReader()
            document = epub_loader.load_data(file_path)
        else:
            document = SimpleDirectoryReader(input_files=[file_path]).load_data()
        index = GPTVectorStoreIndex.from_documents(
            document, service_context=service_context, use_async=True
        )
        return index
but once it hits the from_documents it just is silent
but does the loader return documents?

document = SimpleDirectoryReader(input_files=[file_path]).load_data()
print(len(document))


Maybe also try without async?
it might also be borking on embeddings for whatever reason
another thing to try

service_context = ServiceContext.from_defaults(...., embed_model="local:BAAI/bge-small-en-v1.5")


Just to see if its an openai issue or not
sorry, kind of throwing everything I would do to debug at you lol
I appreciate it! Will try these out in a bit and let you know what I see
wait i'm a bit confused actually
because on my install it indexes correctly it seems?
when that error you got popped up, would it have allowed the from_documents call to proceed without raising the exception up?
It would have, but there would have been nothing to index (since documents would be an empty list)
So the index would create, but there would be nothing to query
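One way to turn that silent case into a loud one is to check the loader output before building the index. A minimal sketch, where `require_documents` is a hypothetical helper and not part of llama index:

```python
from typing import Sequence


def require_documents(documents: Sequence, source: str) -> Sequence:
    """Raise instead of letting an empty document list reach the index.

    A loader that prints an error and skips the file returns an empty
    list, which would otherwise produce an index with nothing to query.
    """
    if len(documents) == 0:
        raise ValueError(
            f"No documents were loaded from {source!r}; refusing to build an empty index"
        )
    return documents


# Usage inside index_file, before the from_documents call:
# document = require_documents(
#     SimpleDirectoryReader(input_files=[file_path]).load_data(), file_path
# )
```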