Have you tried
It might give some more details if it's related to API issues
I was having tons of API issues yesterday, the API was down / flaky
Oh good point, will try that!
And that'll work for llama index calls as well?
I'm still having issues today where llama index queries just won't return
I see that the openai library increased its timeout to 10 minutes; with tons of retries I can see how that would compound to hours
hmmm that's pretty brutal. We should lower that and make it configurable
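Back-of-envelope on the compounding (client defaults from memory, so treat the numbers as rough):
# hedged numbers: the new openai client's default timeout is ~10 minutes,
# and each failed request gets retried a couple of times
timeout_s = 600
retries = 2
calls_per_query = 5  # agent step + tool calls + synthesis, rough guess
worst_case_s = timeout_s * (1 + retries) * calls_per_query
print(worst_case_s / 3600)  # => 2.5 hours for a single stalled query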
It's been working for me so far
Yea it will work for any call made using the openai client
If it ends up not being rate limit issues etc., can I show u my implementation to see if u can catch what I'm doing wrong? It was working perfectly at 0.8.62, and when I went to 0.8.65 with no change in the code it just times out
yea for sure. The change between those two versions is really just this new openai client
does llm.complete("Hello!") work? Would be a quick sanity test
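i.e. something like this, assuming the llama_index OpenAI wrapper (swap in your own llm object):
import time
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo")
start = time.time()
print(llm.complete("Hello!"))
print(f"round trip: {time.time() - start:.1f}s")  # if this hangs, it's the client/API, not your code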
@Logan M My implementation looks like this
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory
from langchain.prompts import MessagesPlaceholder
from langchain.schema import SystemMessage

llm = ChatOpenAI(model=model, temperature=0)
memory = ConversationSummaryBufferMemory(
memory_key="memory",
return_messages=True,
llm=llm,
max_token_limit=100000 if "preview" in model else max_token_limit,
)
agent_kwargs = {
"extra_prompt_messages": [MessagesPlaceholder(variable_name="memory")],
"system_message": SystemMessage(
content="You are a superpowered version of GPT that is able to answer questions about the data you're "
"connected to. Each different tool you have represents a different dataset to interact with. "
"If you are asked to perform a task that spreads across multiple datasets, use multiple tools "
"for the same prompt. When the user types links in chat, you will have already been connected "
"to the data at the link by the time you respond. When using tools, the input should be "
"clearly created based on the request of the user. For example, if a user uploads an invoice "
"and asks how many usage hours of X was present in the invoice, a good query is 'X hours'. "
"Avoid using single word queries unless the request is very simple. You can query multiple times to break down complex requests and retrieve more information."
),
}
agent_chain = initialize_agent(
tools=tools,
llm=llm,
agent=AgentType.OPENAI_FUNCTIONS,
verbose=True,
agent_kwargs=agent_kwargs,
memory=memory,
handle_parsing_errors="Check your output and make sure it conforms!",
)
My index tools are defined like
from llama_index.langchain_helpers.agents import IndexToolConfig, LlamaIndexTool

tool_config = IndexToolConfig(
query_engine=engine,
name=f"{filename}-index",
description=f"Use this tool if the query seems related to this summary: {summary}",
tool_kwargs={
"return_direct": False,
},
max_iterations=5,
)
tool = LlamaIndexTool.from_tool_config(tool_config)
Where engine is defined as
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import ResponseMode, get_response_synthesizer
from llama_index.retrievers import VectorIndexRetriever
# TEXT_QA_SYSTEM_PROMPT is assumed to be imported elsewhere

retriever = VectorIndexRetriever(
index=index, similarity_top_k=2, service_context=service_context
)
response_synthesizer = get_response_synthesizer(
response_mode=ResponseMode.COMPACT_ACCUMULATE,
use_async=True,
refine_template=TEXT_QA_SYSTEM_PROMPT,
service_context=service_context,
verbose=True,
)
engine = RetrieverQueryEngine(
retriever=retriever, response_synthesizer=response_synthesizer
)
Do you see anything inherently wrong here?
what does your service context look like?
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(
embed_model=embedding_model,
callback_manager=callback_manager,
node_parser=node_parser,
)
wait.. I just realized I don't have the llm in my service context
what happens when I don't have the llm= param in there?
That shouldn't TECHNICALLY be an issue, it defaults to
from llama_index.llms import OpenAI
llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo")
you can try service_context.llm.complete("Hello!")
to check if it works
The issue is it works sometimes, sometimes it doesn't
sometimes the querying of the index will be perfect, sometimes it'll just stall forever
so sometimes llm.complete works, sometimes it stalls
ok I think this is just issues with OpenAI's API these last few days, combined with the fact that the retry mechanism settings are hot garbage right now
You should be able to do something like
llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_retries=2, additional_kwargs={"timeout": 30})
to try and improve
ok I can't remember if the timeout works under additional_kwargs or as the kwarg directly lol, one or the other should help
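i.e. one of these two, depending on which kwarg the wrapper actually forwards:
from llama_index.llms import OpenAI

# option 1: timeout as a direct kwarg (may not be accepted on every version)
llm = OpenAI(model="gpt-3.5-turbo", max_retries=2, timeout=30)

# option 2: timeout forwarded through additional_kwargs
llm = OpenAI(model="gpt-3.5-turbo", max_retries=2, additional_kwargs={"timeout": 30})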
I'm still defining my LLM as llm = ChatOpenAI(model=model, temperature=0)
Technically that shouldn't matter, we are compatible with that. But you'll have to change the retries and timeout from there (however that works in langchain; maybe the IDE autocomplete hints can help lol)
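on the langchain side it should be something like this (kwarg names from memory, so double-check against your version):
from langchain.chat_models import ChatOpenAI

# request_timeout / max_retries are the langchain-side knobs
llm = ChatOpenAI(model=model, temperature=0, request_timeout=30, max_retries=2)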
or you could start by at least passing that LLM into the service context
Passing the LLM seems to have helped a lot. It was defaulting to gpt-3.5-turbo, which seems even more unstable than what it should've been using (gpt-4-32k)
timeout seems to be like
import httpx
timeout = httpx.Timeout(10.0, read=5.0, write=10.0, connect=2.0)
service_context = ServiceContext.from_defaults(
llm=OpenAI(max_retries=2, additional_kwargs={"timeout": timeout}),
embed_model=embedding_model,
callback_manager=callback_manager,
node_parser=node_parser,
)
nvm, that doesn't work because langchain's broken
and passing it into llama_index doesn't work because it's wrapped by langchain
gonna have to wait for langchain to fix their timeouts I think
@Logan M I am running into a weird condition
when I index a PPTX and then query over that PPTX, the query doesn't return
other file types return normally now..
@Logan M can u try with this when u get some time and see if u can get a summary of the PPTX from llamaindex?
lol ok let's try it right now
Failed to load file Capitalism_vs_Communism.ppt with error: File is not a zip file. Skipping...
seems like the pptx package is failing to open/read the file
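you can reproduce it outside llama_index, since the pptx reader goes through python-pptx under the hood:
from pptx import Presentation

# legacy binary .ppt isn't a zip archive (only the newer .pptx format is),
# so this raises the same "File is not a zip file" error
Presentation("Capitalism_vs_Communism.ppt")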
but llama_index isn't popping up an error, no?
My instance just seems to silently stall and not throw any sort of error when that loader is used
hmm weird. Yea like I installed fresh in colab there and it errored out pretty quick trying to open the file
Are you using simple directory reader? Maybe it didn't actually load any documents?
That one failed a little more silently
(well, it printed an issue, but kept trucking)
yeah I am using simple directory reader, and I think I confirmed that there is a file being fed to it // the file path is correct
My implementation is basically
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
from llama_index.readers.file.epub_reader import EpubReader
from llama_index.readers.file.markdown_reader import MarkdownReader

def index_file(
self, file_path, service_context, suffix=None
) -> GPTVectorStoreIndex:
if suffix and suffix == ".md":
loader = MarkdownReader()
document = loader.load_data(file_path)
elif suffix and suffix == ".epub":
epub_loader = EpubReader()
document = epub_loader.load_data(file_path)
else:
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
index = GPTVectorStoreIndex.from_documents(
document, service_context=service_context, use_async=True
)
return index
but once it hits from_documents it just goes silent
but does the loader return documents?
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
print(len(document))
Maybe also try without async?
it might also be borking on embeddings for whatever reason
another thing to try
service_context = ServiceContext.from_defaults(..., embed_model="local:BAAI/bge-small-en-v1.5")
Just to see if it's an openai issue or not
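Putting those together, something like this (reusing the names from your code above):
# 1. confirm the loader actually parsed the PPTX
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
print(len(document))  # 0 means nothing was parsed, so there's nothing to index

# 2. rule out OpenAI embeddings with a local model, and drop use_async
debug_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
)
index = GPTVectorStoreIndex.from_documents(document, service_context=debug_context)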
sorry, kind of throwing everything I would do to debug at you lol
I appreciate it! Will try these out in a bit and let you know what I see
wait, I'm a bit confused actually
because on my install it indexes correctly it seems?
when that error you got popped up, would it have allowed the from_documents call to proceed without raising the exception up?
It would have, but there would have been nothing to index (since documents would be an empty list)
So the index would create, but there would be nothing to query
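So a cheap guard on your side would be something like:
document = SimpleDirectoryReader(input_files=[file_path]).load_data()
if not document:
    # surfaces the silent-failure case instead of building an empty index
    raise ValueError(f"No documents were parsed from {file_path}")
index = GPTVectorStoreIndex.from_documents(document, service_context=service_context)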