Performance

So I have built an LLM-based application using LlamaIndex that takes a YouTube video URL from the end user and summarises it for them. I am using the YoutubeTranscriptLoader to generate the transcript, creating nodes with the SimpleNodeParser, and building a VectorStoreIndex on top of those nodes.
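For reference, the ingestion code looks roughly like this (a sketch from the description above; the download_loader route to the llama_hub YoutubeTranscriptReader is assumed, since I haven't shown that part):

    from llama_index import VectorStoreIndex, download_loader
    from llama_index.node_parser import SimpleNodeParser

    # Assumption: the transcript loader is the llama_hub YoutubeTranscriptReader
    YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
    documents = YoutubeTranscriptReader().load_data(
        ytlinks=["https://www.youtube.com/watch?v=..."])  # user-supplied URL

    parser = SimpleNodeParser.from_defaults()
    nodes = parser.get_nodes_from_documents(documents)
    index = VectorStoreIndex(nodes)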

This is the code for the summarisation operation, run after creating the index:

from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import VectorIndexRetriever

def summarize_transcript():
    # Retrieve every node so the summary covers the whole transcript
    retriever = VectorIndexRetriever(
        index=index, similarity_top_k=len(index.docstore.docs))
    response_synthesizer = get_response_synthesizer(
        response_mode='tree_summarize')

    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )

    query_text = """
    You are an upbeat and friendly tutor with an encouraging tone.
    Provide Key Insights from the context information ONLY.
    For each key insight, provide a relevant summary in the form of bullet points.
    Use no more than 500 words in your summary.
    """

    response = query_engine.query(query_text)
    return response

Running this function for a video a mere 1:41 in length takes 9 seconds, which is unacceptable in production. I tried the use_async option, using a list index instead of a vector index, and using just the tree-summarize synthesizer with no index at all, but performance didn't improve by much.

Can you please help me out here? For reference, the app is hosted on Streamlit Cloud and can be accessed here: https://llm-steamlit-poc.streamlit.app/
9 comments
Speed is entirely dependent on how many LLM calls are being made.

For summarizing, a list index with tree_summarize and use_async will probably be the best option. With a vector index, it's only summarizing the top-k most relevant chunks. Maybe that's OK for you though.

9 seconds tells me that it's likely making 2-3 LLM calls. Maybe try using gpt-3.5-turbo-16k
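Roughly, something like this (a minimal sketch; nodes and query_text are the ones from your code above, and the exact kwargs may differ by version):

    from llama_index import ListIndex, ServiceContext
    from llama_index.llms import OpenAI

    # The 16k context window fits more transcript per call, so fewer calls overall
    service_context = ServiceContext.from_defaults(
        llm=OpenAI(model="gpt-3.5-turbo-16k"))
    index = ListIndex(nodes, service_context=service_context)

    # tree_summarize with use_async runs the intermediate summary calls concurrently
    query_engine = index.as_query_engine(
        response_mode="tree_summarize", use_async=True)
    response = query_engine.query(query_text)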
@Logan M Yeah, but why is it making 2-3 LLM calls? The transcript of the 1:14 minute YouTube video fits in one node by itself!
If it's fitting into one node (i.e. the transcript is less than the chunk size), then yeah, it's only making one LLM call, and OpenAI is just being slow.
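Easy to confirm, since the node count is already in your code (index.docstore.docs); this shows how many chunks, and therefore roughly how many summary calls, you're looking at:

    # One node in the docstore means tree_summarize collapses to a single LLM call
    print(len(index.docstore.docs))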
Not really much you can do about it tbh
Maybe off topic, but will upgrading my OpenAI account help with response times? Is there anything else I can do at other levels that would help?
Hmm, yeah, not sure if upgrading gets you faster responses or not πŸ€”
Can I use any other LLM provider? Maybe OpenAI isn't cutting it?
Hmm, you could use Anthropic or PaLM, if you can get an API key πŸ˜… Those are the most comparable in terms of capabilities.
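Swapping providers is mostly a ServiceContext change. A rough sketch (the claude-2 model name is an assumption; pick whatever your key grants):

    from llama_index import ServiceContext
    from llama_index.llms import Anthropic  # PaLM is also available as llama_index.llms.PaLM

    # Assumes ANTHROPIC_API_KEY is set in the environment; "claude-2" is an assumed model name
    service_context = ServiceContext.from_defaults(llm=Anthropic(model="claude-2"))
    index = ListIndex(nodes, service_context=service_context)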
Hi rini, I'm on the PaLM API team and can help with getting a key if you're interested.