Performance

So I have built an LLM-based application using LlamaIndex that takes a YouTube video URL from the end user and summarises it for them. I am using the YoutubeTranscriptLoader to generate the transcript, creating nodes with the SimpleNodeParser, and building a VectorStoreIndex on top of those nodes.
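For reference, the ingestion code looks roughly like this (a sketch from the description above; the download_loader route to the llama_hub YoutubeTranscriptReader is assumed, since I haven't shown that part):

    from llama_index import VectorStoreIndex, download_loader
    from llama_index.node_parser import SimpleNodeParser

    # Assumption: the transcript loader is the llama_hub YoutubeTranscriptReader
    YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
    documents = YoutubeTranscriptReader().load_data(
        ytlinks=["https://www.youtube.com/watch?v=..."])  # user-supplied URL

    parser = SimpleNodeParser.from_defaults()
    nodes = parser.get_nodes_from_documents(documents)
    index = VectorStoreIndex(nodes)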

This is the code for the summarisation operation, run after creating the index:

from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import VectorIndexRetriever

def summarize_transcript():
    # Retrieve every node so the summary covers the whole transcript
    retriever = VectorIndexRetriever(
        index=index, similarity_top_k=len(index.docstore.docs))
    response_synthesizer = get_response_synthesizer(
        response_mode='tree_summarize')

    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )

    query_text = """
    You are an upbeat and friendly tutor with an encouraging tone.
    Provide Key Insights from the context information ONLY.
    For each key insight, provide a relevant summary in the form of bullet points.
    Use no more than 500 words in your summary.
    """

    response = query_engine.query(query_text)
    return response

Running this function for a video a mere 1:41 in length takes 9 seconds, which is unacceptable in production. I tried the use_async option, using a list index instead of a vector index, and using just the tree-summarize synthesizer with no index at all, but performance didn't improve by much.

Can you please help me out here? For reference, the app is hosted on Streamlit Cloud and can be accessed here: https://llm-steamlit-poc.streamlit.app/
9 comments
Speed is entirely dependent on how many LLM calls are being made.

For summarizing, a list index with tree_summarize and use_async will probably be the best option. With a vector index, it's only summarizing the top-k most relevant chunks. Maybe that's OK for you though.

9 seconds tells me that it's likely making 2-3 LLM calls. Maybe try using gpt-3.5-turbo-16k
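Roughly, something like this (a minimal sketch; nodes and query_text are the ones from your code above, and the exact kwargs may differ by version):

    from llama_index import ListIndex, ServiceContext
    from llama_index.llms import OpenAI

    # The 16k context window fits more transcript per call, so fewer calls overall
    service_context = ServiceContext.from_defaults(
        llm=OpenAI(model="gpt-3.5-turbo-16k"))
    index = ListIndex(nodes, service_context=service_context)

    # tree_summarize with use_async runs the intermediate summary calls concurrently
    query_engine = index.as_query_engine(
        response_mode="tree_summarize", use_async=True)
    response = query_engine.query(query_text)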
@Logan M Yeah, but why is it making 2-3 LLM calls? The transcript of the 1:14 minute YouTube video fits in one node by itself!
If it's fitting into one node (i.e. the transcript is less than the chunk size), then yeah, it's only making one LLM call, and OpenAI is just being slow.
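Easy to confirm, since the node count is already in your code (index.docstore.docs); this shows how many chunks, and therefore roughly how many summary calls, you're looking at:

    # One node in the docstore means tree_summarize collapses to a single LLM call
    print(len(index.docstore.docs))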
Not really much you can do about it tbh
Maybe off topic, but will upgrading my OpenAI account help with response times? Is there anything else I can do at other levels that would help?
Hmm, yeah, not sure if upgrading gets you faster responses or not πŸ€”
Can I use any other LLM provider? Maybe OpenAI isn't cutting it?
Hmm, you could use Anthropic or PaLM, if you can get an API key πŸ˜… Those are the most comparable in terms of capabilities.
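Swapping providers is mostly a ServiceContext change. A rough sketch (the claude-2 model name is an assumption; pick whatever your key grants):

    from llama_index import ServiceContext
    from llama_index.llms import Anthropic  # PaLM is also available as llama_index.llms.PaLM

    # Assumes ANTHROPIC_API_KEY is set in the environment; "claude-2" is an assumed model name
    service_context = ServiceContext.from_defaults(llm=Anthropic(model="claude-2"))
    index = ListIndex(nodes, service_context=service_context)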
Hi rini, I'm on the PaLM API team and can help with getting a key if you're interested.