So I have built an LLM-based application using LlamaIndex that takes a YouTube video URL from the end user and summarises the video for them. I am using the YoutubeTranscriptLoader to generate the transcript, creating nodes with the SimpleNodeParser, and building a VectorStoreIndex on top of those nodes.
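For reference, the pipeline up to that point looks roughly like this (a simplified sketch: the video URL is a placeholder, and I'm loading the transcript via download_loader since the reader's exact import path varies across llama-index versions):

from llama_index import VectorStoreIndex, download_loader
from llama_index.node_parser import SimpleNodeParser

# Fetch the transcript for the user-supplied video URL
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
documents = YoutubeTranscriptReader().load_data(
    ytlinks=["https://www.youtube.com/watch?v=..."]
)

# Chunk the transcript into nodes and build the vector index over them
parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)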
This is the code, after creating the index, for the summarisation operation:
# Pre-0.10 import paths; on newer llama-index versions these live under llama_index.core
from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import VectorIndexRetriever

def summarize_transcript(index):
    # Retrieve every node in the docstore so the summary covers the whole transcript
    retriever = VectorIndexRetriever(
        index=index,
        similarity_top_k=len(index.docstore.docs),
    )
    # tree_summarize recursively folds per-chunk summaries into a single answer
    response_synthesizer = get_response_synthesizer(
        response_mode="tree_summarize",
    )
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )
    query_text = """\
You are an upbeat and friendly tutor with an encouraging tone.
Provide Key Insights from the context information ONLY.
For each key insight, provide a relevant summary in the form of bullet points.
Use no more than 500 words in your summary.
"""
    response = query_engine.query(query_text)
    return response
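And this is roughly how I call it from the Streamlit app (the timing code here is just for illustration, not the exact production code):

import time
import streamlit as st

start = time.perf_counter()
response = summarize_transcript(index)
st.write(str(response))
st.caption(f"Summarised in {time.perf_counter() - start:.1f}s")  # ~9s for a 1:41 video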
Running this function on a video that is only 1:41 long takes about 9 seconds, which is unacceptable in production. I tried the use_async option, swapping the vector index for a list index, and using just the TreeSummariser with no index at all, but performance didn't improve much.
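For clarity, these are sketches of two of the variants I tried (same nodes/index as above; exact flags per the llama-index docs):

# Attempt 1: parallelise the tree_summarize LLM calls
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
    use_async=True,
)

# Attempt 2: skip retrieval entirely with a list index
from llama_index import ListIndex  # renamed SummaryIndex in newer versions
list_index = ListIndex(nodes)
query_engine = list_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)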
Can you please help me out here? For reference, the app is hosted on Streamlit Cloud and can be accessed here:
https://llm-steamlit-poc.streamlit.app/