Find answers from the community

Updated 4 months ago

Managing large context in agent-based workflow for youtube lecture analysis

At a glance

The community member is building a tool to analyze long YouTube lectures using an agent-based workflow, but is facing issues with managing the large context. The community members suggest two approaches: summarizing the context when it gets too long, and dynamically retrieving what is needed from the total context. One community member specifically recommends trying the second approach and mentions the Gemini API, which can handle a 2M context window, as a potential solution, as the community member's current approach using GPT-4 mini with a 128K context window is not sufficient.

Useful resources
hi everyone, I'm building a tool to analyze long youtube lectures using an agent-based workflow, but i'm running into issues with managing the large context. What whould be the best approach or tools to handle this efficiently without losing important information?

What I'm currently doing is splitting the text into fragments and passing each fragment through gpt4o-mini, but the result is still too long to be processed in the agent workflow
L
S
A
4 comments
I mean, feels like the two obvious ideas are
  • summarizing the context when it gets too long
  • dynamically retrieving what you need from the total context (i.e. this is basically RAG over some infinitely sized context)
Okay, I really like the second idea, I'm going to try it out a bit, thank you very much.
Gemini. 2M context window. https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/ . IIRC gpt4o has 128K which is awesome but sounds like it's not enough
Also I think Gemini can do it over the video itself. Google says that you can reason over 1 hour of video so...
Add a reply
Sign up and join the conversation on Discord