Agent Workflow with Prompt Caching

Hey guys! Is it possible to run the AgentWorkflow with prompt caching?
Will this prompt cache all of the llm calls of the agents?
I haven't dug too deep into the code of the agent workflow, but my hunch is that it's hitting the LLM with the full state every time it goes to interact.
These are my input/output token logs; it generally follows that the input tokens for the next call are the output tokens plus the input tokens of this call.
[Attachment: image.png]
Only the messages with that tag will be cached.

Maybe the class needs an option to just blanket apply the cache?

Although FYI, I'm pretty sure I read that only messages past a certain length get cached?
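For concreteness, here's roughly what that tag looks like on a message. A minimal sketch: the model name and contents are placeholders, and it assumes the Anthropic integration forwards a cache_control entry from additional_kwargs, which is the same mechanism the snippet at the end of this thread relies on.

Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-latest")

messages = [
    ChatMessage(role="system", content="You are a helpful agent."),
    ChatMessage(
        role="user",
        content="<some very large block of context>",
        # Anthropic's ephemeral cache marker: this message and everything
        # before it become the cached prefix (if it clears the minimum size).
        additional_kwargs={"cache_control": {"type": "ephemeral"}},
    ),
    ChatMessage(role="user", content="An actual question about the context"),
]

response = llm.chat(messages)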
Yeah, for Sonnet it's past 1024 tokens and for Haiku it's 2048
but if we could batch the initial prompts and state object together, then at least those will hit the cache, and I think that's a good start
and you can add new messages after that which wouldn't be included
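In other words, something along these lines: tag the last of the fixed messages so the whole prefix gets cached, and anything appended afterwards stays outside the breakpoint. Sketch only; the contents are placeholders.

Python
from llama_index.core.llms import ChatMessage

# Fixed prefix: system prompt plus serialized state object, tagged once so the
# whole prefix is written to / read from the cache on every LLM call.
prefix = [
    ChatMessage(role="system", content="<agent system prompt>"),
    ChatMessage(
        role="user",
        content="<serialized state object>",
        additional_kwargs={"cache_control": {"type": "ephemeral"}},
    ),
]

# New turns go after the breakpoint and are not part of the cached prefix.
chat_history = prefix + [ChatMessage(role="user", content="latest user message")]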
Honestly I think the blanket option is the way to go?

llm = Anthropic(..., cache_all=True) -- and then every chat message gets that cache control attached to it

Any downside to me adding that to the class? Does that make sense to do?
The blanket option is good; the one and only issue I could see is that cache writes are 1.25x the price, so if you're constantly writing to the cache it might be a bit more expensive
but I haven't read up too much about how they decide cache hits
hmmm -- alternative UX: llm = Anthropic(..., cache_idx=2) -- where basically it will always cache the first X messages (and -1 would cache all)
then you can control costs slightly πŸ˜…
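To be clear, neither cache_all nor cache_idx exists on the Anthropic class today; this is just a sketch of the proposed semantics (hypothetical helper and parameter name). Since Anthropic caches the tagged message and everything before it, "cache the first X messages" reduces to tagging message X-1.

Python
from llama_index.core.llms import ChatMessage


def apply_cache_idx(messages: list[ChatMessage], cache_idx: int) -> list[ChatMessage]:
    """Hypothetical sketch of the proposed cache_idx semantics: tag the last of
    the first `cache_idx` messages (-1 means tag the final message, i.e. cache
    everything), so that message plus everything before it gets cached."""
    if cache_idx == 0 or not messages:
        return messages
    idx = len(messages) - 1 if cache_idx == -1 else min(cache_idx, len(messages)) - 1
    messages[idx].additional_kwargs["cache_control"] = {"type": "ephemeral"}
    return messages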
Is there a flag you could put on it that would essentially say "auto-cache whenever there is a function that can support it"? I.e., if there is a built-in situation (like with workflow agents), you could just set auto_cache=True and leave it up to the class you're using to decide when to apply it, depending on its goal.

Not sure if that makes sense, but I think caching is a very specific condition and use case that only a limited number of situations apply to, and they'd be unique to each function. If we could just say "yeah, if you think we can cache, let's do it", that might be the safest.
I think the hard part with that is deciding what to cache? As soon as you set cache_control on a message, that means that message and everything before it will be cached by Anthropic
The only solid case I've seen so far when you want to cache is when
a) the input is large
b) it's something that will get repeated a lot

This makes sense for something like their contextual retrieval demo. But I don't entirely see a similar use-case that jumps out to me with AgentWorkflow besides just caching everything or the first X messages πŸ€”
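For reference, this is the shape of the case where it clearly pays off: a big fixed prefix queried over and over. Sketch with placeholder content, nothing AgentWorkflow-specific; cache reads come back at a reduced rate, which is what makes the repetition worth it.

Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-latest")

# Large fixed context, tagged once; it has to clear the model's minimum
# cacheable size (1024 tokens for Sonnet) to actually be cached.
context_msg = ChatMessage(
    role="user",
    content="<entire document dump>",
    additional_kwargs={"cache_control": {"type": "ephemeral"}},
)

for question in ["What does section 2 say?", "Summarize the appendix."]:
    # The first call writes the cache; later calls reuse the cached prefix,
    # so only the new question counts as fresh input.
    print(llm.chat([context_msg, ChatMessage(role="user", content=question)]))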
Or are you saying the agent itself would decide when to cache? You could technically code that today

Python
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.workflow import Context


async def set_cache_point(ctx: Context) -> None:
    """Use this when a very large message is introduced into the chat history that should be cached."""
    memory = await ctx.get("memory")
    messages = await memory.get_all()
    if messages:
        # Anthropic caches the tagged message and everything before it,
        # so tagging the most recent message caches the whole prefix.
        messages[-1].additional_kwargs["cache_control"] = {"type": "ephemeral"}

    memory.set(messages)
    await ctx.set("memory", memory)

...


agent = AgentWorkflow.from_tools_or_functions([..., set_cache_point], ...)
kind of a neat demo -- I wouldn't expect anyone to know how to do that without reading the source code, since the memory being in the context is kind of hidden and not documented right now