Agent Workflow with Prompt Caching

Hey guys! Is it possible to run the AgentWorkflow with prompt caching?
Will this prompt cache all of the llm calls of the agents?
I haven't dug too deep into the code of the agent workflow, but my hunch is that it's hitting the LLM with the full state every time it goes to interact.
These are my input/output token logs; it generally follows that the input tokens for the next call are the output tokens plus the input tokens of this call.
[Attachment: image.png]
Only the messages with that tag will be cached.

Maybe the class needs an option to just blanket apply the cache?

Although FYI, I'm pretty sure I read that only messages past a certain length get cached?
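For concreteness, here's roughly what that tag looks like on a message. A minimal sketch: the model name and contents are placeholders, and it assumes the Anthropic integration forwards a cache_control entry from additional_kwargs, which is the same mechanism the snippet at the end of this thread relies on.

Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-latest")

messages = [
    ChatMessage(role="system", content="You are a helpful agent."),
    ChatMessage(
        role="user",
        content="<some very large block of context>",
        # Anthropic's ephemeral cache marker: this message and everything
        # before it become the cached prefix (if it clears the minimum size).
        additional_kwargs={"cache_control": {"type": "ephemeral"}},
    ),
    ChatMessage(role="user", content="An actual question about the context"),
]

response = llm.chat(messages)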
Yeah, for Sonnet it's past 1024 tokens and for Haiku it's 2048
but if we could batch the initial prompts and state object together, then at least those will hit the cache, and I think that's a good start
and you can add new messages after that which wouldn't be included
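In other words, something along these lines: tag the last of the fixed messages so the whole prefix gets cached, and anything appended afterwards stays outside the breakpoint. Sketch only; the contents are placeholders.

Python
from llama_index.core.llms import ChatMessage

# Fixed prefix: system prompt plus serialized state object, tagged once so the
# whole prefix is written to / read from the cache on every LLM call.
prefix = [
    ChatMessage(role="system", content="<agent system prompt>"),
    ChatMessage(
        role="user",
        content="<serialized state object>",
        additional_kwargs={"cache_control": {"type": "ephemeral"}},
    ),
]

# New turns go after the breakpoint and are not part of the cached prefix.
chat_history = prefix + [ChatMessage(role="user", content="latest user message")]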
Honestly I think the blanket option is the way to go?

llm = Anthropic(..., cache_all=True) -- and then every chat message gets that cache control attached to it

Any downside to me adding that to the class? Does that make sense to do?
The blanket option is good; the one and only issue I could see is that cache writes are 1.25x the price, so if you're constantly writing to the cache it might be a bit more expensive
but I haven't read up too much about how they decide cache hits
hmmm -- alternative UX: llm = Anthropic(..., cache_idx=2) -- where basically it will always cache the first X messages (and -1 would cache all)
then you can control costs slightly πŸ˜…
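To be clear, neither cache_all nor cache_idx exists on the Anthropic class today; this is just a sketch of the proposed semantics (hypothetical helper and parameter name). Since Anthropic caches the tagged message and everything before it, "cache the first X messages" reduces to tagging message X-1.

Python
from llama_index.core.llms import ChatMessage


def apply_cache_idx(messages: list[ChatMessage], cache_idx: int) -> list[ChatMessage]:
    """Hypothetical sketch of the proposed cache_idx semantics: tag the last of
    the first `cache_idx` messages (-1 means tag the final message, i.e. cache
    everything), so that message plus everything before it gets cached."""
    if cache_idx == 0 or not messages:
        return messages
    idx = len(messages) - 1 if cache_idx == -1 else min(cache_idx, len(messages)) - 1
    messages[idx].additional_kwargs["cache_control"] = {"type": "ephemeral"}
    return messages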
Is there a flag you could put on it that would essentially say "auto-cache whenever there is a function that can support it"? I.e., if there is a built-in situation (like with workflow agents), you could just set auto_cache=True and leave it up to the class you're using to decide when to apply it, depending on its goal.

Not sure if that makes sense, but I think caching is a very specific condition and use case that only a limited number of situations apply to, and they'd be unique to each function. If we could just say "yeah, if you think we can cache, let's do it", that might be the safest.
I think the hard part with that is deciding what to cache? As soon as you set cache_control on a message, that means that message and everything before it will be cached by Anthropic
The only solid case I've seen so far when you want to cache is when
a) the input is large
b) it's something that will get repeated a lot

This makes sense for something like their contextual retrieval demo. But I don't entirely see a similar use-case that jumps out to me with AgentWorkflow besides just caching everything or the first X messages πŸ€”
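For reference, this is the shape of the case where it clearly pays off: a big fixed prefix queried over and over. Sketch with placeholder content, nothing AgentWorkflow-specific; cache reads come back at a reduced rate, which is what makes the repetition worth it.

Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-latest")

# Large fixed context, tagged once; it has to clear the model's minimum
# cacheable size (1024 tokens for Sonnet) to actually be cached.
context_msg = ChatMessage(
    role="user",
    content="<entire document dump>",
    additional_kwargs={"cache_control": {"type": "ephemeral"}},
)

for question in ["What does section 2 say?", "Summarize the appendix."]:
    # The first call writes the cache; later calls reuse the cached prefix,
    # so only the new question counts as fresh input.
    print(llm.chat([context_msg, ChatMessage(role="user", content=question)]))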
Or are you saying the agent itself would decide when to cache? You could technically code that today

Python
from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.workflow import Context


async def set_cache_point(ctx: Context) -> None:
    """Use this when a very large message is introduced into the chat history that should be cached."""
    memory = await ctx.get("memory")
    messages = await memory.get_all()
    if messages:
        # Anthropic caches the tagged message and everything before it,
        # so tagging the most recent message caches the whole prefix.
        messages[-1].additional_kwargs["cache_control"] = {"type": "ephemeral"}

    memory.set(messages)
    await ctx.set("memory", memory)

...


agent = AgentWorkflow.from_tools_or_functions([..., set_cache_point], ...)
kind of a neat demo -- I wouldn't expect anyone to know how to do that without reading the source code, since the memory being in the context is kind of hidden and not documented right now