The post asks whether anyone has token counting working for streaming responses, since prompt and completion tokens do not appear to be tracked. The comments indicate that this feature is not yet implemented, but a community member suggests it could be done by wrapping the response generator so that the token counting callback is called once the generator is exhausted. Another community member says they will try to implement it, but it is not a high priority at the moment. The discussion also touches on a separate issue related to source nodes in the AgentChatResponse, which a community member is working to address.
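The generator-wrapping idea can be sketched in plain Python. This is only an illustration of the pattern, not LlamaIndex code: `count_after_stream`, `fake_stream`, and `record_completion` are hypothetical names, and the whitespace split stands in for a real tokenizer.

```python
from typing import Callable, Iterable, Iterator, List


def count_after_stream(
    chunks: Iterable[str],
    on_complete: Callable[[str], None],
) -> Iterator[str]:
    """Yield each chunk unchanged, then report the full text once the stream ends."""
    seen: List[str] = []
    for chunk in chunks:
        seen.append(chunk)
        yield chunk
    # Reached only when the underlying generator is exhausted.
    on_complete("".join(seen))


def fake_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streaming LLM response.
    yield from ["Token ", "counting ", "for ", "streams"]


totals = {"completion_tokens": 0}


def record_completion(text: str) -> None:
    # Assumption: a crude whitespace count stands in for a real tokenizer.
    totals["completion_tokens"] += len(text.split())


streamed = list(count_after_stream(fake_stream(), record_completion))
print(totals["completion_tokens"])
```

One caveat with this approach: if the consumer stops iterating early, the callback never fires, so a partially read stream is not counted.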
hi, does anyone have token counting working for streaming? it seems prompt and completion tokens are not tracked. in the verbose counter logs, the prompt and completion events don't appear to be raised
@Logan M if you have a chance, take a quick peek at the AgentChatResponse and source node question in kapa bot. if i'm correct (that source nodes were dropped in a previous release), then i'll also work on adding that back
sorry, what's the issue? There are still sources in the agent chat response, but the sources are formatted as ToolOutput objects (since there could be any number of tools)
If the tool was a query engine, you can access the raw_output which is the query engine response output
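A rough sketch of what reading those sources might look like. To keep it runnable without the library, `FakeToolOutput` and `FakeAgentChatResponse` are hypothetical stand-ins that only mirror the `sources` and `raw_output` attributes mentioned above; `tool_name` and the helper `raw_outputs` are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class FakeToolOutput:
    """Stand-in for a ToolOutput; not the real library class."""
    tool_name: str   # assumed field, used here only for display
    raw_output: Any  # for a query engine tool, the query engine response


@dataclass
class FakeAgentChatResponse:
    """Stand-in for an agent chat response carrying tool outputs as sources."""
    response: str
    sources: List[FakeToolOutput] = field(default_factory=list)


def raw_outputs(chat_response: Any) -> List[Any]:
    # Collect the underlying output of every tool call behind the answer.
    return [source.raw_output for source in chat_response.sources]


chat_response = FakeAgentChatResponse(
    response="final answer",
    sources=[
        FakeToolOutput(tool_name="query_engine_tool", raw_output="query engine response"),
        FakeToolOutput(tool_name="calculator", raw_output=42),
    ],
)
collected = raw_outputs(chat_response)
print(collected)
```

Since any number of tools may have run, it is probably worth checking each source's tool before treating its `raw_output` as a query engine response.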