It's frustrating because the response payload is literally

Plain Text
  "usage": {
    "completion_tokens": 9,
    "prompt_tokens": 35,
    "total_tokens": 44
  }


Which has all the info I need, but the LlamaIndex abstraction makes it harder 😂
A curse of needing to support many LLMs -- openai is the only one providing these counts really
Already figured it out
Subclassed and overridden
Lol that would be my first instinct too, nice

If you wanted to ride within the rules of the library, there's probably a way to give each chat its own callback handler per user (something like the sketch below), so you'd have the token counts for each user request
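(For reference, a rough sketch of that per-user callback idea, not from the thread; it assumes the legacy llama_index ServiceContext / TokenCountingHandler APIs, and `index` is a placeholder.)
Plain Text
import tiktoken
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# One TokenCountingHandler per user/session so counts don't mix between callers
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([token_counter])
)

# Assuming `index` (and its engines) were built with this service_context,
# each chat/query call accumulates counts on the handler:
#   token_counter.prompt_llm_token_count
#   token_counter.completion_llm_token_count
#   token_counter.total_llm_token_count
# Call token_counter.reset_counts() before the next user's request.
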
@Logan M Guess I'm lucky!!
Plain Text
# Import paths below assume the legacy (pre-0.10) llama_index package layout.
from dataclasses import dataclass, field
from typing import List, Optional

from llama_index.chat_engine import ContextChatEngine
from llama_index.chat_engine.types import AgentChatResponse
from llama_index.indices.base import BaseIndex
from llama_index.llms import ChatMessage
from llama_index.tools import ToolOutput


@dataclass
class CriaChatResponse(AgentChatResponse):
    """AgentChatResponse plus the raw LLM payload (e.g. OpenAI usage counts)."""

    raw: Optional[dict] = field(default_factory=dict)


class CriaChatEngine(ContextChatEngine):

    @classmethod
    def from_index(cls, index: BaseIndex, **kwargs):
        # kwargs are shared between the retriever and the engine defaults
        return cls.from_defaults(
            retriever=index.as_retriever(**kwargs),
            **kwargs,
        )

    async def achat(
        self, message: str, chat_history: Optional[List[ChatMessage]] = None
    ) -> CriaChatResponse:
        """
        Should maintain parity with the superclass method.
        """
        if chat_history is not None:
            self._memory.set(chat_history)
        self._memory.put(ChatMessage(content=message, role="user"))

        # Retrieve context nodes and assemble the full prompt
        context_str_template, nodes = await self._agenerate_context(message)
        prefix_messages = self._get_prefix_messages_with_context(context_str_template)
        all_messages = prefix_messages + self._memory.get()

        chat_response = await self._llm.achat(all_messages)
        ai_message = chat_response.message
        self._memory.put(ai_message)

        return CriaChatResponse(  # Custom response with a bit more info
            response=str(chat_response.message.content),
            sources=[
                ToolOutput(
                    tool_name="retriever",
                    content=str(prefix_messages[0]),
                    raw_input={"message": message},
                    raw_output=prefix_messages[0],
                )
            ],
            source_nodes=nodes,
            raw=chat_response.raw,  # Add raw payload info (includes token usage)
        )
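
(A hypothetical usage sketch of the subclass above, not from the thread; `index` and the question are placeholders, and it assumes the OpenAI raw payload is a dict with a "usage" key like the snippet at the top.)
Plain Text
# Hypothetical usage: the raw payload, token usage included, now surfaces
# on the chat response instead of being hidden by the abstraction.
async def answer_with_usage(index, question: str) -> None:
    engine = CriaChatEngine.from_index(index)
    response = await engine.achat(question)
    usage = (response.raw or {}).get("usage", {})  # OpenAI-style usage block
    print(response.response)
    print(usage.get("prompt_tokens"), usage.get("completion_tokens"), usage.get("total_tokens"))
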
That's all I had to do lol
might benefit honestly from making that change in the lib tho
AgentChatResponse could return the raw payload dict
Most LLMs have a raw response
In fact all have a raw response I would assume 😂
Yea that's fair. As you can see it's there, just not passed to the top level 😅
Ooh and quick question, is there any overhead for creating a query engine? Anything loaded or whatever that's heavy on CPU?
Creating a query engine should be essentially free 🤔
Perfect. I thought so, but ya never know
Because I don't want to keep em in memory
Some tasks don't require chats, just a one-off query
I'd rather not keep a query engine in memory 24/7, and just create it whenever a query is made
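(A minimal sketch, not from the thread, of that create-on-demand pattern; `index` and the question are placeholders.)
Plain Text
# Build the query engine only when a one-off query comes in, rather than keeping
# one resident in memory; construction is cheap, the LLM call is the real cost.
def run_one_off_query(index, question: str) -> str:
    query_engine = index.as_query_engine()
    return str(query_engine.query(question))
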
But then of course the main method we use is a subclassed ContextChatEngine since we're building chatbots
Btw, we are starting a project soon to wrangle our async stuff into order.

Since you seem to be an extreme power user of the library, it might be good to find a time to chat about your experience/pain points so far.
Sure I'd be open for that
This lib has been a huge help in taking what would have been an impossibly large project and turning it into a manageable one.
Cool! Do you have time sometime this week? I'm pretty free after today

(I'm in CST time btw)
Wed and Fri this week I'm free pretty much all day; after that it gets a bit harder until September 6th. I can still probably squeeze something in, it would just depend more on your availability, so I can see if anything matches
Sweet, how about tomorrow (Wednesday) at like 3pm CST?
What's your email? I can send a link
koganisa@yorku.ca
Yea see ya then! 💪