Yea, streaming in langchain is pretty annoying. You have to use a callback handler, and you'll likely have to write your own.
Basically the on_llm_new_token()
function will run every time a new token is received (assuming you set streaming=True)
The annoying thing about this is that it streams everything, so you might have to detect the part of the stream you want to show
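A minimal sketch of what I mean (rough, untested, assuming the classic langchain BaseCallbackHandler API, adjust imports for your version):

from langchain.callbacks.base import BaseCallbackHandler
from langchain.chat_models import ChatOpenAI

class MyStreamingHandler(BaseCallbackHandler):
    # Called once per generated token when streaming=True
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(token, end="", flush=True)

llm = ChatOpenAI(streaming=True, callbacks=[MyStreamingHandler()], temperature=0)

You'd swap the print for whatever you actually want to do with each token.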
OK, I saw it. It seems I have to create my own callback handler and filter out the verbose log (Thought, Action, Action Input, ...) to get to the actual answer.
I'm also surprised that Langchain doesn't give a complete example of this.
btw, I found that if I set streaming=True and don't modify my program at all, it runs just as it did with streaming=False. So I could always set streaming=True whether I'm streaming the output or not.
Yea, langchains support for this is... complicated haha
I found this example from langchain's discord, but I don't think this implementation is a real-time stream. chat() is a sync method, so by the time the call returns, all of the output tokens have already been produced.
No, I think that works, it will live-stream/print everything to the terminal
Since it uses the stdout callback handler
All right, probably fine for this example. What I meant was: what if agent.run() is used instead?
Yes, I know this handler can't be used directly.
Given that we want to send the response out in real time over HTTP, not to stdout
I see what you mean, with stdout it's always real-time output.
I was just assuming this handler is used to write tokens into a buffer and eventually return it
yea pretty much. So you could implement your own custom handlers and write to your own queue/buffer
yea, I already did that. But if I implement it like that, by the time I return the buffer the complete answer has already been produced, because agent.run() has finished executing. So I don't think it's ideal.
unless I open another connection to the front-end and send the tokens in real time.
In our llama-index code, we do something like this under the hood which lets you have a raw generator to iterate over
Could do something similar for your handler
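Rough sketch of the idea (not our actual code, names like QueueCallbackHandler and stream_agent_answer are made up here): the handler pushes each token onto a queue, the agent runs in a background thread, and a generator yields from the queue.

import threading
from queue import Queue

from langchain.callbacks.base import BaseCallbackHandler

class QueueCallbackHandler(BaseCallbackHandler):
    # Pushes each new token onto a queue instead of printing to stdout
    def __init__(self, token_queue: Queue):
        self.token_queue = token_queue

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.token_queue.put(token)

def stream_agent_answer(agent, question: str):
    # Generator: run the agent in a background thread and yield tokens as they arrive
    token_queue: Queue = Queue()
    handler = QueueCallbackHandler(token_queue)

    def _run():
        try:
            agent.run(question, callbacks=[handler])
        finally:
            token_queue.put(None)  # sentinel: the agent is done

    thread = threading.Thread(target=_run)
    thread.start()

    while True:
        token = token_queue.get()
        if token is None:
            break
        yield token

    thread.join()

Then your HTTP framework's streaming response can iterate over stream_agent_answer(agent, question) and send tokens as they show up. (This assumes agent.run accepts a callbacks kwarg; otherwise attach the handler to the LLM itself.)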
OK, I'll look into this.
Hi Logan, I have a question about your update to Document's metadata.
For already-existing documents that are persisted locally, do we need to migrate extra_info to metadata manually?
I used to manually specify the values of some keys in extra_info for each document
It should (hopefully) automatically upgrade your stored data to the new format
Well, I verified it through debugging.
That means I don't have to change anything, just use metadata instead of extra_info for new documents.
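Something like this for new documents then (just a sketch, the field values are my own example):

from llama_index import Document

# New style: pass metadata directly instead of extra_info
doc = Document(
    text="...",
    metadata={"user": "test", "source": "news"},
)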
Another issue:
index = load_index_from_storage(storage_context, service_context=service_context_3_5)
res = index.as_chat_engine(service_context=service_context_3_5).chat(question)
sometimes the above code returns this response:
{
"detail": "llama_index.chat_engine.condense_question.CondenseQuestionChatEngine.from_defaults() got multiple values for keyword argument 'service_context'"
}
If I remove the service_context parameter from index.as_chat_engine(), it returns normally.
Yea, it seems it's already passing in the service context under the hood
Does that mean I should never pass service_context again when calling index.as_chat_engine(), because the index object already contains a service_context?
Yea, pretty much. Although tbh we should handle this better in the code, so the user can override it if they want.
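So your snippet from above would just become (same variable names as yours):

index = load_index_from_storage(storage_context, service_context=service_context_3_5)

# The index already carries service_context_3_5, so don't pass it again here
res = index.as_chat_engine().chat(question)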
Hi Logan, I created a handler for langchain agent streaming, just like yours. But only the following tokens are received in on_llm_new_token():
Thought: Do I need to use a tool? Yes
Action: document knowledge library of user "test"
Action Input: Supreme Court Rejects GOP Call for Unchecked Power in US Elections, Affirming State Courts Should Not Have 'Free Rein' to Change Election Rules
The actual answer, the "Observation", is never received in this event.
And actually all I want is the "Observation".
But for langchain that's normal, right?