LlamaIndex

Updated 4 months ago

Langchain streaming

Hi, how can I stream the response output when using an agent built with initialize_agent()? I can't do it by following your notebook, which was written for as_chat_engine(), because agent.run() only returns a string.

Yea, streaming in langchain is pretty annoying. You have to use a callback handler, and you'll likely have to create your own.

Basically, the handler's on_llm_new_token() method will run every time a new token is received (assuming you set streaming=True).

The annoying thing is that it streams everything, so you might have to detect the part of the stream you actually want to show.

https://python.langchain.com/docs/modules/callbacks/
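A minimal sketch of such a handler, assuming the callback interface described above. To keep it runnable without langchain installed, it defines a small stand-in base class; in real code you would subclass langchain's BaseCallbackHandler instead and pass the handler to an LLM created with streaming=True (for example through its callbacks argument). fake_streaming_llm is hypothetical and only simulates the token-by-token calling pattern.

```python
# Stand-in for langchain's BaseCallbackHandler so this sketch runs without
# langchain installed; in real code you would import the real base class:
#   from langchain.callbacks.base import BaseCallbackHandler
class BaseCallbackHandler:
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        pass


class TokenCollector(BaseCallbackHandler):
    """Records every token the LLM emits while streaming."""

    def __init__(self) -> None:
        self.tokens: list = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # In langchain this fires once per generated token when streaming=True.
        self.tokens.append(token)


def fake_streaming_llm(prompt: str, callbacks: list) -> str:
    """Hypothetical stand-in for a streaming LLM call: fires the callback for
    each token, then returns the full text like a normal call would."""
    tokens = ["Hello", ",", " world", "!"]
    for tok in tokens:
        for cb in callbacks:
            cb.on_llm_new_token(tok)
    return "".join(tokens)


handler = TokenCollector()
result = fake_streaming_llm("Say hello", callbacks=[handler])
print(handler.tokens)  # ['Hello', ',', ' world', '!']
print(result)          # Hello, world!
```

Note that the call still returns the complete string at the end; the handler just gets to see each piece first.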

OK, I see. It seems I must create my own callback handler and filter out the verbose log (Thought, Action, Action Input, ...) to reach the actual answer.
I'm also surprised that langchain doesn't give a complete example of this.
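One way to do that filtering, sketched as a heuristic: buffer the streamed text until a final-answer prefix appears, then pass through only what follows it. The prefix is an assumption that depends on the agent type; as far as I know, langchain's conversational ReAct agent (the one that prints "Thought: Do I need to use a tool?") ends with "AI:", while the zero-shot ReAct agent uses "Final Answer:". The token stream below is made up for illustration.

```python
class FinalAnswerFilter:
    """Drops the agent's intermediate reasoning and keeps only the tokens
    that come after the final-answer prefix (a heuristic)."""

    def __init__(self, prefix: str = "AI:") -> None:
        self.prefix = prefix
        self.seen = ""            # text received before the prefix appeared
        self.found = False
        self.answer_tokens: list = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        if self.found:
            # Already inside the final answer: pass tokens straight through.
            self.answer_tokens.append(token)
            return
        # The prefix can be split across several tokens, so search the
        # accumulated text rather than each token on its own.
        self.seen += token
        idx = self.seen.find(self.prefix)
        if idx != -1:
            self.found = True
            rest = self.seen[idx + len(self.prefix):]
            if rest:
                self.answer_tokens.append(rest)


# Simulated token stream from a conversational ReAct-style agent (made up).
stream = ["Thought", ":", " Do I need to use a tool?", " No", "\n",
          "AI", ":", " Paris", " is the capital", " of France."]
f = FinalAnswerFilter(prefix="AI:")
for tok in stream:
    f.on_llm_new_token(tok)
print("".join(f.answer_tokens).strip())  # Paris is the capital of France.
```

Because an agent calls the LLM several times per question, a real handler would probably also reset its state in on_llm_start, and it can misfire if the prefix happens to appear inside the reasoning text.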

btw, I found that if I set streaming=True without changing anything else in my program, it runs just as it did with streaming=False. So I can always set streaming=True, whether or not I stream the output.

Yea, langchain's support for this is... complicated haha

I found this example on langchain's Discord, but I don't think this implementation is a real-time stream. chat() is a synchronous method, so by the time the call finishes, all of the output tokens have already been produced.

No, that works I think; it will livestream/print everything to the terminal

Since it uses the stdout callback handler
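Roughly what a stdout streaming handler does, as a sketch: write each token and flush immediately, so it shows up while generation is still running. This is similar in spirit to langchain's StreamingStdOutCallbackHandler but is not its source; the class name and the pluggable stream argument are hypothetical.

```python
import io
import sys


class StreamWriterHandler:
    """Writes each token to a file-like object as soon as it arrives."""

    def __init__(self, stream=None) -> None:
        self.stream = stream if stream is not None else sys.stdout

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.stream.write(token)
        self.stream.flush()  # push the token out now instead of buffering it


buf = io.StringIO()
handler = StreamWriterHandler(buf)
for tok in ["Streaming", " to", " any", " stream"]:  # simulated tokens
    handler.on_llm_new_token(tok)
print(buf.getvalue())  # Streaming to any stream
```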

All right, that's probably fine for this example. What I mean is: what if agent.run() is used in its place?

Yes, I know this handler can't be used directly.

That's because we want to send the response out in real time over HTTP, not to stdout.

I see what you mean: with stdout, the output is always real-time.
I'm just assuming this handler is used to write tokens into a buffer and return them at the end.

Yea, pretty much. So you could implement your own custom handler and write to your own queue/buffer.

Yea, I've done that. But if I implement it like this, by the time I return the buffer the complete answer has already been produced, because agent.run() has finished executing. So I don't think it's ideal.

Unless I open another connection to the front-end and send the tokens in real time.

In our llama-index code, we do something like this under the hood, which gives you a raw generator to iterate over.

Could do something similar for your handler
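A sketch of that generator pattern, assuming a handler that writes to a queue: run the blocking call in a background thread, and have a generator yield tokens from the queue until a sentinel says the run is over. This is my own illustration, not llama-index's actual code. fake_agent_run is a hypothetical stand-in; with langchain it would be something like agent.run(question, callbacks=[handler]), assuming your version accepts callbacks there.

```python
import queue
import threading
import time
from typing import Callable, Iterator

_DONE = object()  # sentinel that marks the end of the stream


class QueueCallbackHandler:
    """Pushes every streamed token onto a queue instead of printing it."""

    def __init__(self, q: queue.Queue) -> None:
        self.q = q

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.q.put(token)


def stream_tokens(run: Callable[[QueueCallbackHandler], object]) -> Iterator[str]:
    """Run a blocking call in a background thread and yield its tokens
    as they are produced, instead of waiting for the call to return."""
    q: queue.Queue = queue.Queue()
    handler = QueueCallbackHandler(q)

    def worker() -> None:
        try:
            run(handler)
        finally:
            q.put(_DONE)  # always release the consumer, even if run() raises

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def fake_agent_run(handler: QueueCallbackHandler) -> str:
    """Hypothetical stand-in for agent.run(question, callbacks=[handler])."""
    tokens = ["Tokens", " arrive", " one", " by", " one"]
    for tok in tokens:
        time.sleep(0.05)  # simulate generation latency
        handler.on_llm_new_token(tok)
    return "".join(tokens)


for tok in stream_tokens(fake_agent_run):
    print(tok, end="", flush=True)
print()
```

A generator like this is the kind of object a streaming HTTP response can consume (for example FastAPI's StreamingResponse accepts an iterator), which is what sending the answer "in real time over HTTP" needs. If run() raises, the consumer simply sees the stream end; surfacing the error to the client would need extra handling.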

OK, I'll study this.

Hi Logan, I have a question about your update to Document's metadata.
For existing documents that were already persisted locally, do we need to migrate extra_info to metadata manually?

I used to manually set the values of some keys in each document's extra_info.

It should (hopefully) automatically upgrade your stored data to the new format

Well, I confirmed it while debugging.
So I don't have to change anything; I just use metadata instead of extra_info in new documents.

Another issue:
index = load_index_from_storage(storage_context, service_context=service_context_3_5)
res = index.as_chat_engine(service_context=service_context_3_5).chat(question)

Sometimes the code above returns this response:
{
"detail": "llama_index.chat_engine.condense_question.CondenseQuestionChatEngine.from_defaults() got multiple values for keyword argument 'service_context'"
}

If I remove the service_context parameter from index.as_chat_engine(), it works normally.

Yea, it seems it's already passing in the service context under the hood 👀
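That explanation matches a plain Python mechanism, shown here with a small reproduction. FakeIndex and FakeChatEngine are hypothetical stand-ins, not llama_index classes: if a wrapper forwards its own service_context=... and also forwards the caller's **kwargs, a caller-supplied service_context appears twice, and Python raises this TypeError before the target function even runs.

```python
class FakeChatEngine:
    """Hypothetical stand-in for a chat engine's from_defaults() factory."""

    def __init__(self, service_context) -> None:
        self.service_context = service_context

    @classmethod
    def from_defaults(cls, service_context=None, **kwargs):
        return cls(service_context)


class FakeIndex:
    """Hypothetical stand-in for an index that stores its own service_context."""

    def __init__(self, service_context) -> None:
        self.service_context = service_context

    def as_chat_engine(self, **kwargs):
        # Forwards its own service_context AND the caller's kwargs; if the
        # caller also passed service_context, the keyword is given twice.
        return FakeChatEngine.from_defaults(
            service_context=self.service_context, **kwargs
        )


index = FakeIndex(service_context="index_ctx")
message = ""
try:
    index.as_chat_engine(service_context="user_ctx")
except TypeError as err:
    message = str(err)
print(message)  # ...got multiple values for keyword argument 'service_context'

engine = index.as_chat_engine()  # works: service_context is passed only once
print(engine.service_context)    # index_ctx
```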

Does that mean I should never pass service_context when calling index.as_chat_engine(), because the index object already contains a service_context?

Yea, pretty much. Although tbh we should handle this better in the code; the user should be able to override it if they want.
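One common way to allow that override, sketched with hypothetical stand-in classes (this is not llama_index's actual fix): treat the index's own service_context as a default via dict.setdefault, so an explicitly passed value wins and the keyword is never duplicated.

```python
class FakeChatEngine:
    """Hypothetical stand-in for a chat engine's from_defaults() factory."""

    def __init__(self, service_context) -> None:
        self.service_context = service_context

    @classmethod
    def from_defaults(cls, service_context=None, **kwargs):
        return cls(service_context)


class FakeIndex:
    """Hypothetical stand-in for an index that carries its own service_context."""

    def __init__(self, service_context) -> None:
        self.service_context = service_context

    def as_chat_engine(self, **kwargs):
        # Use the index's service_context only as a default: a value the
        # caller passes explicitly takes precedence.
        kwargs.setdefault("service_context", self.service_context)
        return FakeChatEngine.from_defaults(**kwargs)


index = FakeIndex(service_context="index_ctx")
print(index.as_chat_engine().service_context)                           # index_ctx
print(index.as_chat_engine(service_context="user_ctx").service_context)  # user_ctx
```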

Hi Logan, I created a handler for langchain agent streaming, just like yours. But only the following tokens are received in on_llm_new_token():


Thought: Do I need to use a tool? Yes
Action: document knowledge library of user "test"
Action Input: Supreme Court Rejects GOP Call for Unchecked Power in US Elections, Affirming State Courts Should Not Have 'Free Rein' to Change Election Rules

The actual answer, the "Observation", is not received by this event.
And the Observation is really all I want.

But for langchain that's normal, right?
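As far as I know, yes, this is expected for langchain's ReAct-style agents: the LLM call is stopped at the "Observation:" stop sequence, and the Observation itself is the tool's return value, which the agent inserts into the prompt. The LLM never generates it, so on_llm_new_token never sees it. A handler can pick it up in on_tool_end instead (a real BaseCallbackHandler method, though its exact signature may vary across versions). The sketch below uses a hypothetical simulation of one agent step, not a real agent; the tool name and output text are placeholders.

```python
class AgentStepRecorder:
    """Separates what the LLM streams from what the tool returns."""

    def __init__(self) -> None:
        self.llm_tokens: list = []
        self.observations: list = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Only text the LLM itself generates arrives here.
        self.llm_tokens.append(token)

    def on_tool_end(self, output: str, **kwargs) -> None:
        # The tool's return value, i.e. what the agent logs as "Observation:".
        self.observations.append(output)


def fake_agent_step(handler: AgentStepRecorder) -> None:
    """Hypothetical simulation of one ReAct step, not langchain's real loop:
    the LLM streams Thought/Action/Action Input and stops before
    "Observation:", then the tool runs and its output becomes the Observation."""
    for tok in ["Thought: Do I need to use a tool? Yes\n",
                "Action: document_search\n",
                "Action Input: election rules"]:
        handler.on_llm_new_token(tok)
    tool_output = "text returned by the document search tool"  # placeholder
    handler.on_tool_end(tool_output)


rec = AgentStepRecorder()
fake_agent_step(rec)
print("".join(rec.llm_tokens))
print(rec.observations)  # ['text returned by the document search tool']
```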
