Do you mean to stop the chat history from overflowing? The newest version of llama index now has a basic window buffer for the chat history
It's more involved than that but good to know
Alright @Logan M ...
store = MongoDBAtlasVectorSearch(get_db(), db_name=config["db_name"], collection_name=config["collection_name"], index_name=config["index_name"])
index = VectorStoreIndex.from_vector_store(vector_store=store)
service_context = ServiceContext.from_defaults(llm=OpenAI(temperature=config["temperature"], model=config["model_name"]), num_output=config["num_output"])
chat_engine = index.as_chat_engine(
    node_postprocessors=[SentenceEmbeddingOptimizer(threshold_cutoff=config["threshold_cutoff"], percentile_cutoff=config["percentile_cutoff"])],
    retriever_mode="embedding",
    service_context=service_context,
    similarity_top_k=config["similarity_top_k"],
    text_qa_template=qa_template,
    streaming=True,
    condense_question_prompt=custom_prompt,
)
streaming_response = chat_engine.stream_chat(prompt, chat_history=modified_chat_history)
ValueError: Streaming is not enabled. Please use chat() instead.
How am I supposed to set it up for streaming properly if
streaming=True
is insufficient?
(btw I think I'm still on 0.7.4-ish if that matters)
Ok, I have time to look at this now lol give me a few mins
@Rubenator you are missing streaming=True
on the LLM definition I think. That error is raised because the query engine isn't returning a streaming response
We didn't have to do that for as_query_engine though so... what's the difference?
Also, I just added streaming=True
to the llm, and have the same error
ok, let me make an example first lol
Well, I was able to avoid the error you got. But also seems like the streaming may still be buggy beyond that tbh (at least in the latest version)
Will try to patch in a bit here
What did you do to avoid the error? Or did you just, not encounter it?
I just didn't encounter it
but I was using the latest version
I wish I could patch this now, but I'm out all weekend with family
All good. I'll try updating on Monday and see if it helps
My coworker says that updating to 0.7.9
did not fix the error, although I will double check myself rn
Yeah, same issue ValueError: Streaming is not enabled. Please use chat() instead
Yea I never did hit that error, which is a little weird.
But also that reminds me, I need to fix this in general (the streaming for condense question engine is still borked, besides this issue)
@Logan M while you're at it, small feature request... we'd like to be able to grab the request and response that the condense question engine uses to condense the question (primarily for token usage tracking). And in a similar vein, being able to directly grab token usage data from the OpenAI requests in general would be nice (no rush though, it's just a potential nice-to-have).
okay sweet, guess I didn't find that ty
New release cut that worked fine for me. Hope it works well for you!
You shouldn't need to set streaming=True anywhere now
Things aren't acting fine, but I'm checking some stuff -- seems like you changed some more stuff:
File "/root/pytest/venv/lib/python3.10/site-packages/llama_index/indices/base.py", line 389, in as_chat_engine
return OpenAIAgent.from_tools(
TypeError: OpenAIAgent.from_tools() got an unexpected keyword argument 'node_postprocessors'
yeah, this gripe is happening for a majority of the arguments we are passing in to as_chat_engine
-- where are they supposed to go instead?
I actually didn't change anything, a colleague fixed some stuff for streaming with condense engine.
The kwargs you are passing in will work for condense question chat mode I think, but for other modes I can see this being an issue due to kwarg abuse in general.
Workaround here is either a) setting chat_mode="condense_question"
or b) just creating the agent yourself, rather than using as_chat_engine
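For (b), something roughly like this -- just a sketch (exact kwargs may differ a bit by version, and I'm reusing your config / qa_template / custom_prompt names):
from llama_index.chat_engine import CondenseQuestionChatEngine

# build the underlying query engine yourself -- retriever/query kwargs go here
query_engine = index.as_query_engine(
    service_context=service_context,
    similarity_top_k=config["similarity_top_k"],
    text_qa_template=qa_template,
    streaming=True,
)

# then wrap it in the condense-question chat engine
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
)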
where exactly would we set the chat_mode?
index.as_chat_engine(chat_mode="condense_question")
The default chat mode changed to agents (since that's generally a better user experience tbh)
Oh so, it has to be the very first argument
yea, due to the function api
def as_chat_engine(
    self, chat_mode: ChatMode = ChatMode.BEST, **kwargs: Any
) -> BaseChatEngine:
okay great thank you
Actually... tried that... still the same error about streaming not being enabled
store = MongoDBAtlasVectorSearch(get_db(), db_name=config["db_name"], collection_name=config["collection_name"], index_name=config["index_name"])
index = VectorStoreIndex.from_vector_store(vector_store=store)
service_context = ServiceContext.from_defaults(llm=OpenAI(temperature=config["temperature"], model=config["model_name"]), num_output=config["num_output"])
chat_engine = index.as_chat_engine(
    chat_mode="condense_question",
    node_postprocessors=[SentenceEmbeddingOptimizer(threshold_cutoff=config["threshold_cutoff"], percentile_cutoff=config["percentile_cutoff"])],
    retriever_mode="embedding",
    service_context=service_context,
    similarity_top_k=config["similarity_top_k"],
    text_qa_template=qa_template,
    condense_question_prompt=custom_prompt,
)
streaming_response = chat_engine.stream_chat(prompt)
@Logan M this looks kinda suspect:
edit: oops wrong function haha
still getting the error nonetheless
Are you sure you upgraded? I feel like this is impossible haha
There must be some difference we aren't seeing
service_context = ServiceContext.from_defaults(llm=OpenAI(temperature=0, model=config["model_name"]))
chat_engine = index.as_chat_engine(chat_mode="condense_question", service_context = service_context)
this fails^
this does not:
service_context = ServiceContext.from_defaults(llm=OpenAI(model=config["model_name"]))
chat_engine = index.as_chat_engine(chat_mode="condense_question", service_context = service_context)
oh sorry @Logan M -- the as_chat_engine also has service_context=service_context
in both xD
This worked for me just now
Hmm I'm still not able to replicate the original error
I think I was duped by this in my testing last night though, the streaming just hangs :PSadge: back to the grindstone lol
you're also using a different model -- we're on "gpt-3.5-turbo"
Just changed it, same result -- streaming just hangs forever
although I get the same error on the other version
If you can humor me for like 1 sec
cd ~/
python -m venv sanity_env
source sanity_env/bin/activate
pip install llama-index
This env should not have the "streaming not enabled" error (but also, streaming probably will just hang, like mine is)
I had to install python-dotenv
and pymongo
in addition to that to run my code but that's all
:PepeHands:
Can you stream a normal query engine response?
response = index.as_query_engine(streaming=True).query("test")
print(type(response))
I thought you said no more streaming=True?
or is that only for chat engine?
Not for as query engine (the interfaces are a little out of alignment)
Yea, since chat engines have specific stream endpoints (there's no stream_query on the query engines yet)
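So roughly, just to illustrate the asymmetry (sketch, reusing the engines from above):
# chat engines: streaming is a separate endpoint
streaming_response = chat_engine.stream_chat("test")

# query engines: streaming is opted into when building the engine
streaming_response = index.as_query_engine(streaming=True).query("test")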
INFO:numexpr.utils:NumExpr defaulting to 6 threads.
NumExpr defaulting to 6 threads.
Generating response
<class 'llama_index.response.schema.Response'>
yup
that's not a streaming response
yes, like the error said ;p
well, narrowing down the issue haha
it's not because of as_chat_engine
Can I see the code+imports that you have for setting up the service context? I feel like you've shared this before, but just double checking
Hmmm or maybe it's related to using mongodb
Just need to narrow the example down to something simple
lemme just... comment out basically everything
this is the most minimum example I can think of
from llama_index import ListIndex, Document, ServiceContext
from llama_index.llms import OpenAI
index = ListIndex.from_documents([Document.example()], service_context=ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0)))
response = index.as_query_engine(streaming=True).query("test")
print(type(response))
mkay... so... I'm not even using a service context
if I shuffle stuff around so as not to do:
from dotenv import load_dotenv
load_dotenv()
but as soon as I put that back in
I get a Response instead of a StreamingResponse
er wait no... but I'm close
Okay @Logan M I have fully narrowed it down -- I get a Response back instead of waiting forever, when the OPENAI_API_KEY environment variable is set (with or without load_dotenv)
So here is my min repro:
import os
os.environ["OPENAI_API_KEY"] = "****"
config = {"mongo_uri":'****', "db_name":'****'}
import pymongo
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from llama_index.indices.vector_store.base import VectorStoreIndex
db = pymongo.MongoClient(config["mongo_uri"])[config["db_name"]]
store = MongoDBAtlasVectorSearch(db)
index = VectorStoreIndex.from_vector_store(vector_store=store)
response = index.as_query_engine(streaming=True).query("test")
print(type(response))
Does it re-produce without the mongodb too?
Oh and, the key must be valid... when it is not valid, it does the wait forever
No -- when I use ListIndex.from_documents([Document.example()], ...etc) I get a StreamingResponse back
Let me know if you are able to reproduce or not ^_^;
ah, so it's possibly related to mongodb then... Hmmm
We've been trying all sorts of things over here... any thoughts on what it could be?
(noticed some of the fixes in the latest changelogs, but still seeing the same issue)
@Logan M just figured it out
store = MongoDBAtlasVectorSearch(get_db(), db_name=config["db_name"],collection_name=config["collection_name"], index_name=config["index_name"])
Due to some minor refactoring, our get_db function was returning mongodb['db_name_here'] instead of just mongodb.
This caused 0 nodes to get returned... but at no point was that fact caught.
It ultimately results in the response being a None type, so a very boring and empty Response gets created and returned instead of a streaming response.
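To make the mix-up concrete, the shape of it was roughly this (sketch, reusing the config keys from the repro above):
import pymongo

def get_db():
    client = pymongo.MongoClient(config["mongo_uri"])
    # return client[config["db_name"]]  # <- what the refactor accidentally returned (a Database)
    return client                       # <- what MongoDBAtlasVectorSearch actually wants (the client)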
So, proposed solution would be to:
A) throw an error if that first MongoDBAtlasVectorSearch argument is not a MongoClient instance
B) throw an error (or something like that) if the query returns no nodes (rather than letting it get past all the string checks)
It's actually rather miraculous that it makes it all the way to returning a response at all, but ultimately that's just because these sorts of things aren't getting checked.
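Something like this is what I have in mind -- just a sketch of the checks with placeholder variable names, not the actual llama_index internals:
from pymongo import MongoClient

# (A) validate the first argument up front when constructing the vector store
if not isinstance(mongodb_client, MongoClient):
    raise TypeError(f"Expected a pymongo MongoClient, got {type(mongodb_client)}")

# (B) fail loudly when the vector search comes back empty instead of silently returning None
if not results:
    raise ValueError("Vector store query returned 0 nodes -- check db_name/collection_name/index_name")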
Anyway, with that out of the way...
we are now surprised to see:
TypeError: 'StreamingAgentChatResponse' object is not iterable
Is that intentionally not iterable?
yea, should use response.response_gen to get the iterator
The response has other things on it, like sources, which gives you access to the raw query engine response under the hood
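e.g. something like this (sketch, reusing the chat_engine/prompt from above):
streaming_response = chat_engine.stream_chat(prompt)

# iterate the token stream
for token in streaming_response.response_gen:
    print(token, end="", flush=True)

# the raw query engine response(s) are exposed via .sources
print(streaming_response.sources)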
Definitely would be an easy PR to make if you had the bandwidth! Great detective skills here
Sure, I'm down once we get our thing out the door next week, but I'll probably ask for some help double checking that I'm doing the checks in the correct places -- there's a lot of class inheritance happening