I'm not sure what you mean by looping? It works fine for me
see this is my chat_engine in streamlit
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = st.session_state.chat_engine.chat(prompt)
            st.write(response.response)
            message = {"role": "assistant", "content": response.response}
            # Add response to message history
            st.session_state.messages.append(message)
and this is where i defined my engine,
are u using it in an app or in jupyter?
In a fastapi app (where I was testing) it works fine
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.chat_engine import CondensePlusContextChatEngine

documents = SimpleDirectoryReader("./docs/docs/examples/data/paul_graham").load_data()
index = VectorStoreIndex.from_documents(documents)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
)
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/chat")
async def root(request: Request):
    data = await request.json()
    message = data.get("message")
    response = await chat_engine.astream_chat(message)

    async def gen():
        async for chunk in response.async_response_gen():
            yield str(chunk)

    return StreamingResponse(gen())

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8055)
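a quick way to test the streaming endpoint from the client side is something like this (just a sketch, the prompt is made up):
import requests

# stream the chunks back from the /chat endpoint defined above
with requests.post(
    "http://127.0.0.1:8055/chat",
    json={"message": "What did the author do growing up?"},
    stream=True,
) as resp:
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)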
Your streamlit code seems fine to me. Maybe add some prints to see what line it seems to be stuck on
i did already, i checked, it stays stuck on this line
do you think for production i should use fastapi? this streamlit doesn't look best for production! what is your suggestion?
streamlit is definitely not a production app
It's mostly to throw together a quick UI to show your boss hehe
ok, i am trying with what u said using a flask app, thanks @Logan M u are great
hey @Logan M when i try the flask app using postman for my second question, this is what i get:
That's pretty weird. Issue with openai embeddings I guess?
chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    node_postprocessors=[colbert_reranker, llm_rerank_postprocessor],
)
see this one i am using
i mean it's different than what u used, u are calling this condense chat engine and passing the index as a retriever, but my retriever i am building myself!
using QueryFusionRetriever
Right, but the console log shows openai embeddings isn't connecting
well the first question is always correct
Did you set your openai key? Are you manually setting up an embedding model?
yes, openai key is set up properly, for embedding i am using it like this:
embed_model = OpenAIEmbedding(embed_batch_size=10)
Settings.embed_model = embed_model
Could just be openai having issues too
It's an APIConnectionError, I guess that means it had trouble hitting openai servers
but i am not always getting that, for example i tried one more time and i don't get any error in my postman console, it just shows it's waiting for a response
hey @Logan M i don't need to convert the query to an embedding and stuff right?
Nope it does that for you
i assume when i am using this condense chat engine, by default it will take the embedding of the user query and then search the database, right?
do u have any tutorial for how i can add the source documents as part of the response to the user? assume i am using the condense query engine
you can get source nodes from the response, response.source_nodes
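for example, roughly like this (just a sketch, assuming the chat engine from earlier; the metadata keys depend on which loader you used):
response = chat_engine.chat("What did the author do growing up?")
print(response.response)

# each source node wraps the retrieved chunk plus its score and metadata
for source in response.source_nodes:
    print(source.score, source.node.metadata.get("file_name"))
    print(source.node.get_content()[:200])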
hey @Logan M i want to use the user's history for future chats that the user starts, similar to chatgpt, which maintains the user's previous chats within the user profile, do we have such a thing in llamaindex? if yes, can you navigate me to the right material pls?
nice, well done. guys, u made our life so much easier
hey @Logan M if i have for example 100 pdf documents and i can put them in meaningful categories, let's say 5 different categories. i am thinking to put each category's pdf files under one index and give the user buttons to select a category first, and in the backend its relevant index is triggered. my guess is that this way, since each knowledge base is specific, pulling information is easier for a RAG model, rather than if i save all pdfs in one flat index! what are your thoughts
the only flaw i see with this is that it's more convenient for the user to just have one chatbot and ask anything they want, but i am worried that if i save all pdfs flat in one index then it would be harder for the RAG to pull the info. something like the sketch below is what i mean
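(rough sketch of the per-category idea; the folder and category names here are made up, and it assumes the pdfs are already sorted into one folder per category)
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# hypothetical category names, one folder of pdfs per category
categories = ["hr", "finance", "legal", "it", "sales"]

# build one index per category
indexes = {
    cat: VectorStoreIndex.from_documents(
        SimpleDirectoryReader(f"./pdfs/{cat}").load_data()
    )
    for cat in categories
}

# when the user clicks a category button, the backend picks that category's index
selected = "finance"
chat_engine = indexes[selected].as_chat_engine(chat_mode="condense_plus_context")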
will this work with ReAct agent also?
did you try with ReAct agent?
@Logan M and also how is the token limit handled? for example, my token limit is 300, the first question-answer takes 100 tokens, the 2nd 100 and the 3rd 100 tokens. when i ask the 4th question and it takes 100 tokens, will the 1st question-answer be removed from the memory to store the 4th question-answer?
that's kind of how the memory buffer works, yes. There's also a summary memory buffer. Or you can make your own memory
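e.g. something like this (rough sketch; the token limit and engine setup are just example values):
from llama_index.core.memory import ChatMemoryBuffer, ChatSummaryMemoryBuffer

# plain buffer: the oldest messages get dropped once the token limit is exceeded
memory = ChatMemoryBuffer.from_defaults(token_limit=300)

# summary buffer: older messages get summarized instead of dropped
# memory = ChatSummaryMemoryBuffer.from_defaults(token_limit=300)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
)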
thank you @Logan M, will this work with ReAct agent also?
@Logan M after updating the chat_store redis packages i am getting this error:
    chat_store=RedisChatStore(redis_url=settings.REDIS_URL),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "packages/llama_index/storage/chat_store/redis/base.py", line 54, in __init__
    self._aredis_client = aredis_client or self._aget_client(redis_url, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "packages/llama_index/storage/chat_store/redis/base.py", line 374, in _aget_client
    if self._check_for_cluster(redis_client):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "packages/llama_index/storage/chat_store/redis/base.py", line 199, in _check_for_cluster
    return cluster_info["cluster_enabled"] == 1
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
TypeError: 'coroutine' object is not subscriptable
import datetime
import uuid

from llama_index.core.memory import ChatMemoryBuffer
from llama_index.storage.chat_store.redis import RedisChatStore


class Chat:
    def __init__(self, model):
        self.model = model
        if model.id is None:
            self.id = str(uuid.uuid4())
        else:
            self.id = model.id
        if settings.REDIS_URL is not None:
            self.memory = ChatMemoryBuffer.from_defaults(
                token_limit=3900,
                chat_store=RedisChatStore(redis_url=settings.REDIS_URL),
                chat_store_key="memory" + self.id,
            )
        else:
            self.memory = ChatMemoryBuffer.from_defaults(
                token_limit=3900, chat_store_key="memory" + self.id
            )
        self.created = datetime.datetime.now()
seems like a bug. Will have to fix it
redis = "5.1.0"
llama-index-storage-chat-store-redis = "0.3.1"
llama-index-storage-docstore-redis = "0.2.0"
llama-index-storage-index-store-redis = "0.3.0"
llama-index-vector-stores-redis = "0.3.2"
just pushed/released a fix