I'm having trouble finding the list of (I'm going to call them:) "levers" llama provides for chat_history. Like, how much history is used... which parts of history are used (such as... sentence similarity possibly? πŸ€·β€β™‚οΈ)... how long ago and/or how many tokens ago do I start forgetting things... etc. -- just, what functions/features/etc are provided that I can leverage (🀭) to reduce/limit/optimize token usage costs.
The chat history right now is super basic. It's just a list of messages πŸ˜…

Working on better "memory" abstractions though! Should be ready Soon ℒ️
So what happens when the history is >16k tokens?
And... is there a hook/callback or something we can use to filter-through/limit the history ourselves? Or no?
you can control the chat history pretty easily, as it's just a list of ChatMessage objects

On the chat engine, you can access it directly using chat_engine._chat_history, and same with the agent -> agent._chat_history
These are all pretty new -- the new memory objects we have in the pipeline should make this easier. But handling this manually is also not too bad
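Until those memory abstractions land, one way to cap token costs is to trim the message list yourself before handing it back to the engine. A minimal sketch in plain Python (not a LlamaIndex API — the `ChatMessage` stand-in and the rough 4-chars-per-token estimate are assumptions):

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    # Stand-in for llama_index's ChatMessage (role + content).
    role: str
    content: str


def trim_history(history, max_tokens=16000, chars_per_token=4):
    """Keep the most recent messages that fit under a rough token budget.

    Token count is estimated as len(content) / chars_per_token; swap in a
    real tokenizer (e.g. tiktoken) if you need accurate counts.
    """
    kept, budget = [], max_tokens
    for msg in reversed(history):  # newest first
        cost = len(msg.content) // chars_per_token + 1
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))  # back to oldest-first order
```

You could then pass the trimmed list in via the `chat_history` kwarg instead of poking at `chat_engine._chat_history` directly.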
editing variables that start with _ just... feels wrong... but okay lol
i know i know, it's hacky lol
How is the chat history meant to be maintained between requests? Is that something we just deal with however we see fit? Or...?
Both agent.chat and chat_engine.chat allow you to pass in chat_history as a kwarg

Plain Text
agent.chat("Hello!", chat_history=[ChatMessage(role="assistant", content="text")])
Is there a page in the documentation that talks about that? Or not yet?
if you pass in a list variable, it will get appended to with the new history
Not yet, I'm reading source code right now
which tbh I recommend doing as well
okay so, it just expects an array of that object -- and I'm assuming that is the same thing that is in _chat_history?
we are having a larger agent publicization/push I think next week? So hopefully better docs by then
I assume you skipped this question because you don't know? haha
it will just crash/traceback haha is my best guess
like I said, baby steps here πŸ˜…

If you are inclined to make any PRs for this as well, I definitely welcome it. Community help is extremely appreciated πŸ™
Yeah I get it. Just making sure I understand the current behavior πŸ™‚
Alright...
Plain Text
store = MongoDBAtlasVectorSearch(
    get_db(),
    db_name=config["db_name"],
    collection_name=config["collection_name"],
    index_name=config["index_name"],
)
index = VectorStoreIndex.from_vector_store(vector_store=store)
service_context = ServiceContext.from_defaults(
    llm=OpenAI(temperature=config["temperature"], model=config["model_name"]),
    num_output=config["num_output"],
)
chat_engine = index.as_chat_engine(
    node_postprocessors=[
        SentenceEmbeddingOptimizer(
            threshold_cutoff=config["threshold_cutoff"],
            percentile_cutoff=config["percentile_cutoff"],
        )
    ],
    retriever_mode="embedding",
    service_context=service_context,
    similarity_top_k=config["similarity_top_k"],
    text_qa_template=qa_template,
    streaming=True,
    condense_question_prompt=custom_prompt,
)
streaming_response = chat_engine.stream_chat(prompt, chat_history=modified_chat_history)
Plain Text
ValueError: Streaming is not enabled. Please use chat() instead.
How am I supposed to set it up for streaming properly if streaming=True is insufficient? πŸ€”
(btw I think I'm still on 7.4-ish if that matters)
whoops... wrong thread..