
Hi everyone, I have seen the condense question + context mode for the chat engine in the LlamaIndex docs, but it's only available for OpenAI.
Can somebody suggest whether it's achievable with an Anthropic LLM and LlamaIndex?

I'm not using embeddings or any other vector stores as of now, since I'm new to LLMs and have built a lot of basic stuff without adding any complexity. But I would love some new suggestions and learnings.
You can use any LLM of your choice in LlamaIndex.

In this particular case, you need to pass in your LLM, or define it globally so that it gets picked up for any LLM ops (my preferred way).

Plain Text
from llama_index.core import Settings
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-20240620")  # your Anthropic instance (example model name)

Settings.llm = llm  # your defined llm

Now it will use only your defined instance
Plain Text
from llama_index.core.settings import Settings
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.query_engine import BaseQueryEngine
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

from app.engine.templates.system_prompt import get_system_prompt_template
from app.engine.settings import init_settings
from app.engine.memory.multi_chat_memory import MultiChatMemoryBuffer
from app.core.config import settings as app_settings
from app.engine.multi_chat_engine import MultiChatEngine


# build index
index = VectorStoreIndex.from_documents([])

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)


def create_chat_engine(chat_store):
    init_settings()

    memory = MultiChatMemoryBuffer(
        chat_store=chat_store,
        token_limit=app_settings.MAX_TOKEN,
    )

    return CondenseQuestionChatEngine.from_defaults(
        query_engine=query_engine,
        memory=memory,
        llm=Settings.llm,
        system_prompt=get_system_prompt_template().format(),
    )
Attachment: Screenshot_from_2024-09-13_14-04-03.png
It always asks for an OpenAI key
If you look closely, it is asking for a key for the embed model, not for the LLM.
Also, looking at your data, why do you want to create an index if you just want to interact with the LLM? Any reason there?
No reason for using an index, I just thought it works that way
If you just want to interact with the LLM and want it to remember the chat history, you can do it with:
Plain Text
from llama_index.core.llms import ChatMessage
resp = llm.chat([ChatMessage(role="user", content="hey")])

This way it will remember your history as well (you keep passing the running message list back in)
I would love your suggestions. Can you explain what I'm doing wrong?
Do I need to use a chat engine or not?
No, if you just want to interact with the LLM only, try the one that I shared. It will do most of the things for you, like managing chat history, just like a chat engine.

https://docs.llamaindex.ai/en/stable/examples/llm/anthropic/
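Roughly, a minimal sketch of what that looks like with the Anthropic LLM class from that page, keeping the running history yourself in a message list (the model name here is just an example):

Plain Text
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-20240620")  # example model name

# keep the running history yourself and pass it back in on every call
history = [ChatMessage(role="user", content="hey")]
resp = llm.chat(history)
history.append(resp.message)  # keep the assistant reply in context

history.append(ChatMessage(role="user", content="what did I just say?"))
print(llm.chat(history).message.content)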
Okay, so what's the use case of a chat engine?
A chat engine will be more useful for you if you have some data with you, like your docs, and you want to interact with your docs while making sure it remembers your chat history as well.
A chat engine combines a query engine with the conversational capabilities of an LLM.
I have tried the llm.astream_chat method just now, but didn't find options like condense question mode
Chat mode will only work if you define a chat engine.
That's the point. Currently it's using too many tokens because of the history messages, so I want to use condense question mode
How can I use an embedding model with it? I can use Voyage for now
Condense question mode is only available in the chat engine, and it requires a query engine
You'll need to define your embed_model globally using Settings, just like your LLM, and then you can use the chat engine
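For example, a sketch along these lines (assuming the llama-index-embeddings-voyageai integration; double-check the exact class and parameter names against its docs):

Plain Text
from llama_index.core import Settings
from llama_index.embeddings.voyageai import VoyageEmbedding  # pip install llama-index-embeddings-voyageai
from llama_index.llms.anthropic import Anthropic

Settings.llm = Anthropic(model="claude-3-5-sonnet-20240620")  # example model name
Settings.embed_model = VoyageEmbedding(
    model_name="voyage-2",             # example Voyage model
    voyage_api_key="your-voyage-key",  # your Voyage API key
)
# with both set globally, the index and chat engine stop asking for an OpenAI key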
What about the query engine? CondenseQuestionChatEngine needs a query engine as an argument
Just do chat_engine = index.as_chat_engine(...), passing all the fields and the chat mode too.
It will create everything on its own.
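Roughly like this (a sketch; the "data" folder is hypothetical, and it assumes your LLM and embed model are already set globally via Settings as above):

Plain Text
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# hypothetical folder containing your documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# builds the retriever, query engine, memory and condense-question chat engine for you
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

print(chat_engine.chat("What does my doc say about X?"))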