
Hi everyone, I have seen the condense question + context mode for the chat engine in the LlamaIndex docs, but it's only available for OpenAI.
Can somebody suggest whether it's achievable with an Anthropic LLM and LlamaIndex?

I'm not using embeddings or any other vector stores as of now, since I'm new to LLMs and have built a lot of basic stuff without adding any complexity. But I would love some new suggestions and learnings.
You can use any LLM of your choice in LlamaIndex.

In this particular case, you need to pass in your LLM, or define it globally so that it gets picked up for any LLM ops (my preferred way).

Plain Text
from llama_index.core import Settings
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-20240620")  # your Anthropic instance (example model name)

Settings.llm = llm  # your defined llm

Now it will use only your defined instance
Plain Text
from llama_index.core.settings import Settings
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.query_engine import BaseQueryEngine
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

from app.engine.templates.system_prompt import get_system_prompt_template
from app.engine.settings import init_settings
from app.engine.memory.multi_chat_memory import MultiChatMemoryBuffer
from app.core.config import settings as app_settings
from app.engine.multi_chat_engine import MultiChatEngine


# build index
index = VectorStoreIndex.from_documents([])

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)


def create_chat_engine(chat_store):
    init_settings()

    memory = MultiChatMemoryBuffer(
        chat_store=chat_store,
        token_limit=app_settings.MAX_TOKEN,
    )

    return CondenseQuestionChatEngine.from_defaults(
        query_engine=query_engine,
        memory=memory,
        llm=Settings.llm,
        system_prompt=get_system_prompt_template().format(),
    )
Attachment: Screenshot_from_2024-09-13_14-04-03.png
It always asks for an OpenAI key
If you look closely, it is asking for a key for the embed model, not for the LLM.
Also, looking at your data, why do you want to create an index if you just want to interact with the LLM? Any reason there?
No reason for using an index, I just thought it works that way
If you just want to interact with the LLM and want it to remember the chat history, you can do it with:
Plain Text
from llama_index.core.llms import ChatMessage
resp = llm.chat([ChatMessage(role="user", content="hey")])

This way it will remember your history as well (you keep passing the running message list back in)
I would love your suggestions. Can you explain what I'm doing wrong?
Do I need to use a chat engine or not?
No, if you just want to interact with the LLM only, try the one that I shared. It will do most of the things for you, like managing chat history, just like a chat engine.

https://docs.llamaindex.ai/en/stable/examples/llm/anthropic/
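Roughly, a minimal sketch of what that looks like with the Anthropic LLM class from that page, keeping the running history yourself in a message list (the model name here is just an example):

Plain Text
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic

llm = Anthropic(model="claude-3-5-sonnet-20240620")  # example model name

# keep the running history yourself and pass it back in on every call
history = [ChatMessage(role="user", content="hey")]
resp = llm.chat(history)
history.append(resp.message)  # keep the assistant reply in context

history.append(ChatMessage(role="user", content="what did I just say?"))
print(llm.chat(history).message.content)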
Okay, so what's the use case of a chat engine?
A chat engine will be more useful for you if you have some data with you, like your docs, and you want to interact with your docs while making sure it remembers your chat history as well.
A chat engine combines a query engine with the conversational capabilities of an LLM.
I have tried the llm.astream_chat method just now, but didn't find options like condense question mode
Chat mode will only work if you define a chat engine.
That's the point. Currently it's using too many tokens because of the history messages, so I want to use condense question mode
How can I use an embedding model with it? I can use Voyage for now
Condense question mode is only available in the chat engine, and it requires a query engine
You'll need to define your embed_model globally using Settings, just like your LLM, and then you can use the chat engine
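For example, a sketch along these lines (assuming the llama-index-embeddings-voyageai integration; double-check the exact class and parameter names against its docs):

Plain Text
from llama_index.core import Settings
from llama_index.embeddings.voyageai import VoyageEmbedding  # pip install llama-index-embeddings-voyageai
from llama_index.llms.anthropic import Anthropic

Settings.llm = Anthropic(model="claude-3-5-sonnet-20240620")  # example model name
Settings.embed_model = VoyageEmbedding(
    model_name="voyage-2",             # example Voyage model
    voyage_api_key="your-voyage-key",  # your Voyage API key
)
# with both set globally, the index and chat engine stop asking for an OpenAI key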
What about the query engine? CondenseQuestionChatEngine needs a query engine as an argument
Just do chat_engine = index.as_chat_engine(...), passing all the fields and the chat mode too.
It will create everything on its own.
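Roughly like this (a sketch; the "data" folder is hypothetical, and it assumes your LLM and embed model are already set globally via Settings as above):

Plain Text
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# hypothetical folder containing your documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# builds the retriever, query engine, memory and condense-question chat engine for you
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)

print(chat_engine.chat("What does my doc say about X?"))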