```python
from llama_index.core.chat_engine import SimpleChatEngine

chat_engine = SimpleChatEngine.from_defaults()
response = chat_engine.chat("Tell me a joke.")
print(response)
```
This example works by creating a `SimpleChatEngine` and asking it to tell a joke. The `chat` method sends a message to the chat engine and returns its response.
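`SimpleChatEngine` also keeps the conversation history between calls, so follow-up messages can build on earlier turns. A minimal sketch (the follow-up question and the reset call are illustrative additions, not part of the original example):
```python
# The engine retains chat history, so this follow-up refers to the joke above.
followup = chat_engine.chat("Explain why that joke is funny.")
print(followup)

# Clear the history when you want to start a fresh conversation.
chat_engine.reset()
```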
To use `LlamaCPP` for the chat or completion methods (`chat`, `stream_chat`, `complete`, `stream_complete`), you first need to import the necessary modules and initialize the `LlamaCPP` instance. Here's a basic example:
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

# Initialize LlamaCPP
llm = LlamaCPP()
```
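The `messages_to_prompt` and `completion_to_prompt` helpers imported above convert LlamaIndex chat messages and prompts into the Llama 2 instruction format. A hedged sketch of passing them to the constructor (the other keyword values shown are illustrative assumptions, not required settings):
```python
# Sketch: wire the prompt-formatting helpers into LlamaCPP.
# temperature, max_new_tokens, and verbose are illustrative values.
llm = LlamaCPP(
    temperature=0.1,
    max_new_tokens=256,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
```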
You can then use the `complete` method:
```python
response = llm.complete("Hello! Can you tell me a poem about cats and dogs?")
print(response.text)
```
Or the `stream_complete` method:
```python
response_iter = llm.stream_complete("Can you write me a poem about fast cars?")
for response in response_iter:
    print(response.delta, end="", flush=True)
```
You can also use the `stream_chat` method:
```python
streamed_response = llm.stream_chat(messages)
for response_delta in streamed_response:
    print(response_delta.delta, end="")
```
Here, `llm` is your `LlamaCPP` instance and `messages` is your list of chat messages.
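For reference, the `messages` argument above is just a list of `ChatMessage` objects; a minimal illustration (the message contents are arbitrary):
```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Tell me about llamas."),
]
```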
To install `LlamaCPP`:
```
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
```
To use `PremAI` (which we'll refer to as `prem_chat` in this context) and `messages` in LlamaIndex, first create a `PremAI` instance:
```python
from llama_index.llms.premai import PremAI
from llama_index.core.llms import ChatMessage

# Initialize PremAI
prem_chat = PremAI(project_id=8, system_prompt="Act like nemo fish")
```
Here, `project_id` is the ID of your project on the PremAI platform, and `system_prompt` is an optional system prompt that you can provide.

Next, define your chat messages:
```python
messages = [
    ChatMessage(role="user", content="What is your name"),
    ChatMessage(role="user", content="Write an essay about your school in 500 words"),
]
```
The `role` can be "system", "user", or "assistant", and `content` is the text of the message.

You can then call the `chat` method:
```python
response = prem_chat.chat(messages)
print(response)
```
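If you want just the reply text rather than the printed response object, you can read it from the response's message. A small hedged sketch (attribute names assume a recent `llama-index-core`):
```python
# The chat response carries the assistant's ChatMessage; .message.content is the reply text.
reply_text = response.message.content
print(reply_text)
```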
You can also use the `stream_chat` method:
```python
streamed_response = prem_chat.stream_chat(messages)
for response_delta in streamed_response:
    print(response_delta.delta, end="")
```
Here's how you can use `LlamaCPP` and a local model to create a simple chatbot. First, install the required packages:
```
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
```
Then import the necessary modules:
```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP
```
Next, initialize the `LlamaCPP` instance. In this example, we're using a model hosted on Hugging Face, but you can replace `model_url` with the path to your local model file if you have one:
```python
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
llm = LlamaCPP(
    model_url=model_url,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 1},
)
```
Now define the chat messages:
```python
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What's the weather like today?"),
]
```
And get a response:
```python
response = llm.chat(messages)
print(response)
```
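If you also want this local model to back indexing and querying (the earlier `SimpleDirectoryReader`/`VectorStoreIndex` imports and the HuggingFace embeddings install), one hedged option is to register it as the global default; the embedding model name and the `./data` directory below are assumptions:
```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use the local LlamaCPP model and a local embedding model everywhere by default.
Settings.llm = llm
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Hypothetical data directory; replace with your own documents.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("Summarize the documents."))
```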
To import `ChatMessage` for use with `LlamaCPP` in LlamaIndex, you can use the following line of code:
```python
from llama_index.core.llms import ChatMessage
```
`ChatMessage` is a class that represents a single message in a conversation. It has two main properties: `role` and `content`. The `role` can be "system", "user", or "assistant", and `content` is the text of the message.

Here's an example of creating a `ChatMessage`:
```python
message = ChatMessage(role="user", content="Hello, world!")
```
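Roles can also be given with the `MessageRole` enum instead of plain strings; a small sketch (the message contents are illustrative):
```python
from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a helpful assistant."),
    ChatMessage(role=MessageRole.USER, content="Hello, world!"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Hi! How can I help?"),
]
```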
To build a simple chatbot, maintain the conversation by appending each turn to a `messages` list. Here's an example:
```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP

# Initialize LlamaCPP
llm = LlamaCPP()

# Start with a system message
messages = [ChatMessage(role="system", content="You are a helpful assistant.")]

while True:
    # Get user input
    user_input = input("User: ")

    # Exit the loop if the user types "exit"
    if user_input.lower() == "exit":
        break

    # Add the user's message to the conversation
    messages.append(ChatMessage(role="user", content=user_input))

    # Get the model's response
    response = llm.chat(messages)

    # Print the model's response
    print(f"Bot: {response}")

    # Add the model's response to the conversation
    messages.append(ChatMessage(role="assistant", content=response))
```
This loop adds the user's message to the `messages` list, gets the model's response, prints it, and adds the response to the `messages` list. The loop continues until the user types "exit". Remember to replace `LlamaCPP()` with the actual initialization of your `LlamaCPP` instance, which includes specifying the model and other parameters.

To maintain the chat history correctly, store the response text (`response.message.content`) rather than the whole response object:
```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP

# Initialize LlamaCPP
llm = LlamaCPP()

# Start with a system message
messages = [ChatMessage(role="system", content="You are a helpful assistant.")]

while True:
    # Get user input
    user_input = input("User: ")

    # Exit the loop if the user types "exit"
    if user_input.lower() == "exit":
        break

    # Add the user's message to the conversation
    messages.append(ChatMessage(role="user", content=user_input))

    # Get the model's response
    response = llm.chat(messages)

    # Print the model's response
    print(f"Bot: {response.message.content}")

    # Add the model's response to the conversation
    messages.append(ChatMessage(role="assistant", content=response.message.content))
```
In this version, the user's message is added to the `messages` list before getting the model's response, and the model's response is added to the `messages` list after it's printed. This way, the chat history is correctly maintained. Remember to replace `LlamaCPP()` with the actual initialization of your `LlamaCPP` instance, which includes specifying the model and other parameters.

To stream the response as it's generated, use the `stream_chat` method of `LlamaCPP`. Here's how you can modify the previous code to support streaming:
```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP

# Initialize LlamaCPP
llm = LlamaCPP()

# Start with a system message
messages = [ChatMessage(role="system", content="You are a helpful assistant.")]

while True:
    # Get user input
    user_input = input("User: ")

    # Exit the loop if the user types "exit"
    if user_input.lower() == "exit":
        break

    # Add the user's message to the conversation
    messages.append(ChatMessage(role="user", content=user_input))

    # Get the model's response
    response_stream = llm.stream_chat(messages)

    # Print the model's response as it is generated
    print("Bot: ", end="")
    for response in response_stream:
        print(response.delta, end="", flush=True)
    print()

    # Add the model's response to the conversation
    messages.append(ChatMessage(role="assistant", content=response.message.content))
```
Here, `stream_chat` returns a generator of streaming chat responses. Each yielded object represents a part of the model's response, and you can print each part as it's generated by iterating over the generator and printing its `delta` property. Remember to replace `LlamaCPP()` with the actual initialization of your `LlamaCPP` instance, which includes specifying the model and other parameters.

Note, however, that the objects yielded while streaming expose the text incrementally through `delta`, and depending on the integration the `message` available after the loop may not hold the complete reply. To add the full response to the `messages` list reliably, you can collect all the parts of the response and concatenate them. Here's how you can modify the previous code to do this:
```python
from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP

# Initialize LlamaCPP
llm = LlamaCPP()

# Start with a system message
messages = [ChatMessage(role="system", content="You are a helpful assistant.")]

while True:
    # Get user input
    user_input = input("User: ")

    # Exit the loop if the user types "exit"
    if user_input.lower() == "exit":
        break

    # Add the user's message to the conversation
    messages.append(ChatMessage(role="user", content=user_input))

    # Get the model's response
    response_stream = llm.stream_chat(messages)

    # Collect all parts of the response
    response_parts = [response_delta.delta for response_delta in response_stream]

    # Concatenate the parts to get the full response
    full_response = "".join(response_parts)

    # Print the full response
    print(f"Bot: {full_response}")

    # Add the full response to the conversation
    messages.append(ChatMessage(role="assistant", content=full_response))
```
We use the `join` method to concatenate the parts and get the full response, print it, and add it to the `messages` list. Remember to replace `LlamaCPP()` with the actual initialization of your `LlamaCPP` instance, which includes specifying the model and other parameters.

To serve the chatbot over HTTP, you can wrap the same logic in a FastAPI endpoint and stream the response back to the client:
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from llama_index.core.llms import ChatMessage
from llama_index.llms.llama_cpp import LlamaCPP

app = FastAPI()

# Initialize LlamaCPP
llm = LlamaCPP()

# Start with a system message
messages = [ChatMessage(role="system", content="You are a helpful assistant.")]

@app.get("/chat/{user_input}")
def chat(user_input: str):
    # Add the user's message to the conversation
    messages.append(ChatMessage(role="user", content=user_input))

    def token_generator():
        # Yield each part of the response while collecting it
        response_parts = []
        for response_delta in llm.stream_chat(messages):
            response_parts.append(response_delta.delta)
            yield response_delta.delta

        # Concatenate the parts and add the full response to the conversation
        full_response = "".join(response_parts)
        messages.append(ChatMessage(role="assistant", content=full_response))

    # Stream the generated text back to the client
    return StreamingResponse(token_generator(), media_type="text/plain")
```
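A quick way to exercise the endpoint from Python, assuming the app above is saved as `main.py` and is running locally via uvicorn on port 8000 (both assumptions), and that the `requests` package is installed:
```python
import requests

# Stream the reply for a sample prompt; the URL and port are assumptions.
with requests.get("http://127.0.0.1:8000/chat/Hello", stream=True) as resp:
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```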
Remember to replace `LlamaCPP()` with the actual initialization of your `LlamaCPP` instance, which includes specifying the model and other parameters.

The `[Inst]` output you're seeing is likely due to the system message not being formatted correctly for the CodeLlama model. The system message is used to provide instructions to the model about the role it should play in the conversation. For example:
```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="system", content="You are a coding assistant."),
    ChatMessage(role="user", content="How do I write a for loop in Python?"),
]
```
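One hedged way to ensure these messages are converted into the instruction format CodeLlama expects is to pass the `llama_utils` prompt helpers when constructing `LlamaCPP`; the model URL below is only an illustrative assumption, so point it at whichever model file you actually use:
```python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)

# Assumption: an instruction-tuned CodeLlama build; replace with your own local path or URL.
llm = LlamaCPP(
    model_url="https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf",
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)

response = llm.chat(messages)
print(response.message.content)
```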