```python
chat_engine = index.as_chat_engine()
```

You can then call the `stream_chat` method of the chat engine to get a streaming response:

```python
streaming_response = chat_engine.stream_chat("Tell me a joke.")
```

The `stream_chat` method returns a generator that you can iterate over to get the response tokens:

```python
for token in streaming_response.response_gen:
    print(token, end="")
```
In a FastAPI app, you can use the `StreamingResponse` class to send a streaming response to the client. Here's an example of how you can do this:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


@app.get("/stream_chat")
def stream_chat():
    chat_engine = index.as_chat_engine()
    streaming_response = chat_engine.stream_chat("Tell me a joke.")
    return StreamingResponse(streaming_response.response_gen)
```

When a client calls the `/stream_chat` endpoint, FastAPI streams the chat engine's response back to the client.

To control how much conversation history the chat engine keeps, you can pass in a `ChatMemoryBuffer` with a smaller token limit:

```python
from llama_index.core.memory import ChatMemoryBuffer

# Set a smaller token limit
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Graham's life."
    ),
)
```
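As a quick sanity check, you can exercise the memory-backed chat engine the same way as before; the questions below are only placeholders:

```python
# Ask a question using the chat engine configured above
response = chat_engine.chat("What did the author do growing up?")
print(response)

# Streaming works the same way with this engine
streaming_response = chat_engine.stream_chat("Tell me more about that.")
for token in streaming_response.response_gen:
    print(token, end="")
```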
If you want to handle simple messages without calling the chat engine, you can wrap it in a helper function:

```python
def handle_message(message):
    # Check if the message is a simple greeting
    if message.lower() == "hello":
        return "Hello!"
    # If not, use the chat engine to generate a response
    else:
        response = chat_engine.chat(message)
        return response
```

The `handle_message` function checks whether the incoming message is "hello". If it is, it immediately returns "Hello!"; if not, it uses the chat engine to generate a response.
An OpenAI API key is only required if you're using a `VectorStoreIndex`, or if you're using a `SummaryIndex` with the `use_async` parameter set to `True`. Here's an example of how you can use a `SummaryIndex` with a local LLM:
```python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core import Settings
from llama_index.core import SummaryIndex

# Define a local LLM (adjust the model path to your own GGUF weights)
llm = LlamaCPP(model_path="./models/llama-2-7b-chat.gguf")
Settings.llm = llm

# Define Summary Index
summary_index = SummaryIndex(nodes, use_async=False)
```

In this example, the `SummaryIndex` is created with the `use_async` parameter set to `False`. This means that the index will use the LLM defined in `Settings.llm` (in this case, `LlamaCPP`) to generate summaries synchronously.
If you're using a `VectorStoreIndex`, you might also need to define a local embedding model in your settings. Here's an example, shown with a local Hugging Face embedding model (LlamaIndex's standard local embedding integration):

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Define a local embedding model
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.embed_model = embed_model
```

In this example, a local `HuggingFaceEmbedding` model is defined and set in `Settings.embed_model`.
You can also control how answers are synthesized by creating a response synthesizer and passing it to the query engine:

```python
from llama_index.core import get_response_synthesizer

# Create a response synthesizer
response_synthesizer = get_response_synthesizer(response_mode="compact")

# Create a query engine with the response synthesizer
query_engine = summary_index.as_query_engine(response_synthesizer=response_synthesizer)
```

Here, the response synthesizer is created with the `get_response_synthesizer` function, using the "compact" response mode. This response synthesizer is then passed to the `as_query_engine` method of the summary index when creating a query engine.
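To make the flow concrete, here's a small usage sketch of the query engine created above (the question is only a placeholder):

```python
# Run a query through the query engine configured with the "compact" synthesizer
response = query_engine.query("Give me a short summary of the essay.")
print(response)
```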
As noted above, an OpenAI API key is only needed if you're using a `VectorStoreIndex`, or if you're using a `SummaryIndex` with the `use_async` parameter set to `True`.

If you're using a `VectorStoreIndex`, you will also need to define a local embedding model in your settings, as shown above.

If you're using a `SummaryIndex` with `use_async=True`, it will use OpenAI's API for asynchronous operations, which requires an API key. If you don't want to use OpenAI's API, you can set `use_async=False` when creating the `SummaryIndex`.

Here's an example of how you can build a `DocumentSummaryIndex` with a local LLM:
```python
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core import Settings
from llama_index.core import DocumentSummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import get_response_synthesizer

# Define a local LLM (adjust the model path to your own GGUF weights)
llm = LlamaCPP(model_path="./models/llama-2-7b-chat.gguf")
Settings.llm = llm

# Create a response synthesizer
response_synthesizer = get_response_synthesizer(response_mode="compact")

# Define Sentence Splitter
splitter = SentenceSplitter(chunk_size=1024)

# Define Document Summary Index (city_docs are your loaded documents)
doc_summary_index = DocumentSummaryIndex.from_documents(
    city_docs,
    llm=llm,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
    use_async=False,
)
```

In this example, the `DocumentSummaryIndex` is created with the `use_async` parameter set to `False`. This means that the index will use the LLM defined in `Settings.llm` (in this case, `LlamaCPP`) to generate summaries synchronously.
To speed up summary generation, you can set the `use_async=True` parameter when creating the document summary index. This enables asynchronous operations, so multiple summarization calls can run concurrently:

```python
from llama_index.core import DocumentSummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import get_response_synthesizer
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core import Settings

# Define a local LLM (adjust the model path to your own GGUF weights)
llm = LlamaCPP(model_path="./models/llama-2-7b-chat.gguf")
Settings.llm = llm

# Create a response synthesizer
response_synthesizer = get_response_synthesizer(response_mode="compact")

# Define Sentence Splitter
splitter = SentenceSplitter(chunk_size=1024)

# Define Document Summary Index with use_async=True
doc_summary_index = DocumentSummaryIndex.from_documents(
    city_docs,
    llm=llm,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
    use_async=True,
)
```

In this example, the `DocumentSummaryIndex` is created with the `use_async` parameter set to `True`, so the index generates summaries asynchronously, which can be noticeably faster when many summary calls are made.
If asynchronous execution causes problems in your environment, you can always fall back to synchronous summarization (`use_async=False`).

You can also experiment with a different response mode for the synthesizer:

```python
from llama_index.core import get_response_synthesizer

# Create a response synthesizer with a different response mode
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")
```

And to switch back to the "compact" response mode:

```python
from llama_index.core import get_response_synthesizer

# Create a response synthesizer with the "compact" response mode
response_synthesizer = get_response_synthesizer(response_mode="compact")
```
You can use `SimpleDirectoryReader` to load documents from a directory and automatically set the `doc_id` to the full path of each document. You can do this by setting the `filename_as_id=True` parameter when creating the `SimpleDirectoryReader`. Here's an example:

```python
from llama_index.core import SimpleDirectoryReader

# Load documents from a directory, using each file's path as the doc_id
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
```
In this example, the `SimpleDirectoryReader` is created with the `filename_as_id=True` parameter, so the `doc_id` of each document is derived from its filename.

You can then pass these documents to the `DocumentSummaryIndex.from_documents` method to create a document summary index:

```python
from llama_index.core import DocumentSummaryIndex
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import get_response_synthesizer
from llama_index.core import Settings

# Define a local LLM (adjust the model path to your own GGUF weights)
llm = LlamaCPP(model_path="./models/llama-2-7b-chat.gguf")
Settings.llm = llm

# Create a response synthesizer
response_synthesizer = get_response_synthesizer(response_mode="compact")

# Define Sentence Splitter
splitter = SentenceSplitter(chunk_size=1024)

# Create a Document Summary Index
doc_summary_index = DocumentSummaryIndex.from_documents(
    documents,
    llm=llm,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
    use_async=False,
)
```
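Once the index is built, a quick way to sanity-check it (a sketch, assuming the `doc_id`s set above) is to pull one of the stored summaries and run a query:

```python
# Fetch the generated summary for one document by its doc_id
print(doc_summary_index.get_document_summary(documents[0].doc_id))

# Query the index as usual
query_engine = doc_summary_index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)
```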
To attach metadata to your documents, you can use the `MetadataPack` provided by LlamaIndex. This allows you to add metadata to each document, which can then be used to filter or sort the documents during retrieval:

```python
from llama_index.core import MetadataPack

# Create a MetadataPack
metadata_pack = MetadataPack()

# Add metadata to your documents
for doc in documents:
    metadata_pack.add_metadata(doc, {"author": "John Doe", "year": 2021})
```

In this example, the `MetadataPack` is used to add metadata to each document. The metadata is a dictionary that can contain any key-value pairs you want.
Once the metadata is in place, you can filter on it when calling the `retrieve` method of your index. Here's an example:

```python
# Retrieve documents whose metadata matches the filter
retrieved_docs = doc_summary_index.retrieve(
    "What are the sports teams in Toronto?",
    metadata_filter={"author": "John Doe"},
)
```

In this example, the `retrieve` method is called with a metadata filter that only matches documents whose "author" metadata field equals "John Doe".
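For reference, LlamaIndex also exposes metadata filtering through `MetadataFilters` on vector store retrievers. As a sketch, assuming a `VectorStoreIndex` built over the same documents (with the `author` metadata already attached):

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Build a vector index over the same metadata-enriched documents
vector_index = VectorStoreIndex.from_documents(documents)

# Only retrieve nodes whose metadata matches the filter
filters = MetadataFilters(filters=[ExactMatchFilter(key="author", value="John Doe")])
retriever = vector_index.as_retriever(filters=filters)
nodes = retriever.retrieve("What are the sports teams in Toronto?")
```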
If you don't want to add metadata by hand, you can use the `MetadataExtractor` provided by LlamaIndex. This allows you to automatically extract metadata from each document based on certain rules or patterns. Here's an example of how you can use a `MetadataExtractor`:

```python
from llama_index.core import MetadataExtractor

# Create a MetadataExtractor
metadata_extractor = MetadataExtractor()

# Define a rule to extract metadata
rule = {"pattern": r"Author: (.*)", "metadata_field": "author"}

# Add the rule to the MetadataExtractor
metadata_extractor.add_rule(rule)

# Extract metadata from your documents
for doc in documents:
    metadata = metadata_extractor.extract(doc)
    doc.metadata.update(metadata)
```
In this example, the `MetadataExtractor` is used to extract metadata from each document. Each rule is a dictionary that defines a pattern to match in the document text and a metadata field in which to store the matched text. The `extract` method of the `MetadataExtractor` returns a dictionary of extracted metadata, which is then added to the document's metadata. That metadata can then be used for filtering via the `retrieve` method of your index.
However, note that a generic `MetadataExtractor` class does not exist in the LlamaIndex library. Instead, LlamaIndex provides several specific metadata extractors, such as `SummaryExtractor`, `QuestionsAnsweredExtractor`, `TitleExtractor`, `KeywordExtractor`, and `EntityExtractor`:

```python
from llama_index.core.extractors import (
    SummaryExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor,
    KeywordExtractor,
)
from llama_index.extractors.entity import EntityExtractor
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.llms.openai import OpenAI
from llama_index.core.schema import MetadataMode

# Define LLM
llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_tokens=512)

# Define Node Parser
text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=128)

# Define Metadata Extractors
extractors = [
    TitleExtractor(nodes=5, llm=llm),
    QuestionsAnsweredExtractor(questions=3, llm=llm, metadata_mode=MetadataMode.EMBED),
    SummaryExtractor(summaries=["prev", "self", "next"], llm=llm),
    KeywordExtractor(keywords=10, llm=llm),
    EntityExtractor(prediction_threshold=0.5),
]

# Add extractors to transformations
transformations = [text_splitter] + extractors
```
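To actually apply these transformations, one option (a sketch, not part of the snippet above) is to run the documents through an `IngestionPipeline` and build an index from the resulting nodes:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline

# Run the splitter and extractors to produce metadata-enriched nodes
pipeline = IngestionPipeline(transformations=transformations)
nodes = pipeline.run(documents=documents)

# Inspect the extracted metadata and build an index from the nodes
print(nodes[0].metadata)
index = VectorStoreIndex(nodes)
```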
When you use `SimpleDirectoryReader` with `filename_as_id=True`, the `doc_id` is set to the full path of each document. If you want the `doc_id` to be just the filename, you can modify it after loading the documents:

```python
from llama_index.core import SimpleDirectoryReader
from pathlib import Path

# Load documents from a directory (doc_id is the full file path)
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()

# Set the doc_id to be the filename only
for doc in documents:
    doc.id_ = Path(doc.doc_id).name  # id_ is the underlying field exposed as doc_id
```
In this example, `SimpleDirectoryReader` is used to load the documents, and the `doc_id` of each document is then set to its filename using the `Path` class from the `pathlib` module.

If you want the filename without its extension, you can use the `stem` property of the `Path` object instead. Here's how you can do it:

```python
from llama_index.core import SimpleDirectoryReader
from pathlib import Path

# Load documents from a directory (doc_id is the full file path)
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()

# Set the doc_id to be the filename without its extension
for doc in documents:
    doc.id_ = Path(doc.doc_id).stem  # id_ is the underlying field exposed as doc_id
```
In this example, `SimpleDirectoryReader` is used to load the documents, and the `doc_id` of each document is then set to its filename without the extension, using the `Path` class from the `pathlib` module.

If the `doc_id` is not getting updated with the filename, it could be due to a few reasons. Here are a few things you can check:

1. File path: make sure the path you pass to `SimpleDirectoryReader` is correct and that it points to the directory where your files are located.
2. Document loading: make sure your documents are actually being loaded by the `SimpleDirectoryReader`.
3. `doc_id` updating: make sure the `doc_id` is being updated correctly. You can do this by printing the `doc_id` of each document after updating it.

```python
from llama_index.core import SimpleDirectoryReader
from pathlib import Path

# Load documents from a directory, using each file's path as the doc_id
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()

# Print the documents to check if they're loaded correctly
for doc in documents:
    print(doc)

# Set the doc_id to be the filename without its extension
for doc in documents:
    doc.id_ = Path(doc.doc_id).stem

# Print the doc_id of each document to check if it's updated correctly
for doc in documents:
    print(doc.doc_id)
```
In this example, the documents and the `doc_id` of each document are printed to the console, which can help you identify any issues with the file loading or the `doc_id` updating.

If the LLM seems to be missing context, you can try increasing the `chunk_size` used when splitting your documents into chunks. This will result in larger chunks of text being passed to the LLM, which might help it better understand the context. Here's an example:

```python
from llama_index.core.node_parser import SentenceSplitter

# Define a Sentence Splitter with a larger chunk size
splitter = SentenceSplitter(chunk_size=2048)
```
In this example, a `SentenceSplitter` is created with a `chunk_size` of 2048, so each chunk will contain up to 2048 tokens.

You can also try the "tree_summarize" response mode, which builds the final answer by recursively summarizing the retrieved chunks:

```python
from llama_index.core import get_response_synthesizer

# Create a response synthesizer with the "tree_summarize" response mode
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")
```
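As a rough closing sketch (assuming the `documents`, local LLM settings, and larger splitter from the earlier examples), you can wire the new splitter and synthesizer back into the document summary index and query it:

```python
from llama_index.core import DocumentSummaryIndex

# Rebuild the index with the larger chunks and the tree_summarize synthesizer
doc_summary_index = DocumentSummaryIndex.from_documents(
    documents,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
)

# Query it with the same synthesizer
query_engine = doc_summary_index.as_query_engine(
    response_synthesizer=response_synthesizer
)
response = query_engine.query("What are the sports teams in Toronto?")
print(response)
```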