```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# For Jerry
jerry_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Jerry",
            )
        ]
    ),
    similarity_top_k=3,
)

# For Ravi
ravi_query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(
                key="user",
                value="Ravi",
            )
        ]
    ),
    similarity_top_k=3,
)
```
`index.as_query_engine` is used to create a query engine for each user, with a filter that matches the user's name. The `ExactMatchFilter` restricts retrieval to nodes whose `user` metadata matches that value, so each user only retrieves their own documents.
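You can also build a chat engine on top of the same index. A minimal sketch, assuming a retriever created from the same `index` (the `LLM()` and `BaseMemory()` calls below are placeholders, as explained right after), might look like this:

```python
from llama_index.core.chat_engine import ContextChatEngine
from llama_index.core.llms import LLM, ChatMessage
from llama_index.core.memory import BaseMemory

retriever = index.as_retriever()

chat_engine = ContextChatEngine(
    retriever=retriever,
    llm=LLM(),            # placeholder: replace with your actual LLM
    memory=BaseMemory(),  # placeholder: replace with your actual memory
    prefix_messages=[
        ChatMessage(role="system", content="You are a helpful assistant."),
    ],
)
```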
In the above code, `ContextChatEngine` is used to create a chat engine that uses a retriever to fetch relevant context, places that context in the system prompt, and then uses an LLM to generate a response, giving a fluid chat experience.
Please note that you need to replace `LLM()` and `BaseMemory()` with the actual initialization of your LLM and memory. The `prefix_messages` argument is a list of `ChatMessage` objects that you want to start the conversation with.

To support multiple users, you can keep one chat engine per user in a dictionary:

```python
# Initialize a dictionary to store chat engines
chat_engines = {}

# When a new user starts a conversation
user_id = "unique_user_id"  # Replace with the actual unique user ID
chat_engines[user_id] = index.as_chat_engine(chat_mode="context")


# To handle a user's message
def handle_message(user_id, message):
    chat_engine = chat_engines[user_id]
    response = chat_engine.chat(message)
    return response
```
`handle_message` is a function that takes a user ID and a message, retrieves the corresponding chat engine, and uses it to generate a response.
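As written, `handle_message` assumes the chat engine for that user already exists. A small variant (just a sketch, with made-up user IDs) creates the engine lazily on a user's first message, so each user keeps an independent conversation history:

```python
# Sketch: lazily create a per-user chat engine, then route messages to it.
def handle_message(user_id, message):
    if user_id not in chat_engines:
        chat_engines[user_id] = index.as_chat_engine(chat_mode="context")
    chat_engine = chat_engines[user_id]
    return chat_engine.chat(message)


# Two users chat independently; neither sees the other's history.
print(handle_message("jerry-001", "What did I ask you earlier?"))
print(handle_message("ravi-002", "Summarize my documents."))
```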
You can use the `multiprocessing` module in Python, which allows for the creation of separate processes, each with its own Python interpreter. This can help you handle multiple requests concurrently. You can also use the `BaseManager` class from Python's `multiprocessing.managers` module to provide sequential access to the index through a separate server and locks. This can help handle concurrent uploads and ensure sequential inserts into the index. Here is an example of what `index_server.py` might look like:

```python
import os
from multiprocessing import Lock
from multiprocessing.managers import BaseManager

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Document

os.environ["OPENAI_API_KEY"] = "your key here"

index = None
lock = Lock()


def initialize_index():
    global index
    with lock:
        # same as before
        ...
        pass


def query_index(query_text):
    global index
    query_engine = index.as_query_engine()
    response = query_engine.query(query_text)
    return str(response)


if __name__ == "__main__":
    # init the global index
    print("initializing index...")
    initialize_index()
```
The `Lock` is used to ensure that only one process can access the `index` at a time, preventing race conditions. The `initialize_index` and `query_index` functions are defined so they can be registered with the `BaseManager`, which will handle the concurrent requests.
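One way to wire those functions into a `BaseManager` server is sketched below, using plain `multiprocessing.managers` conventions; the host, port, and password are arbitrary placeholders:

```python
# At the bottom of index_server.py, after initialize_index():
manager = BaseManager(("127.0.0.1", 5602), b"password")
manager.register("query_index", query_index)
server = manager.get_server()
print("index server running...")
server.serve_forever()
```

The web server process then connects to the index server instead of touching the index directly:

```python
# In the web/API process
from multiprocessing.managers import BaseManager

manager = BaseManager(("127.0.0.1", 5602), b"password")
manager.register("query_index")
manager.connect()

response = manager.query_index("your question here")._getvalue()
print(response)
```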
If you are serving the index with FastAPI, you can also use Python's `asyncio` library: define your endpoint functions with the `async` keyword. This allows FastAPI to handle multiple requests concurrently on a single thread, without blocking the server.

```python
from fastapi import FastAPI

app = FastAPI()

# Initialize your query engine (here, from an existing index)
query_engine = index.as_query_engine()


@app.get("/query")
async def query_index(query: str):
    # Use the async query method so the event loop is not blocked
    response = await query_engine.aquery(query)
    return str(response)
```
Here, the `query_index` endpoint is defined with the `async` keyword, which means it's an asynchronous function. When this function is called, it returns a coroutine that FastAPI can run concurrently with other coroutines, allowing the server to handle multiple requests at the same time. Because the query itself is awaited via `aquery`, the event loop stays free to serve other requests while the index is being searched.
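To see the concurrency in action, you could fire several requests at once from a small client script (a sketch assuming the app above is running locally on port 8000 and that the third-party `httpx` package is installed):

```python
import asyncio

import httpx


async def main():
    questions = ["What is in my documents?", "Summarize the key points."]
    async with httpx.AsyncClient(base_url="http://127.0.0.1:8000", timeout=60.0) as client:
        # Send all requests concurrently; the server interleaves them on one thread.
        responses = await asyncio.gather(
            *(client.get("/query", params={"query": q}) for q in questions)
        )
    for question, resp in zip(questions, responses):
        print(question, "->", resp.text)


if __name__ == "__main__":
    asyncio.run(main())
```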