Does anyone have any insight on MMR vs. standard retrieval? For context, I have a large set of data files that I build my index from. The index uses the tree_summarize response mode with the text_qa_prompt, plus some custom prompt engineering (it's a lot of rough, overlapping data, and this keeps the data in without me having to scrub it manually). The problem is that the response from my server back to my application takes too long, and I'm trying to reduce my query time.
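To make the comparison concrete, here is a minimal sketch of how I understand the two selection strategies, written in plain Python rather than against any library. It uses the textbook MMR formulation (relevance minus a redundancy penalty); the function names, the `lambda_mult` parameter, and the toy vectors are all assumptions for illustration, and the library's own MMR mode may weight things differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_select(query_vec, doc_vecs, top_k=2):
    """Standard retrieval: rank by similarity to the query only."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:top_k]


def mmr_select(query_vec, doc_vecs, top_k=2, lambda_mult=0.5):
    """Textbook MMR: trade relevance against similarity to already-picked docs.

    lambda_mult is a hypothetical knob: 1.0 behaves like plain top-k,
    lower values penalise near-duplicate chunks more heavily.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < top_k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data: docs 0 and 1 are near-duplicates, doc 2 is less relevant but distinct.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.6, -0.8]]
standard_result = top_k_select(query, docs)
mmr_result = mmr_select(query, docs)
```

With overlapping data like mine, standard retrieval picks both near-duplicates while MMR swaps the second one for the distinct chunk. Note that this only changes which chunks get retrieved; whether that helps query time depends on how many chunks end up being summarized.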
Does anyone have any metrics on external sources for storing documents and indexes? I've noticed that the StorageContext is quite slow to load (~53 seconds). My current setup is a WordPress plugin that communicates with my Python files on an independent server. I run a custom Python server with a pool of handlers to handle communication between the two languages. Normally the server is always running, so this isn't an issue: I cache my index and end up with a 3-4 second response time in my chat window. On the rare occasion that I have to restart the server for an update, though, I have to wait for the StorageContext to load before my index is available.
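For reference, the caching pattern I mean looks roughly like the sketch below. It is a simplified stand-in, not my actual server code: `LazyIndexCache` and `fake_loader` are hypothetical names, and the loader callable is a placeholder for whatever builds the StorageContext and loads the index from it. The first caller pays the load cost, every later caller (including other handlers in the pool) gets the cached object.

```python
import threading
import time


class LazyIndexCache:
    """Load an expensive object once, on first use, and reuse it afterwards."""

    def __init__(self, loader):
        self._loader = loader          # hypothetical callable that returns the index
        self._lock = threading.Lock()  # guards the one-time load across handlers
        self._loaded = False
        self._index = None
        self.load_seconds = None       # how long the one-time load took

    def get(self):
        if not self._loaded:
            with self._lock:
                # Re-check inside the lock so only one handler runs the loader.
                if not self._loaded:
                    start = time.perf_counter()
                    self._index = self._loader()
                    self.load_seconds = time.perf_counter() - start
                    self._loaded = True
        return self._index


# Stand-in loader; the real one would do the slow storage load.
load_calls = []


def fake_loader():
    load_calls.append(1)
    return {"name": "placeholder index"}


cache = LazyIndexCache(fake_loader)
first = cache.get()   # triggers the load
second = cache.get()  # served from the cache
```

This is why restarts hurt: the cache is empty, so the first chat after a restart waits on the full load.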
Ideally I'd rather not thread the storage context, as most of my processes are built to be autonomous. In the event of a server crash, a consumer may end up restarting the server when they start a chat, which was intentionally engineered this way. I'd rather move to a separate storage method than keep my current local setup.