3.5GB memory usage [100% of my Azure setup] when a VectorIndex is connected to Qdrant with just 3,333 chunks and only 1 user query. Is this expected? This is my code: I inserted the chunks on my local machine, deployed the Python code to Azure, and was just retrieving based on metadata filters. I'm baffled why the memory goes up to 100%.
The only difference in this query pipeline is that I am NOT using ChatMemoryBuffer, and my ColBERT reranker is part of my source code (within the container itself).
The only thing I'm seeing that uses memory is the reranker. Unless you have enable_hybrid=True on Qdrant, that will also run a local model and eat memory.
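For reference, hybrid mode is set when the vector store is constructed. A minimal sketch of that configuration (the client URL and collection name are placeholders, not taken from this thread):

```python
from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")  # placeholder URL

# enable_hybrid=False (the default) means no local sparse-embedding model
# gets loaded into the process; enable_hybrid=True pulls one in, and with
# multiple workers that cost is paid once per worker.
vector_store = QdrantVectorStore(
    client=client,
    collection_name="my_collection",
    enable_hybrid=False,
)
```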
The LLM is Llama 3 via Groq, and hybrid = false for Qdrant. But ColBERT gets downloaded as model.safetensors anyway, right? So I figured, why download it every time, especially with Gunicorn multi-workers. If you say that I don't need the Azure Table docstore, then do I need to release memory somehow? Any new node retrieved for a different query seems to stay in memory.
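A rough back-of-the-envelope for the multi-worker effect. Each Gunicorn worker is a separate process, so a model loaded after the fork is duplicated once per worker; the sizes below are illustrative assumptions, not measurements from this deployment:

```python
# Illustrative only: assumed sizes, not measured values.
model_mb = 440           # assumed ColBERT checkpoint size in MB
workers = 4              # assumed Gunicorn worker count
other_overhead_mb = 300  # assumed per-worker Python + library overhead

# Without sharing, every worker holds its own copy of the model.
per_worker_mb = model_mb + other_overhead_mb
total_mb = per_worker_mb * workers
print(f"~{total_mb} MB resident across {workers} workers")  # ~2960 MB
```

With Gunicorn's `--preload` flag the app (and anything loaded at import time) is loaded once before the fork, so read-only pages like model weights can be shared between workers via copy-on-write instead of duplicated.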
Thanks a bunch @Logan M, I completely removed ColBERT and it works like a charm now. Will need to look for a paid API for ColBERT, but that's a different pain. Really appreciate your help in this regard.