Sounds like you are using a local model? They allocate memory up to a point. I've never had memory issues, but you need to configure some settings properly (batch size, LLM context window, using an actual vector store if you have a lot of data, etc.)
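To illustrate the batch size part, here is a rough sketch of capping how many texts go to the model at once so peak memory stays bounded. Everything in it is a made-up placeholder, not any library's API: `batches`, `embed_all`, and `fake_embed_batch` are hypothetical names, and `fake_embed_batch` just stands in for whatever embedding call your local model exposes.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 8,
) -> List[List[float]]:
    """Embed texts one batch at a time instead of all at once."""
    vectors: List[List[float]] = []
    for chunk in batches(texts, batch_size):
        # Only batch_size texts are handed to the model per call.
        vectors.extend(embed_batch(chunk))
    return vectors


def fake_embed_batch(chunk: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for a real model call: one number per text."""
    return [[float(len(text))] for text in chunk]


if __name__ == "__main__":
    print(embed_all(["a", "bb", "ccc"], fake_embed_batch, batch_size=2))
```

A smaller `batch_size` trades throughput for lower peak memory, so it is usually the first knob to turn down if the GPU fills up.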
It gets stuck at this point unless I wait more than 1-2 minutes. I'm not sure about the exact time, but it does clean up after a bit of idle time. If I keep spamming requests, it falls back to the CPU and gets very slow.