I am using local models for a Q&A RAG pipeline

I am using local models for a Q&A RAG pipeline and trying to use multiprocessing to make the best use of my resources. I can make it work, but I have to load the models separately in each process, and that slows things down. I tried using a multiprocessing queue to share the service_context among processes, but got this error:
cannot pickle 'builtins.CoreBPE' object

Any advice?
5 comments
Local models have the limitation that you can't share them across processes like that

The solution is probably to host the model and have your processes send requests to a queue for processing
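A minimal sketch of that pattern, assuming a single dedicated process owns the model and every other process only puts prompts on a queue (`load_model` below is a stand-in for your real local-model / service_context setup, not part of any library):

```python
import multiprocessing as mp

def load_model():
    # Stand-in for your real local-model / service_context setup.
    # The point is that this runs exactly once, in the owner process only.
    return lambda prompt: f"answer to: {prompt}"

def model_owner(request_q, response_q):
    model = load_model()              # the only place the model is loaded
    while True:
        item = request_q.get()
        if item is None:              # sentinel: shut down
            break
        job_id, prompt = item
        response_q.put((job_id, model(prompt)))

if __name__ == "__main__":
    request_q, response_q = mp.Queue(), mp.Queue()
    owner = mp.Process(target=model_owner, args=(request_q, response_q))
    owner.start()

    prompts = ["What is RAG?", "How do I chunk documents?"]
    for i, p in enumerate(prompts):   # other processes just enqueue work
        request_q.put((i, p))

    for _ in prompts:
        print(response_q.get())       # (job_id, answer), in completion order

    request_q.put(None)               # stop the owner process
    owner.join()
```

Only plain strings and tuples cross the queue, so nothing like the CoreBPE tokenizer ever has to be pickled.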
That is something I have not done before. Can you point me to documentation for the proposed solution, or libraries I can use?
You need to use a model server like vLLM or Hugging Face's text-generation-inference
We have a vLLM integration. For text-generation-inference, you can use LangChain's LLM
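For instance, vLLM ships an OpenAI-compatible HTTP server, so each worker process only needs a small HTTP client and nothing model-related ever has to be pickled. A hedged sketch (the model name, port, and sampling parameters are placeholders, not anything specific to this thread):

```python
# Start the server once, in its own terminal/process, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2
# Each worker process then only needs an HTTP client:
import requests

def generate(prompt, url="http://localhost:8000/v1/completions"):
    resp = requests.post(
        url,
        json={
            "model": "mistralai/Mistral-7B-Instruct-v0.2",  # must match the served model
            "prompt": prompt,
            "max_tokens": 256,
            "temperature": 0.1,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]

if __name__ == "__main__":
    print(generate("What is retrieval-augmented generation?"))
```

If you go the text-generation-inference route instead, LangChain's HuggingFaceTextGenInference wrapper plays the same lightweight-client role, and LlamaIndex's vLLM integration can similarly be plugged in as the LLM for your service_context.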