docker run --gpus all -p 8000:8000 ghcr.io/mistralai/mistral-src/vllm:latest \
--host 0.0.0.0 \
--model mistralai/Mistral-7B-Instruct-v0.2 \
--tensor-parallel-size 1
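Assuming the image exposes vLLM's OpenAI-compatible API on port 8000 (which the flags above suggest), here is a minimal sketch of querying it from Python with the openai client; the api_key value is a placeholder, since the server only checks it if one was configured:

# Minimal sketch: query the container's OpenAI-compatible endpoint.
# The api_key is a dummy value; vLLM ignores it unless an API key was configured.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)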
num_workers (the default is 4 concurrent requests in flight)

Time to synthesize: 211.95126565685496
Time to add to docstore: 0.6406434010714293
Time to embed: 2.62248971988447
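Timings like these can be reproduced with plain wall-clock instrumentation around each step; a minimal sketch, assuming query_engine, index, embed_model, and doc objects are already built (their setup is omitted here):

import time

# Assumes `query_engine`, `index`, `embed_model`, and `doc` already exist.
start = time.perf_counter()
response = query_engine.query("Summarize the ingested documents.")
print(f"Time to synthesize: {time.perf_counter() - start}")

start = time.perf_counter()
index.insert(doc)  # writes the document into the docstore / vector store
print(f"Time to add to docstore: {time.perf_counter() - start}")

start = time.perf_counter()
vector = embed_model.get_text_embedding(doc.text)
print(f"Time to embed: {time.perf_counter() - start}")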
Configure things so that the asynthesize and aembed methods are called on insert, and use the TreeSummarize response synthesizer with use_async=True set; it should be quite a bit faster.
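A minimal sketch of wiring that up (import paths assume a recent llama-index-core layout and may differ across versions; the data directory and query are placeholders):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response_synthesizers import TreeSummarize

documents = SimpleDirectoryReader("./data").load_data()

# use_async=True builds the index via the async embedding path.
index = VectorStoreIndex.from_documents(documents, use_async=True)

# TreeSummarize with use_async=True issues its per-chunk summary LLM calls concurrently.
query_engine = index.as_query_engine(
    response_synthesizer=TreeSummarize(use_async=True),
)
print(query_engine.query("Give me a high-level summary of the documents."))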