Does PromptHelper change the number of LLM calls?
I currently have this PromptHelper, connected to Mistral 7B:

from llama_index import PromptHelper

prompt_helper = PromptHelper(
    context_window=8192,
    num_output=1,
    chunk_overlap_ratio=0.1,
    chunk_size_limit=300,
)
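My understanding was that chunk_size_limit only caps how large each repacked context chunk can be, roughly like this (a sketch of my mental model using repack(); qa_prompt and long_text are placeholder names, not from my real code):

from llama_index.prompts import PromptTemplate

qa_prompt = PromptTemplate(
    "Context: {context_str}\nQuestion: {query_str}\nAnswer: "
)
long_text = "some retrieved context " * 500

# repack() splits/merges the text into chunks that each fit within
# chunk_size_limit; with refine-style synthesis, each resulting chunk
# should mean one LLM call
chunks = prompt_helper.repack(qa_prompt, text_chunks=[long_text])
print(len(chunks))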
But this is my output:
llama_print_timings: load time = 521.48 ms
...
llama_print_timings: total time = 1137.07 ms
Llama.generate: prefix-match hit
llama_print_timings: load time = 521.48 ms
...
llama_print_timings: total time = 3449.42 ms
Llama.generate: prefix-match hit
llama_print_timings: load time = 521.48 ms
...
llama_print_timings: total time = 3452.64 ms
Llama.generate: prefix-match hit
llama_print_timings: load time = 521.48 ms
...
llama_print_timings: total time = 5615.29 ms
Llama.generate: prefix-match hit
llama_print_timings: load time = 521.48 ms
llama_print_timings: sample time = 32.55 ms / 251 runs ( 0.13 ms per token, 7712.40 tokens per second)
llama_print_timings: prompt eval time = 242.42 ms / 248 tokens ( 0.98 ms per token, 1023.00 tokens per second)
llama_print_timings: eval time = 7252.34 ms / 250 runs ( 29.01 ms per token, 34.47 tokens per second)
llama_print_timings: total time = 7947.15 ms
Doesn't this mean the LLM is being called 5 times?
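For reference, here is how I tried to count the calls explicitly instead of eyeballing the timing blocks (a sketch using the legacy ServiceContext/callback API; llm, documents, and the query string are placeholders for my actual setup):

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, CBEventType, LlamaDebugHandler

llama_debug = LlamaDebugHandler(print_trace_on_end=True)
service_context = ServiceContext.from_defaults(
    llm=llm,  # placeholder: my LlamaCPP-wrapped Mistral 7B
    prompt_helper=prompt_helper,
    callback_manager=CallbackManager([llama_debug]),
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
response = index.as_query_engine().query("my question")  # placeholder query

# each (start, end) pair should correspond to one LLM call
llm_events = llama_debug.get_event_pairs(CBEventType.LLM)
print(f"LLM calls: {len(llm_events)}")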