Does PromptHelper change the number of LLM calls?
I currently have
Plain Text
from llama_index import PromptHelper

prompt_helper = PromptHelper(
    context_window=8192,
    num_output=1,
    chunk_overlap_ratio=0.1,
    chunk_size_limit=300,
)

connected to Mistral 7B, but this is the output I get:
Plain Text
llama_print_timings:        load time =     521.48 ms
...
llama_print_timings:       total time =    1137.07 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =     521.48 ms
...
llama_print_timings:       total time =    3449.42 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =     521.48 ms
...
llama_print_timings:       total time =    3452.64 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =     521.48 ms
...
llama_print_timings:       total time =    5615.29 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =     521.48 ms
llama_print_timings:      sample time =      32.55 ms /   251 runs   (    0.13 ms per token,  7712.40 tokens per second)
llama_print_timings: prompt eval time =     242.42 ms /   248 tokens (    0.98 ms per token,  1023.00 tokens per second)
llama_print_timings:        eval time =    7252.34 ms /   250 runs   (   29.01 ms per token,    34.47 tokens per second)
llama_print_timings:       total time =    7947.15 ms


Doesn't this mean the LLM is being called 5 times?
it sure does

You're basically saying "OK, I retrieved X text chunks, but only send them to the LLM in chunks of 300 tokens. And also, only leave room for a single output token."

Probably not what you intended?
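To make that concrete, here's a rough sketch of why you end up with several calls (assuming a ServiceContext-era llama_index where PromptHelper.repack and PromptTemplate are available; the retrieved text below is just a stand-in):
Plain Text
from llama_index import PromptHelper
from llama_index.prompts import PromptTemplate

prompt_helper = PromptHelper(
    context_window=8192,
    num_output=1,
    chunk_overlap_ratio=0.1,
    chunk_size_limit=300,   # caps each packed context chunk at ~300 tokens
)

# Stand-in for whatever the retriever returned (a long block of text)
retrieved_text = "some retrieved passage about the topic. " * 300
qa_prompt = PromptTemplate("Context:\n{context_str}\n\nAnswer the question: {query_str}\n")

# The compact/refine synthesizer repacks the retrieved text under the 300-token cap;
# each resulting chunk is then sent to the LLM in its own call.
packed = prompt_helper.repack(qa_prompt, text_chunks=[retrieved_text])
print(len(packed), "chunks -> roughly that many LLM calls")

So the smaller chunk_size_limit is (and the less room num_output leaves for the answer), the more pieces the context gets split into, and each piece is a separate generate call - which is exactly the pattern in your llama_print_timings output.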
tbh I wouldn't even touch the prompt helper
just set things in the service context directly

Plain Text
service_context = ServiceContext.from_defaults(..., num_output=256, context_window=8192, chunk_size=1024)
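
For reference, a rough end-to-end version of that approach (the model path, data directory, and query string are placeholders, and it assumes the ServiceContext-era llama_index with the LlamaCPP integration installed):
Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import LlamaCPP

# Placeholder path to a local Mistral 7B GGUF file
llm = LlamaCPP(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",
    context_window=8192,
    max_new_tokens=256,
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=8192,
    num_output=256,
    chunk_size=1024,
)

documents = SimpleDirectoryReader("./data").load_data()   # placeholder data dir
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
print(query_engine.query("Your question here"))

With chunk_size=1024 and num_output=256, the synthesizer can usually fit the retrieved context into one or two calls instead of five.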
Oh yeah, definitely not what I had in mind. Thanks so much!