A QUESTION that arises from the examples in the ServiceContext docs (Key Components > Customization > ServiceContext) about kwargs:
#1
LLM(model=text-davinci-003, max_tokens=256)
SimpleNodeParser(chunk_size=1024, chunk_overlap=20)
PromptHelper(context_window=4096, num_output=256, chunk_overlap_ratio=0.1, chunk_size_limit=None)
no chunk size in ServiceContext
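
To make sure I'm reading example #1 correctly, this is roughly how I understand it being put together (the imports, `OpenAI` as the LLM class, and the `from_defaults` constructors are my guesses based on the docs for a 0.x install, not a verbatim copy):

```python
from llama_index import ServiceContext, PromptHelper
from llama_index.llms import OpenAI
from llama_index.node_parser import SimpleNodeParser

# LLM with an explicit output cap
llm = OpenAI(model="text-davinci-003", max_tokens=256)

# Node parser with its own chunking settings
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)

# Prompt helper with the context window and output reservation
prompt_helper = PromptHelper(
    context_window=4096,
    num_output=256,
    chunk_overlap_ratio=0.1,
    chunk_size_limit=None,
)

# Note: no chunk_size passed to the ServiceContext itself here
service_context = ServiceContext.from_defaults(
    llm=llm,
    node_parser=node_parser,
    prompt_helper=prompt_helper,
)
```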
#2
LLM(model=gpt-3.5-turbo, max_tokens not defined)
SimpleNodeParser & PromptHelper not defined
ServiceContext(chunk_size=512)
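
And this is how I read example #2 (again assuming `OpenAI` and `ServiceContext.from_defaults` as in the docs):

```python
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# LLM without max_tokens set
llm = OpenAI(model="gpt-3.5-turbo")

# No explicit SimpleNodeParser or PromptHelper;
# only a chunk_size on the ServiceContext itself
service_context = ServiceContext.from_defaults(llm=llm, chunk_size=512)
```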
The confusion:
- Both models have the same max token window of 4096 (± 1 token), which is defined in #1 but not in #2, why?
- #2 doesn't define node parsing, but I guess ServiceContext(chunk_size=512) passes this over to the default node parser, which would be like doing SimpleNodeParser(chunk_size=512, chunk_overlap=0), am I wrong?
- Please help me understand the difference in #1 between LLM(max_tokens=256) and PromptHelper(num_output=256). The docs say "Number of outputs for the LLM." in one place and "set number of output tokens" somewhere else, but I don't understand what this means in practice. Does it define the length of the final answer?
- I already chunked the nodes and only use a saved index from disk, so is the splitter in that phase meant for the user's input/question before it gets embedded? (A rough sketch of my loading code is below.)
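
For context on that last point, this is roughly what my query phase looks like (the persist_dir path and the query string are just placeholders):

```python
from llama_index import StorageContext, load_index_from_storage

# Load the already-built index from disk instead of re-chunking documents
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, service_context=service_context)

# Query against the saved index
query_engine = index.as_query_engine()
response = query_engine.query("What does the document say about X?")
print(response)
```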