You can limit the number of tokens the model generates by setting the `max_tokens` parameter when defining your large language model (LLM). This parameter is available for OpenAI, Cohere, and AI21 models. Here's an example of how to do it:
```python
from llama_index import (
    KeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
)
from llama_index.llms import OpenAI

documents = SimpleDirectoryReader('data').load_data()

# define LLM with a 512-token cap on generated output
llm = OpenAI(temperature=0, model="text-davinci-002", max_tokens=512)
service_context = ServiceContext.from_defaults(llm=llm)
```
In this example, the `max_tokens` parameter is set to 512, which means the output from the LLM will be limited to 512 tokens.
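To see the cap in effect, you can build an index with this `service_context` and run a query; any response will be truncated at 512 tokens. Here is a minimal sketch that continues the snippet above, assuming the same `documents` and `service_context` (the query string is only illustrative):

```python
# build a keyword table index that uses the capped LLM
index = KeywordTableIndex.from_documents(documents, service_context=service_context)

# any response generated here is limited to 512 tokens
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the documents.")
print(response)
```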
You can also control the maximum number of output tokens by setting the `num_output` parameter in the `ServiceContext`. This parameter does not actually limit the model's output; rather, it affects how much "space" is reserved for the output when computing the available context window size for packing text from retrieved Nodes.
```python
from langchain.llms import OpenAI
from llama_index import ServiceContext, LLMPredictor, OpenAIEmbedding, PromptHelper
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

# LLM wrapper; max_tokens caps the actual generation length
llm_predictor = LLMPredictor(llm=OpenAI(model_name='text-davinci-003', temperature=0, max_tokens=256))
embed_model = OpenAIEmbedding()

# split documents into 1024-token chunks with a 20-token overlap
node_parser = SimpleNodeParser(text_splitter=TokenTextSplitter(chunk_size=1024, chunk_overlap=20))

# reserve num_output=256 tokens of the 4096-token context window for the response
prompt_helper = PromptHelper(context_window=4096, num_output=256, chunk_overlap_ratio=0.1, chunk_size_limit=None)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embed_model,
    node_parser=node_parser,
    prompt_helper=prompt_helper,
)
```
In this example, the `num_output` parameter is set to 256, matching the `max_tokens` value passed to the LLM, so the `PromptHelper` reserves exactly as much room as the model can actually generate.
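To make the "space reservation" concrete, here is a back-of-the-envelope sketch of the budgeting that `PromptHelper` performs. The `prompt_overhead` figure is hypothetical; the real overhead depends on the prompt template in use:

```python
# illustrative arithmetic only: how the context window gets budgeted
context_window = 4096   # model's total context size
num_output = 256        # tokens reserved for the LLM's answer
prompt_overhead = 100   # hypothetical tokens consumed by the prompt template

# space left for packing text from retrieved Nodes
available_for_nodes = context_window - num_output - prompt_overhead
print(available_for_nodes)  # 3740
```

A larger `num_output` therefore means less room for retrieved context in each LLM call, which is why the value should track the `max_tokens` you set on the model rather than exceed it.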