how do I limit the number of tokens or words retrieved

Please stand by @Steve_h_112 as I search the knowledge base. We are experiencing especially high traffic right now, so this might take longer than usual. :wave:
----------------------
You can limit the number of tokens in the generated output by setting the max_tokens parameter when defining your large language model (LLM). This parameter is available for OpenAI, Cohere, and AI21 models. Here's an example of how to do it:

Plain Text
from llama_index import (
    KeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext
)
from llama_index.llms import OpenAI

# load documents from the ./data directory
documents = SimpleDirectoryReader('data').load_data()

# define the LLM, capping its output at 512 tokens
llm = OpenAI(temperature=0, model="text-davinci-002", max_tokens=512)
service_context = ServiceContext.from_defaults(llm=llm)


In this example, the max_tokens parameter is set to 512, which means the output from the LLM will be limited to 512 tokens.
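
For completeness, here is a minimal sketch of how that service_context would then be used. It assumes the same legacy llama_index API as above and a ./data directory of documents; the index type and query text are only illustrative.

Plain Text
# build an index over the loaded documents, using the service context defined above
index = KeywordTableIndex.from_documents(documents, service_context=service_context)

# responses generated by the query engine are now capped at 512 tokens
query_engine = index.as_query_engine()
response = query_engine.query("Summarize the documents.")
print(response)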

You can also set the num_output parameter in the ServiceContext to control the maximum output size. This parameter does not actually limit the model's output; rather, it determines how much "space" is reserved for the output when computing the available context window size for packing text from retrieved Nodes.

Plain Text
from langchain.llms import OpenAI
from llama_index import ServiceContext, LLMPredictor, OpenAIEmbedding, PromptHelper
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

# max_tokens caps what the model actually generates
llm_predictor = LLMPredictor(llm=OpenAI(model_name='text-davinci-003', temperature=0, max_tokens=256))
embed_model = OpenAIEmbedding()
node_parser = SimpleNodeParser(text_splitter=TokenTextSplitter(chunk_size=1024, chunk_overlap=20))
# num_output reserves space in the context window for the response; it is typically set to match max_tokens
prompt_helper = PromptHelper(context_window=4096, num_output=256, chunk_overlap_ratio=0.1, chunk_size_limit=None)
service_context = ServiceContext.from_defaults(
  llm_predictor=llm_predictor,
  embed_model=embed_model,
  node_parser=node_parser,
  prompt_helper=prompt_helper
)


In this example, the num_output parameter is set to 256, matching the LLM's max_tokens, so 256 tokens of the context window are reserved for the response.
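
Depending on your llama_index version, ServiceContext.from_defaults may also accept context_window and num_output directly, which gives a shorter equivalent of the PromptHelper setup above. Treat this as a sketch and check it against your installed version:

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# max_tokens on the LLM does the actual limiting;
# num_output only reserves space for the response when packing retrieved text
llm = OpenAI(temperature=0, model="text-davinci-003", max_tokens=256)
service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=4096,
    num_output=256,
)
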
For more details, you can refer to the LlamaIndex documentation and the ServiceContext documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------