At a glance

def create_index(path):
    max_input = 4096
    tokens = 512
    chunk_size = 600
    max_chunk_overlap = 0.2
    promptHelper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size)

    # Define the LLM
    llmPredictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=tokens))

    # Load data
    docs = SimpleDirectoryReader(path).load_data()

    # Create vector index
    service_context = ServiceContext.from_defaults(llm_predictor=llmPredictor, prompt_helper=promptHelper)

    vectorIndex = GPTVectorStoreIndex.from_documents(documents=docs, service_context=service_context)
    vectorIndex.storage_context.persist(persist_dir='index_store')
    return vectorIndex

query_engine = index.as_query_engine(response_mode='refine')

Here, what could be the maximum response size?
Please stand by @3bs as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The maximum response size is determined by the num_output parameter in the PromptHelper configuration. In your code, you've set tokens to 512, which is passed to both the LLMPredictor and the PromptHelper. Therefore, the maximum response size would be 512 tokens.

Here's the relevant part of your code:

Plain Text
tokens = 512
...
llmPredictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=tokens))
...
promptHelper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size)


In this case, tokens is used for the num_output parameter of PromptHelper, which determines the maximum number of output tokens from the LLM.

Please note that this parameter does not actually limit the model output; it affects the amount of “space” reserved for the output when computing the available context window size for packing text from retrieved Nodes (source).
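
As a rough illustration of that budgeting (a sketch of the idea, not the actual PromptHelper internals), using the values from your code:

Plain Text
context_window = 4096   # max_input for text-davinci-003
num_output = 512        # tokens reserved for the completion

# What remains for the prompt template, the question, and retrieved node text
available_for_prompt = context_window - num_output
print(available_for_prompt)  # 3584
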
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai what is the maximum number I could put instead of 512?
Let me check the knowledge base for you. One moment! :books:
----------------------
The maximum number you can set for num_output (or tokens in your code) depends on the specific LLM you are using. In your case, you're using the OpenAI text-davinci-003 model.

The num_output parameter is used to reserve a certain amount of "space" for the output when computing the available context window size for packing text from retrieved Nodes. It's important to note that this parameter does not actually limit the model output.

However, the documentation does not specify a maximum limit for this parameter. It's likely that the maximum value would be constrained by the maximum context window size of the LLM you're using, and the specific requirements of your application.

If you're experiencing issues or need to handle larger outputs, you might need to adjust other parameters such as context_window or chunk_size, or consider using a different LLM that can handle larger context sizes.
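
If you want to see how much headroom a given prompt leaves for max_tokens, here is a small sketch using tiktoken (the 4096 window matches the max_input used earlier in this thread; the prompt string is only an example):

Plain Text
import tiktoken

encoding = tiktoken.encoding_for_model("text-davinci-003")
prompt = "Summarize the following passage: ..."  # example prompt
prompt_tokens = len(encoding.encode(prompt))

context_window = 4096  # shared between prompt and completion
max_possible_output = context_window - prompt_tokens
print(max_possible_output)
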

For more detailed information, you may want to refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai what is the maximum num_output for using the OpenAI model with the text-davinci-003 variant?
Let me check the knowledge base for you. One moment! :books:
----------------------
The maximum number of output tokens (num_output) for the OpenAI model with the text-davinci-003 variant is typically set to a low number by default. For instance, with OpenAI the default is 256 (source).

However, the documentation does not specify a maximum limit for this parameter. It's likely that the maximum value would be constrained by the maximum context window size of the LLM you're using, and the specific requirements of your application.

If you're experiencing issues or need to handle larger outputs, you might need to adjust other parameters such as context_window or chunk_size, or consider using a different LLM that can handle larger context sizes.

For more detailed information, you may want to refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
So every time you increase num_output/max_tokens, you are decreasing the size of the initial input.

This is because LLMs generate one token at a time, add that token to the input, and then generate the next.
@Logan M I want to generate a textbook, how do I do that?
Or at least I want to rewrite a text file of a book, so the response size should be at least equal to the original text file?
You'd have to do it in chunks.

I.e. feed some context at a time, and generate the next piece of the book.

Tbh it will be a little difficult to generate something coherent this way though.

There is a maximum input size of 4096, although gpt-3.5 has a 16K variant too.

But as I mentioned, the input and output are shared. If I make the output size 3000, that means only 1096 tokens can be in the input.
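
If you want the larger window mentioned above, one option is to swap the LLM for the 16K chat model. This is only a sketch assuming the same legacy LlamaIndex/LangChain stack as the code earlier in this thread, and it mirrors the PromptHelper call used there; the numbers are illustrative:

Plain Text
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext

# gpt-3.5-turbo-16k gives roughly 16K tokens to split between input and output
llm_predictor_16k = LLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k", max_tokens=3000)
)
prompt_helper_16k = PromptHelper(16384, 3000, 0.2, chunk_size_limit=600)
service_context_16k = ServiceContext.from_defaults(
    llm_predictor=llm_predictor_16k, prompt_helper=prompt_helper_16k
)
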
@Logan M at least I want to rewrite a text file of a book, so the response size should be at least equal to the original text file?
Assuming that the LLM has a big enough context window to make that happen, then yes
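
A minimal sketch of the chunked rewriting Logan describes, assuming the same text-davinci-003 setup as above; the chunk size, prompt wording, and function name are illustrative:

Plain Text
from langchain.llms import OpenAI

llm = OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=1024)

def rewrite_in_chunks(text, chunk_chars=6000):
    """Feed the book to the LLM one piece at a time and stitch the rewrites together."""
    rewritten = []
    for start in range(0, len(text), chunk_chars):
        chunk = text[start:start + chunk_chars]
        prompt = "Rewrite the following passage in your own words, keeping its meaning:\n\n" + chunk
        rewritten.append(llm(prompt))
    return "\n".join(rewritten)

# Usage (illustrative):
# with open("book.txt") as f:
#     print(rewrite_in_chunks(f.read()))
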
@kapa.ai how to organize the response and outputs in a good shape, not just as a paragraph?
Let me check the knowledge base for you. One moment! :books:
----------------------
To organize the response and outputs in a structured format rather than a simple paragraph, you can use the output_parser module in LlamaIndex. This module allows you to format each prompt with output parser instructions and obtain a structured response.

Here's an example from the documentation:

Plain Text
# format each prompt with output parser instructions
fmt_qa_tmpl = output_parser.format(DEFAULT_TEXT_QA_PROMPT_TMPL)
fmt_refine_tmpl = output_parser.format(DEFAULT_REFINE_PROMPT_TMPL)

qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
refine_prompt = RefinePrompt(fmt_refine_tmpl, output_parser=output_parser)

# obtain a structured response
query_engine = index.as_query_engine(
    service_context=ServiceContext.from_defaults(
        llm_predictor=llm_predictor
    ),
    text_qa_template=qa_prompt, 
    refine_template=refine_prompt, 
)
response = query_engine.query(
    "What are the three items the author did growing up?", 
)
print(response)


In this example, the response is structured as a dictionary:

Plain Text
{'points': [{'explanation': 'Writing short stories', 'explanation2': 'Programming on an IBM 1401', 'explanation3': 'Using microcomputers'}]}
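
The snippet above assumes an output_parser has already been constructed. A minimal sketch of one way to build it, wrapping a LangChain StructuredOutputParser with LlamaIndex's LangchainOutputParser (the field name and description here are just an illustration):

Plain Text
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from llama_index.output_parsers import LangchainOutputParser

# Describe the structure you want back; names/descriptions are illustrative
response_schemas = [
    ResponseSchema(name="points", description="The items the author did growing up"),
]
lc_parser = StructuredOutputParser.from_response_schemas(response_schemas)
output_parser = LangchainOutputParser(lc_parser)
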


You can also use the GuidancePydanticProgram to generate structured data with guidance via LlamaIndex. This is especially helpful when you are using lower-capacity models, which otherwise would struggle to generate valid output that fits the desired output schema. Here's an example from the documentation:
Plain Text
from pydantic import BaseModel
from typing import List
from guidance.llms import OpenAI
from llama_index.program import GuidancePydanticProgram

# Define output schema
class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

# Define guidance pydantic program
program = GuidancePydanticProgram(
    output_cls=Album,
    prompt_template_str="Generate an example album, with an artist and a list of songs. Using the movie {{movie_name}} as inspiration",
    guidance_llm=OpenAI("text-davinci-003"),
    verbose=True,
)

# Run program to get structured output
output = program(movie_name="The Shining")


In this case, the output is a valid Pydantic object that you can then use to call functions/APIs.
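
For example, since output is an Album instance, its fields are ordinary attributes:

Plain Text
print(output.name)
print([song.title for song in output.songs])
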
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai
# Imports for the legacy LlamaIndex / LangChain APIs used below
import re

from langchain.llms import OpenAI
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

def create_index(path):
    max_input = 4096
    tokens = 512
    chunk_size = 600
    max_chunk_overlap = 0.2
    promptHelper = PromptHelper(max_input, tokens, max_chunk_overlap, chunk_size_limit=chunk_size)

    # Define the LLM
    llmPredictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=tokens))

    # Load data
    docs = SimpleDirectoryReader(path).load_data()

    # Create vector index
    service_context = ServiceContext.from_defaults(llm_predictor=llmPredictor, prompt_helper=promptHelper)

    vectorIndex = GPTVectorStoreIndex.from_documents(documents=docs, service_context=service_context)
    vectorIndex.storage_context.persist(persist_dir='index_store')
    return vectorIndex

def addToIndex(doc):
    # Load the existing index from storage
    storage_context = StorageContext.from_defaults(persist_dir='index_store')
    index = load_index_from_storage(storage_context)
    # Add the new document to the index
    index.insert(doc)
    # Save the updated index back to the storage
    index.storage_context.persist(persist_dir='index_store')
    return

def answerMe(question):
    # 'rewriting', 'phrases', and 'agent' are defined elsewhere in the application
    storage_context = StorageContext.from_defaults(persist_dir='index_store')
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine(response_mode='refine')
    for sent in rewriting:
        if sent in question:
            qq = "Write a lengthy response to this query"
            response = query_engine.query(qq + '\n\n' + question)
            return response
    response = query_engine.query(question)
    for phrase in phrases:
        if phrase in str(response):
            content = agent({"input": question})
            actualcontent = content['output']
            # Convert Markdown links in the agent output to HTML anchors
            response = re.sub(r'\[(.*?)\]\((.*?)\)', r'<a href="\2" target="_blank">\1</a>', actualcontent)

    return response

How to customize this to organize the response and outputs in a good shape, not just as a paragraph?
Let me check the knowledge base for you. One moment! :books:
----------------------
To organize the response and outputs in a structured format, you can use the GuidancePydanticProgram to generate structured data with guidance via LlamaIndex. This is especially helpful when you are using lower-capacity models, which otherwise would struggle to generate valid output that fits the desired output schema.

First, you need to define your output schema using Pydantic. Here's an example:

Plain Text
from pydantic import BaseModel
from typing import List

class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]


Then, you can define a GuidancePydanticProgram with your output schema:

Plain Text
from guidance.llms import OpenAI
from llama_index.program import GuidancePydanticProgram

program = GuidancePydanticProgram(
    output_cls=Album,
    prompt_template_str="Generate an example album, with an artist and a list of songs. Using the movie {{movie_name}} as inspiration",
    guidance_llm=OpenAI("text-davinci-003"),
    verbose=True,
)


Finally, you can run the program to get structured output:

Plain Text
output = program(movie_name="The Shining")


In this case, the output is a valid Pydantic object that you can then use to call functions/APIs.

You can integrate this into your answerMe function to generate structured responses. Here's an example:
Plain Text
def answerMe(question):
    storage_context = StorageContext.from_defaults(persist_dir='index_store')
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine(response_mode='refine')
    for sent in rewriting:
        if sent in question:
            qq = "Write a lengthy response to this query"
            response = query_engine.query(qq + '\n\n' + question)
            return response
    response = query_engine.query(question)
    for phrase in phrases:
        if phrase in str(response):
            content = agent({"input": question})
            actualcontent = content['output']
            response = re.sub(r'\[(.*?)\]\((.*?)\)', r'<a href="\2" target="_blank">\1</a>', actualcontent)

    # Generate structured output
    output = program(movie_name=response)
    return output


Please note that you need to adjust the prompt_template_str and the arguments to the program call to match your specific use case.
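
For instance, here is a hedged sketch of how such a program could be adapted to the question/answer flow above; the schema, field names, and template variables are assumptions for illustration, not a fixed LlamaIndex API:

Plain Text
from typing import List
from pydantic import BaseModel
from guidance.llms import OpenAI as GuidanceOpenAI
from llama_index.program import GuidancePydanticProgram

# Illustrative output schema for a structured answer
class StructuredAnswer(BaseModel):
    summary: str
    key_points: List[str]

answer_program = GuidancePydanticProgram(
    output_cls=StructuredAnswer,
    prompt_template_str=(
        "Answer the question using the context, then organize the result.\n"
        "Question: {{question}}\n"
        "Context: {{context}}"
    ),
    guidance_llm=GuidanceOpenAI("text-davinci-003"),
    verbose=False,
)

# e.g. inside answerMe, after getting `response` from the query engine:
# structured = answer_program(question=question, context=str(response))
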

For more detailed information, you may want to refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai but I don't have a static schema for the response or the files indexed
Let me check the knowledge base for you. One moment! :books:
@3bs That's a great question, but unfortunately I am only capable of responding to a maximum of three questions per thread. Don't hesitate to create a new thread for any further questions you have though. I'm here to help!