Try setting chunk_size_limit=512 in the service context so that your document is broken into more nodes.

max_input_size is the max input size to the LLM (usually 4096 for OpenAI models).

num_output is the number of expected output tokens (256 by default for OpenAI; see https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-changing-the-number-of-output-tokens-for-openai-cohere-ai21). Every prompt sent to the LLM is set up so that there is room for these tokens, since GPT decoder-like models generate tokens one at a time and append them to the original input.

chunk_size_limit sets the size of the chunks llama index breaks documents into. By default, it is 4000. However, at query time chunks may be broken down even smaller to make sure there is room for num_output in the prompt.

index.query(..., similarity_top_k=3) -- this will fetch the top 3 nodes that best match the query. You might also be interested in using compact mode to decrease how long the query takes: index.query(..., response_mode="compact") -- this will stuff as much text as possible into each LLM call.

index.query() will not remember past questions. Llama index is meant to be more of a search tool than a chatbot, although there are some super cool ways to integrate llama index as a tool inside langchain.
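For illustration, here is a minimal sketch of how those query options fit together, plus one common way to wrap an index as a langchain tool. It is not from the original thread: the index variable, question string, tool name, and agent settings are all assumptions.

# Sketch: querying with more retrieved nodes and compact response mode
# (assumes `index` is an already-built GPTSimpleVectorIndex)
response = index.query(
    "What is this document about?",
    similarity_top_k=3,           # retrieve the 3 best-matching nodes
    response_mode="compact",      # pack as much retrieved text as possible into each LLM call
)
print(response)
print(len(response.source_nodes))  # number of nodes actually used for the answer

# Sketch: exposing the index as a tool inside a langchain conversational agent
from langchain.agents import Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

tools = [
    Tool(
        name="pdf_index",  # illustrative name
        func=lambda q: str(index.query(q, similarity_top_k=3)),
        description="Answers questions about the indexed PDF.",
    )
]
agent = initialize_agent(
    tools,
    ChatOpenAI(temperature=0),
    agent="conversational-react-description",
    memory=ConversationBufferMemory(memory_key="chat_history"),
)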
I tried index.query(...., similarity_top_k=3) but same issue. I think the problem is in the indexing. I'm not sure, but I feel like the indexed document is always 1 node (the whole doc). len(response.source_nodes) always returns 1 as well. Do you know what might cause this?
from io import BytesIO

from PyPDF2 import PdfReader  # or: from pypdf import PdfReader
from llama_index import Document


def parse_pdf(file: BytesIO):
    pdf = PdfReader(file)
    text_list = []
    # Get the number of pages in the PDF document
    num_pages = len(pdf.pages)
    # Iterate over every page
    for page in range(num_pages):
        # Extract the text from the page
        page_text = pdf.pages[page].extract_text()
        text_list.append(page_text)
    text = "\n".join(text_list)
    return Document(text)
docs = []  # I only have one doc in docs
with open(pdf_file, 'rb') as f:
    fs = f.read()
    docs.append(parse_pdf(BytesIO(fs)))
from langchain.chat_models import ChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext

# set maximum input size
max_input_size = 4096
# set number of output tokens
num_outputs = 256
# set maximum chunk overlap
max_chunk_overlap = 20
# set chunk size limit
chunk_size_limit = 512

prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
index = GPTSimpleVectorIndex.from_documents(docs, service_context=service_context)
I think I see it: chunk_size_limit also needs to be passed to the service context itself, not just to the PromptHelper, so that the node parser actually splits the document into smaller nodes:

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=512)
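As a quick check, a sketch of how you could verify the splitting before and after building the index. This is not from the original thread; it assumes the corrected service context above, the docs list built earlier, and that the service context exposes its node parser in this llama_index version.

# Sketch: verify the document is actually split into multiple nodes
nodes = service_context.node_parser.get_nodes_from_documents(docs)
print(f"parsed {len(nodes)} nodes")  # should be > 1 for a multi-page PDF with chunk_size_limit=512

index = GPTSimpleVectorIndex.from_documents(docs, service_context=service_context)
response = index.query("What is this document about?", similarity_top_k=3)
print(len(response.source_nodes))  # should be 3 once at least 3 nodes exist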