Do you have the full logs from the query?
I think this happens when the query doesn't have any keywords that match the keyword index 🤔
I think so, the extracted keywords list is an empty list. However I asked the question strictly pertaining to what I have in the summary for that subindex. How should I prevent this from happening? Use a different parent index struct?
I was using List over Tree, the query is fine but I don't think it's optimized for comparison and contrast.
You have many sub-tree indexes right?
Keyword is usually fine, I'm surprised the query keyword list was empty (maybe your query was too short or generic?)
Yea a list index on top will check every sub index, so not always the best option. Maybe a top level vector index would be better?
It's surprising to me as well, because I tested before using keyword as the subindex and it usually works out pretty well if my query has words that I know show up in the document. I tried using the structure Vector -> Keyword -> Tree as shown in the "creating a unified query framework over your indexes" guide page.
However, I'm having difficulty constructing subindices for documents that are 5 pages long each. And the GPTSimpleKeywordTableIndex gave me the division-by-zero problem. Now that I wrote it down, could it be that I was using SimpleKeywordTable, not a KeywordTable?
Hmmm that might be it. I forget what the difference is haha
But also something to think about, you might not need a complicated structure. You'd be surprised how far you can get with a vector index that has a chunk size of 512-1024 and a top k of 3 or so 💪
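Something like this, roughly (just a sketch, the "data" folder and the question are placeholders):
from llama_index import GPTSimpleVectorIndex, ServiceContext, SimpleDirectoryReader
# load docs from a folder
documents = SimpleDirectoryReader("data").load_data()
# chunk size somewhere in the 512-1024 range
service_context = ServiceContext.from_defaults(chunk_size_limit=1024)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
# only the top 3 most similar chunks get sent to the LLM
response = index.query("your question here", similarity_top_k=3)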
Looks like the non-simple keyword index should work better though, it's a bit smarter
Yea, I think it's using more than just the regex matching lol.
Ah I missed that one.
Yea by default llama index chunks your documents, and refines answers across many chunks.
If you want to make the documents bigger during the LLM call, you'll want to setup the prompt helper
# imports (llama_index + langchain, as used below)
from llama_index import LLMPredictor, PromptHelper, ServiceContext
from langchain.chat_models import ChatOpenAI

# define prompt helper
# set maximum input size
max_input_size = 32000 # ?? Is this right lol
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap,
                             chunk_size_limit=8000)
# define LLM
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4-32k", max_tokens=num_output))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=8000)
At this point, you are limited by the max input size of the embedding model (somewhere around 8000 tokens)
But you can set response_mode="compact" in the query to stuff as much text as possible in each LLM call.
So for example if you fetch 3 nodes, rather than making one call per node, that response mode will stuff as much as possible, up to the max_input_size
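Something like this (just a sketch, assuming an index and question like above):
response = index.query(
    "your question here",
    similarity_top_k=3,
    response_mode="compact",  # pack the retrieved chunks into as few LLM calls as possible
)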
This looks like it! Will definitely try it out. Thanks Logan!
So I tried your setup again, and I still have trouble constructing a GPTSimpleVectorIndex for documents. The error is: INFO:openai:error_code=None error_message='Too many inputs for model None. The max number of inputs is 1. We hope to increase the number of inputs per request soon.'
It only happens with GPTSimpleVectorIndex; it works with all the other index structs
Oh I see an issue, #947, that could probably solve this.
I think it's an Azure-specific question
Though I'm having trouble setting the embed_batch_size as 1 for the embedding model
Is the BaseEmbedding from llama_index.embeddings.base using ada-002?
I set the embed_batch_size for BaseEmbedding as 1 and was able to construct the index this way
Now I'm having trouble querying the index using the BaseEmbedding
You might need to share more details on your code haha I'm a little lost
The OpenAI embeddings use text-embedding-ada-002 by default though
service_context_gpt4 = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=8000, embed_model=llama_index.embeddings.base.BaseEmbedding(embed_batch_size=1))
This way I was able to construct the index, and one document takes 6804 tokens. However, when I query the constructed index, it gives me the error: TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
But if I use OpenAIEmbedding, there's no place where I can initialize embed_batch_size as 1
which is what the github issue suggests
Since OpenAIEmbedding inherits from the base embedding class, you can still specify the embed_batch_size in a similar way
The base embedding class isn't intended to be used, only extended
embed_model = OpenAIEmbedding(embed_batch_size=1)
I did this and it gives me ValidationError: 1 validation error for OpenAIEmbeddings
extra fields not permitted
I was able to set the embed_batch_size using the OpenAIEmbedding from llama_index
But now I was getting the error "model's maximum context length is 4095 tokens but I requested 6754 tokens". What can I do if I want to read each document into one vector index? I thought gpt-4-32k should have a higher context length than that.
Did you setup the prompt helper properly and whatnot?
Sounds like it might not be using gpt-4-32k properly? Maybe try passing the service_context into the query too? I'm flying blind with my advice here lol
Let me copy the code in here
max_input_size = 32000
num_output = 1000
max_chunk_overlap = 50

llm = AzureChatOpenAI(
    deployment_name="gpt-4-32k",
    model_name="gpt-4-32k",
    temperature=0,
    max_tokens=num_output,
    model_kwargs={
        "api_key": openai.api_key,
        "api_base": openai.api_base,
        "api_type": openai.api_type,
        "api_version": openai.api_version,
        "engine": "gpt-4-32k",
    },
)
llm_predictor = LLMPredictor(llm=llm)
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=8000)
service_context_gpt4 = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=8000, embed_model=OpenAIEmbedding(embed_batch_size=1))
my = GPTSimpleVectorIndex.from_documents([document[0]], service_context=service_context_gpt4)
Cool that looks good so far I think! 🤔 next step I think is trying to pass the service context into the query call too
So when constructing the 'my' index, I was receiving the error: INFO:openai:error_code=None error_message="This model's maximum context length is 4095 tokens, however you requested 6754 tokens (6754 in your prompt; 0 for the completion). Please reduce your prompt; or completion length." error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False
Weird, I thought I was using gpt-4-32k
That's coming from the embed model I think. The LLM isn't used during index construction
I thought text Ada had a bigger context length, but it looks like it doesn't 🤔
Do I have to use the GPT List Index?
I definitely thought it had a bigger context length; I've seen bigger files getting pushed into GPTVectorIndex before
Bigger files get pushed in but they get chunked according to the chunk_size_limit
Yea maybe try a list index for now
I set the chunk_size_limit as 8000 tho
Right... But that's the max chunk size. Sounds like the document is probably only 6754 tokens?
Doesn't that make it possible for the SimpleVectorIndex to take in one whole document?
But if text Ada has a limit of 4097, then it's not possible to embed a chunk that big?
Oh I see what you're saying
So I see in the notebook example, they can embed a document using 17000 tokens and construct a GPTSimpleVectorIndex
So I assume ada has a higher limit
This is what I'm referring to
In that example, that's the token usage after breaking it into the default chunk size (3900) and embedding each chunk
Input documents always get chunked by the node parser into individual nodes
tbh vector-based retrieval doesn't really make sense for large chunk sizes. Articles show that a chunk size of around 1024 will usually give you the best results. Why feed a bunch of unrelated text into a model when you can narrow it down to the exact top_k relevant chunks.
If the entire document is important, then yea, a list index is what you want
That makes total sense. I set the chunk size to 1024 now and successfully queried a vector index. The results look great.
I had the wrong impression that a larger chunk size would be better. But since my intent was to extract some section from the document,
it makes sense to use a vector index
You can still leverage the 32K context though if you want, if you do something like index.query(..., similarity_top_k=5, response_mode="compact")
the top 5 chunks will be sent to the LLM in a single call
But you might not need to do that either haha
Yea if the average document size is around 32k, I guess I won't need that haha.
Got a follow-up question: Now I have about 900 documents like this. I would like to be able to extract a certain concept and its value from each document; however, I'd also like to compare documents under certain graph structures. Would it make sense if I do Vector under Keyword under Tree? Also, would it be necessary to include a langchain agent in querying?
Not necessary but helpful is the word I wanna use
I think langchain is only needed if you want to maintain some sort of chat history. But with a langchain agent, you also lose some control over what text is actually used to query your index/graph
I think a tree or keyword on top of a vector index would make the most sense, so only two layers to the graph (just my initial impression)
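Roughly something like this, if you go with keyword-on-top-of-vector (just a sketch, the sub-index and summary names are placeholders; double-check the composable graph call against the unified query guide):
from llama_index import GPTSimpleKeywordTableIndex
from llama_index.indices.composability import ComposableGraph
# one vector index per document, built the same way as before;
# the keyword table on top routes each query to the relevant sub-index
graph = ComposableGraph.from_indices(
    GPTSimpleKeywordTableIndex,
    [vector_index_a, vector_index_b],
    index_summaries=["summary of doc A", "summary of doc B"],
)
response = graph.query("your question here")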
So I know a tree index would be really helpful when I want to route to one index for answers. What happens when I want to compare across documents with a tree index?
I guess I will try them out haha
For comparing documents, it's best to use the decompose query transform, which turns a single query into multiple questions 💪
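Rough sketch of what I mean (reuses the llm_predictor and graph from earlier; the exact import path and query config keys may differ a bit by version, so check the docs):
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(llm_predictor, verbose=True)
# attach the transform to the vector sub-indexes in the graph query,
# so one compare/contrast question becomes one sub-question per document
query_configs = [
    {
        "index_struct_type": "simple_dict",   # the vector sub-index level
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 3},
        "query_transform": decompose_transform,
    },
]
response = graph.query(
    "Compare and contrast doc A and doc B on <some concept>",
    query_configs=query_configs,
)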
Hi, I was trying to run the QASummaryGraph but when running the query I get this error message: RuntimeError: asyncio.run() cannot be called from a running event loop