Do you have the full logs from the query?
I think this happens when the query doesn't have any keywords that match the keyword index 🤔
I think so, the extracted keywords list is an empty list. However I asked the question strictly pertaining to what I have in the summary for that subindex. How should I prevent this from happening? Use a different parent index struct?
I was using List over Tree, the query is fine but I don't think it's optimized for comparison and contrast.
You have many sub-tree indexes right?
Keyword is usually fine, I'm surprised the query keyword list was empty (maybe your query was too short or generic?)
Yea a list index on top will check every sub index, so not always the best option. Maybe a top level vector index would be better?
It's surprising to me as well, because I tested before using keyword as the subindex and it usually works out pretty well if my query has words that I know show up in the document. I tried using the structure Vector -> Keyword -> Tree as shown in the "creating a unified query framework over your indexes" guide page.
However, I'm having difficulty constructing subindices for documents that are 5 pages long each. And the GPTSimpleKeywordTableIndex gave me the division-by-zero problem. Now that I wrote it down, could it be that I was using SimpleKeywordTable, not a KeywordTable?
Hmmm that might be it. I forget what the difference is haha
But also something to think about, you might not need a complicated structure. You'd be surprised how far you can get with a vector index that has a chunk size of 512-1024 and a top k of 3 or so 💪
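Something like this, roughly (just a sketch, the "data" folder and the question are placeholders):
from llama_index import GPTSimpleVectorIndex, ServiceContext, SimpleDirectoryReader
# load docs from a folder
documents = SimpleDirectoryReader("data").load_data()
# chunk size somewhere in the 512-1024 range
service_context = ServiceContext.from_defaults(chunk_size_limit=1024)
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
# only the top 3 most similar chunks get sent to the LLM
response = index.query("your question here", similarity_top_k=3)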
Looks like the non-simple keyword index should work better though, it's a bit smarter
Yea, I think it's using more than just the regex matching lol.
Ah I missed that one.
Yea by default llama index chunks your documents, and refines answers across many chunks.
If you want to make the documents bigger during the LLM call, you'll want to setup the prompt helper
# imports (llama_index + langchain, as used below)
from llama_index import LLMPredictor, PromptHelper, ServiceContext
from langchain.chat_models import ChatOpenAI

# define prompt helper
# set maximum input size
max_input_size = 32000 # ?? Is this right lol
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap,
                             chunk_size_limit=8000)
# define LLM
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4-32k", max_tokens=num_output))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=8000)
At this point, you are limited by the max input size of the embedding model (somewhere around 8000 tokens)
But you can set response_mode="compact" in the query to stuff as much text as possible in each LLM call.
So for example if you fetch 3 nodes, rather than making one call per node, that response mode will stuff as much as possible, up to the max_input_size
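Something like this (just a sketch, assuming an index and question like above):
response = index.query(
    "your question here",
    similarity_top_k=3,
    response_mode="compact",  # pack the retrieved chunks into as few LLM calls as possible
)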
This looks like it! Will definitely try it out. Thanks Logan!
So I tried your setup again, and I still have trouble constructing a GPTSimpleVectorIndex for documents. The error is: INFO:openai:error_code=None error_message='Too many inputs for model None. The max number of inputs is 1. We hope to increase the number of inputs per request soon.'
It only happens with GPTSimpleVectorIndex; it works with all the other index structs
Oh I see an issue, #947, that could probably solve this.
I think it's an Azure-specific question
Though I'm having trouble setting the embed_batch_size as 1 for the embedding model
Is the BaseEmbedding from llama_index.embeddings.base using ada-002?
I set the embed_batch_size for BaseEmbedding as 1 and was able to construct the index this way
Now I'm having trouble querying the index using the BaseEmbedding
You might need to share more details on your code haha I'm a little lost
The OpenAI embeddings use text-embedding-ada-002 by default though
service_context_gpt4 = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=8000, embed_model=llama_index.embeddings.base.BaseEmbedding(embed_batch_size=1))
This way I was able to construct the index, and one document takes 6804 tokens. However, when I query the constructed index, it gives me the error: TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
But if I use OpenAIEmbedding, there's no place where I can initialize embed_batch_size as 1
which is what the github issue suggests
Since OpenAIEmbedding inherits from the base embedding class, you can still specify the embed_batch_size in a similar way
The base embedding class isn't intended to be used, only extended
embed_model = OpenAIEmbedding(embed_batch_size=1)
I did this and it gives me ValidationError: 1 validation error for OpenAIEmbeddings
extra fields not permitted
I was able to set the embed_batch_size using the OpenAIEmbedding from llama_index
But now I was getting the error "model's maximum context length is 4095 tokens but I requested 6754 tokens". What can I do if I want to read each document into one vector index? I thought gpt-4-32k should have a higher context length than that.
Did you setup the prompt helper properly and whatnot?
Sounds like it might not be using gpt-4-32k properly? Maybe try passing the service_context into the query too? I'm flying blind with my advice here lol
Let me copy the code in here
max_input_size = 32000
num_output = 1000
max_chunk_overlap = 50

llm = AzureChatOpenAI(
    deployment_name="gpt-4-32k",
    model_name="gpt-4-32k",
    temperature=0,
    max_tokens=num_output,
    model_kwargs={
        "api_key": openai.api_key,
        "api_base": openai.api_base,
        "api_type": openai.api_type,
        "api_version": openai.api_version,
        "engine": "gpt-4-32k",
    },
)
llm_predictor = LLMPredictor(llm=llm)
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=8000)
service_context_gpt4 = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, chunk_size_limit=8000, embed_model=OpenAIEmbedding(embed_batch_size=1))
my = GPTSimpleVectorIndex.from_documents([document[0]], service_context=service_context_gpt4)
Cool that looks good so far I think! 🤔 next step I think is trying to pass the service context into the query call too
So when constructing the 'my' index, I was receiving the error: INFO:openai:error_code=None error_message="This model's maximum context length is 4095 tokens, however you requested 6754 tokens (6754 in your prompt; 0 for the completion). Please reduce your prompt; or completion length." error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False
Weird, I thought I was using gpt-4-32k
That's coming from the embed model I think. The LLM isn't used during index construction
I thought text Ada had a bigger context length, but it looks like it doesn't 🤔
Do I have to use the GPT List Index?
I definitely thought it had a bigger context length; I've seen bigger files getting pushed into GPTVectorIndex before
Bigger files get pushed in but they get chunked according to the chunk_size_limit
Yea maybe try a list index for now
I set the chunk_size_limit as 8000 tho
Right... But that's the max chunk size. Sounds like the document is probably only 6754 tokens?
Doesn't that make it possible for the SimpleVectorIndex to take in one whole document?
But if text Ada has a limit of 4097, then it's not possible to embed a chunk that big?
Oh I see what you're saying
So I see in the notebook example, they can embed a document using 17000 tokens and construct a GPTSimpleVectorIndex
So I assume ada has a higher limit
This is what I'm referring to
In that example, that's the token usage after breaking it into the default chunk size (3900) and embedding each chunk
Input documents always get chunked by the node parser into individual nodes
tbh vector-based retrieval doesn't really make sense for large chunk sizes. Articles show that a chunk size of around 1024 will usually give you the best results. Why feed a bunch of unrelated text into a model when you can narrow it down to the exact top_k relevant chunks.
If the entire document is important, then yea, a list index is what you want
That makes total sense. I set the chunk size to 1024 now and successfully queried a vector index. The results look great.
I had the wrong impression that a larger chunk size would be better. But since my intent was to extract some section from the document,
it makes sense to use a vector index
You can still leverage the 32K context though if you want, if you do something like index.query(..., similarity_top_k=5, response_mode="compact")
the top 5 chunks will be sent to the LLM in a single call
But you might not need to do that either haha
Yea if the average document size is around 32k, I guess I won't need that haha.
Got a follow-up question: Now I have about 900 documents like this. I would like to be able to extract a certain concept and its value from each document; however, I'd also like to compare documents under certain graph structures. Would it make sense if I do Vector under Keyword under Tree? Also, would it be necessary to include a langchain agent in querying?
Not necessary but helpful is the word I wanna use
I think langchain is only needed if you want to maintain some sort of chat history. But with a langchain agent, you also lose some control over what text is actually used to query your index/graph
I think a tree or keyword on top of a vector index would make the most sense, so only two layers to the graph (just my initial impression)
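Roughly something like this, if you go with keyword-on-top-of-vector (just a sketch, the sub-index and summary names are placeholders; double-check the composable graph call against the unified query guide):
from llama_index import GPTSimpleKeywordTableIndex
from llama_index.indices.composability import ComposableGraph
# one vector index per document, built the same way as before;
# the keyword table on top routes each query to the relevant sub-index
graph = ComposableGraph.from_indices(
    GPTSimpleKeywordTableIndex,
    [vector_index_a, vector_index_b],
    index_summaries=["summary of doc A", "summary of doc B"],
)
response = graph.query("your question here")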
So I know a tree index would be really helpful when I want to route to one index for answers. What happens when I want to compare across documents with a tree index?
I guess I will try them out haha
For comparing documents, it's best to use the decompose query transform, which turns a single query into multiple questions 💪
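Rough sketch of what I mean (reuses the llm_predictor and graph from earlier; the exact import path and query config keys may differ a bit by version, so check the docs):
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform
decompose_transform = DecomposeQueryTransform(llm_predictor, verbose=True)
# attach the transform to the vector sub-indexes in the graph query,
# so one compare/contrast question becomes one sub-question per document
query_configs = [
    {
        "index_struct_type": "simple_dict",   # the vector sub-index level
        "query_mode": "default",
        "query_kwargs": {"similarity_top_k": 3},
        "query_transform": decompose_transform,
    },
]
response = graph.query(
    "Compare and contrast doc A and doc B on <some concept>",
    query_configs=query_configs,
)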
Hi, I was trying to run the QASummaryGraph but when running the query I get this error message: RuntimeError: asyncio.run() cannot be called from a running event loop