index range error

Plain Text
# Imports for the llama_index names used below (omitted in the original paste)
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    get_response_synthesizer,
    set_global_service_context,
)
from llama_index.indices.document_summary import DocumentSummaryIndex
from llama_index.llms import OpenAI, HuggingFaceLLM
from llama_index.node_parser import SimpleNodeParser
from llama_index.response_synthesizers import ResponseMode
from langchain.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceEmbeddings

hf = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': True}
)
llm = HuggingFaceLLM(model_name="Deci/DeciLM-6b")

# callback_manager and input_files are defined elsewhere in the original script
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, embed_model=hf, llm=llm
)
set_global_service_context(service_context)

# Load in the Documents
documents = SimpleDirectoryReader(input_files=input_files).load_data()
parser = SimpleNodeParser.from_defaults()

nodes = parser.get_nodes_from_documents(documents, show_progress=True)
response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.COMPACT, use_async=True, verbose=True,
)
doc_summary_index = DocumentSummaryIndex(
    nodes, show_progress=True, response_synthesizer=response_synthesizer)
This leads to the following error:

Plain Text
<ipython-input-10-a67659516e0e> in create_and_persist_index(self)
     49             response_mode=ResponseMode.COMPACT, use_async=True, verbose=True,
     50         )
---> 51         doc_summary_index = DocumentSummaryIndex(
     52             nodes, show_progress=True, response_synthesizer=response_synthesizer)
     53         doc_summary_index.storage_context.persist(

/usr/local/lib/python3.10/dist-packages/llama_index/indices/document_summary/base.py in __init__(self, nodes, index_struct, service_context, response_synthesizer, summary_query, show_progress, **kwargs)
     75         )
     76         self._summary_query = summary_query or "summarize:"
---> 77         super().__init__(
     78             nodes=nodes,
     79             index_struct=index_struct,
...
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2208         # remove once script supports set_grad_enabled
   2209         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2211 
   2212 

IndexError: index out of range in self
Any ideas as to what is happening?
Apologies for the long paste - I tried to keep it concise. Appreciate the help 🙂
What happens if you do DocumentSummaryIndex.from_documents(documents)?
Same thing

Plain Text
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
**********
Trace: index_construction
    |_node_parsing ->  4.447528 seconds
      |_chunking ->  4.356259 seconds
    |_synthesize ->  7.47194 seconds
      |_templating ->  5.5e-05 seconds
      |_llm ->  0.0 seconds
      |_exception ->  0.0 seconds
      |_exception ->  0.0 seconds
        |_exception ->  0.0 seconds
**********
...
26 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2208         # remove once script supports set_grad_enabled
   2209         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2211 
   2212 

IndexError: index out of range in self
Any chance it could be due to async processing? I'm using use_async=True, together with:
Plain Text
import nest_asyncio
nest_asyncio.apply()


edit: that wasn't it
It seems to be something to do with HuggingFaceEmbeddings, but I'm not sure.
Does it matter that this is being imported and used from langchain? Is there a LlamaIndex equivalent?
I don't fully understand the relationship between LangChain and LlamaIndex with these things - whether they just work together or whether there's some sort of massaging that needs to be done.
gonna tap in @Logan M
I don't fully understand this but I think it's the issue
Yea, there's an issue with how you set up the LLM I think -- you need to give a tokenizer name as well

llm = HuggingFaceLLM(model_name="...", tokenizer_name="...")
Alternatively, you can load the LLM and tokenizer outside of LlamaIndex, as shown on the Hugging Face model card, and pass them in directly

HuggingFaceLLM(model=model, tokenizer=tokenizer)

https://huggingface.co/Deci/DeciLM-6b#how-to-get-started-with-the-model
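
For reference, a rough sketch of that second option with DeciLM-6b, following the model card linked above (the bfloat16 dtype and trust_remote_code flag are taken from that card; treat this as a sketch rather than a drop-in fix):

Plain Text
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms import HuggingFaceLLM

checkpoint = "Deci/DeciLM-6b"

# Load the tokenizer and model directly, as in the model card
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # DeciLM uses a custom architecture
)

# Pass both objects in so LlamaIndex doesn't fall back to its default tokenizer
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer)

# Or equivalently, let LlamaIndex load them by name:
# llm = HuggingFaceLLM(model_name=checkpoint, tokenizer_name=checkpoint)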
Ah, it was using the StableLM tokenizer by default
Got it - you guys are the MVPs. I'll confirm that the error goes away, but thank you so much 🙂
Confirmed that fixed it. s/o to you both.

While I have you here, though: I love the concept of DocumentSummaryIndex, but it's so computationally expensive. I have a PDF of about 1500 pages (actually a very long .txt file, but roughly equivalent to that), and even running on a GPU it's taking a very long time. I actually want to create indices for many more documents of similar size, but this is just impractically slow and expensive. Are there better approaches?

I explained my goal if you need context: https://discord.com/channels/1059199217496772688/1059200010622873741/1154405956004888616
Yeaaaa it does not scale to data that large sadly.

A better approach is probably a) splitting the PDF into sections if you can, and b) using a sub question query engine or a retriever router if possible

The main difference is that you'd have to supply a description of what each section is useful for yourself (whereas the document summary index has the LLM write a summary for a similar purpose)
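
A minimal sketch of what that could look like with a sub question query engine, assuming the large file has already been split into per-section files (the section names, file paths, and descriptions below are hypothetical):

Plain Text
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.tools import QueryEngineTool, ToolMetadata

# Hypothetical section files produced by splitting the big document
sections = {
    "part_1": "sections/part_1.txt",
    "part_2": "sections/part_2.txt",
}

tools = []
for name, path in sections.items():
    docs = SimpleDirectoryReader(input_files=[path]).load_data()
    index = VectorStoreIndex.from_documents(docs)
    tools.append(
        QueryEngineTool(
            query_engine=index.as_query_engine(),
            # You supply the description of what each section is useful for
            metadata=ToolMetadata(
                name=name,
                description=f"Useful for questions about the content of {name}",
            ),
        )
    )

query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = query_engine.query("your question here")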
Also, one thing to note for future usage: TreeSummarize with DocumentSummaryIndex, using davinci as the LLM with the default OpenAI embedder, was very expensive for a single document (the same document mentioned above).

I believe I had spent around $8 on testing, with a $20 hard cap on OpenAI. Just that one run led me to be charged a total of $31, going above my set cap (which shocked me).

I don't know what happened on the OpenAI side to not enforce the hard limit, but consider this a heads up. Probably a bug somewhere or an eventual-consistency thing.
Is there a huge difference between davinci as an embedder vs text-embedding-ada-002? ada is sooooo cheap
It'd have to be a big difference to be worth it, IMO
You could also use local embeddings, tbh they are quite good these days

Plain Text
# could also use "local:BAAI/bge-base-en-v1.5"
service_context = ServiceContext.from_defaults(..., embed_model="local:BAAI/bge-small-en-v1.5")
Too slow on my machine for the amount of data I'm working with, and document summary is... slow
Yea, need CUDA for large data
The embedding model and LLM are separate models though, btw - you could still use an OpenAI LLM for the summary
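
For example, something along these lines (a sketch; gpt-3.5-turbo is just one possible choice of OpenAI model):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI

# Local model for the embeddings, OpenAI model only for writing the summaries
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),
    embed_model="local:BAAI/bge-small-en-v1.5",
)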
Yeah, I've been trying to get a GCP VM for days now and it's always like "none available - sorry bro"
Maybe that was the miss on my part, but I used davinci as the LLM - I tried to use gpt-3.5 but I kept hitting rate-limit errors
I'm very thankful gpt-3.5 didn't work - who knows how expensive that would've gotten
Anyway - just a heads up that the hard limit on OpenAI is apparently not fool-proof
gpt-3.5 is 10x cheaper than davinci πŸ˜…
davinci is also being deprecated -- you may have to move to gpt-3.5-turbo-instruct soon
I'm the fool here.

I've been using HF models locally on Colab for now, until I can figure out how to get a VM with a GPU to do the large embedding work. Right now I'm still trying to figure out a good setup for what I'm trying to accomplish.
Speaking of which: today I noticed that a few different LLMs, after querying, would repeat the same answer many times in the output. Have you seen that before / are you aware of certain things that make either the LLM or the embedder do that?
My setup is:

Plain Text
hf = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
    model_kwargs={'device': 'cuda'},
    encode_kwargs={'normalize_embeddings': True}
)
llm = HuggingFaceLLM()


Using VectorStoreIndex and CitationQueryEngine
I tried different HF LLMs and they all seemed to do this, so I'm guessing it's the embedder
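
(For reference, wiring those pieces together presumably looks roughly like this; the file path, top-k, and chunk size below are illustrative, not the exact values used above.)

Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.query_engine import CitationQueryEngine

service_context = ServiceContext.from_defaults(embed_model=hf, llm=llm)

documents = SimpleDirectoryReader(input_files=["my_doc.txt"]).load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Retrieval and citation-chunking settings are illustrative values
query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,
    citation_chunk_size=512,
)
response = query_engine.query("your question here")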
The embeddings have nothing to do with the LLM - they are actually completely separate steps.

The embeddings help retrieve relevant nodes. Once we have the nodes, those get sent to the LLM.

Depending on the LLM you are using, you may have to play around with the generation kwargs (temperature, top p, top k, repetition penalty).

Different LLMs will most likely need different query wrapper prompts too.
Plain Text
**********
Trace: query
    |_query ->  9.961778 seconds
      |_retrieve ->  0.058722 seconds
        |_embedding ->  0.017182 seconds
      |_synthesize ->  9.902916 seconds
        |_templating ->  2.8e-05 seconds
        |_llm ->  9.89647 seconds
**********


What is the embedding step here?
embedding the query text
I get temperature; the rest (top p, top k, repetition penalty) I need to look into.

And I need to take that query wrapper prompt more seriously - I've been lazy about it.
Usually the model card on Hugging Face should give an example 🙏
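
Putting those two suggestions together, a rough sketch of what tuning generation kwargs and a query wrapper prompt might look like with HuggingFaceLLM (the prompt format and every value below are illustrative; the right ones depend on the specific model's card):

Plain Text
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate

# The wrapper format should come from the model card; this one is made up for illustration
query_wrapper_prompt = PromptTemplate("### Instruction:\n{query_str}\n\n### Response:\n")

llm = HuggingFaceLLM(
    model_name="Deci/DeciLM-6b",
    tokenizer_name="Deci/DeciLM-6b",
    model_kwargs={"trust_remote_code": True},  # needed for DeciLM's custom architecture
    query_wrapper_prompt=query_wrapper_prompt,
    max_new_tokens=256,
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.7,         # lower = more deterministic
        "top_p": 0.9,               # nucleus sampling cutoff
        "top_k": 50,                # sample only from the top-k tokens
        "repetition_penalty": 1.1,  # discourages repeating the same answer
    },
)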