Hello all

Hello all,
I’m trying to use a tree index with the Hugging Face LLM class, but I’m struggling to make it work since the number of tokens is higher than 2048. Did anyone manage to implement it? I’m trying because I’m working with about 30 documents that are quite big (think 3000+ tokens each), and the vector store index method is not giving the results I’d wish for. Thanks a lot!
How did you set up the LLM/service context? We can maybe tweak some things to avoid token errors.
# imports (legacy llama_index API)
import torch
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext, TreeIndex
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model,
    model_name=model,
    device_map="auto",
    tokenizer_kwargs={"max_length": 1024},
    tokenizer_outputs_to_remove=["token_type_ids"],
    model_kwargs={"do_sample": False, "trust_remote_code": True, "torch_dtype": torch.float16},
)

embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm, embed_model=embed_model)

new_index = TreeIndex.from_documents(documents, service_context=service_context)
query_engine = new_index.as_query_engine()
@Logan M I did it like this. By the way, is a tree index the best way to index fairly large documents that can be used on their own?
You'll probably want to add two more parameters to help control the input sizes (also I think the tokenizer max length should match the context size, but that's up to you)

llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=256,
    tokenizer_kwargs={"max_length": 2048},
    ...
)
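
Merged with your earlier setup, the constructor would look roughly like this (query_wrapper_prompt and model as defined in your script):

llm = HuggingFaceLLM(
    context_window=2048,                    # total tokens the model can attend to
    max_new_tokens=256,                     # reserves room in the window for generation
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model,
    model_name=model,
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},  # match the context window
    tokenizer_outputs_to_remove=["token_type_ids"],
    model_kwargs={"do_sample": False, "trust_remote_code": True, "torch_dtype": torch.float16},
)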
A tree index is OK, but it can be slow. You could also experiment with a vector index.
@Logan M Thanks, it worked, and indeed it is really slow (took about 2 hours to generate a response). As for the vector index, is there a way to easily attach metadata to each chunk? Like if you’re working on a document and it splits in the middle of a sentence, is there a way to tell the model?
During the splitting, I wouldn't worry about broken sentences. There's a pretty generous default overlap of 200 tokens between neighbouring chunks.

(Also, that makes sense that it is slow, especially if you are running on CPU 😅)
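
If you do want to attach metadata yourself, here's a rough sketch with the legacy API — depending on your version, Document may take extra_info instead of metadata, and raw_text is just a placeholder for one file's contents:

from llama_index import Document, ServiceContext

# Metadata set on a Document propagates to every chunk split from it.
documents = [
    Document(
        text=raw_text,  # placeholder: one file's contents
        metadata={"file_name": "bylaws.pdf", "section": "governance"},
    )
]

# chunk_overlap controls how many tokens neighbouring chunks share;
# 200 is the generous default mentioned above.
service_context = ServiceContext.from_defaults(
    chunk_size=512,
    chunk_overlap=200,
    llm=llm,
    embed_model=embed_model,
)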
I was running it on my GPU, a Tesla V100 32 GB, so I was fairly surprised since it was only around 70,000 tokens. For instance, if I want a model that can take documents about company bylaws and answer questions about them, would the vector store index be the most appropriate?
70,000 tokens is a lot, especially when the max input size is 2048. A tree index builds a tree of summaries over the chunks. Assuming the chunk size was 512, that's roughly 70,000 / 512 ≈ 137 chunks that all have to be run through the LLM just to generate the bottom layer of the tree. And it will keep summarizing children until it gets to the root node of the tree.

So in summary, this is a lot for any LLM 👀
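
Back-of-the-envelope, assuming the default fan-out of 10 children per parent and one summarization call per parent (refine steps add more calls when the grouped chunks overflow the 2048-token window):

import math

total_tokens = 70_000
chunk_size = 512
num_children = 10  # assumed default fan-out for TreeIndex

num_nodes = math.ceil(total_tokens / chunk_size)  # ~137 leaf chunks
llm_calls = 0
while num_nodes > 1:
    num_nodes = math.ceil(num_nodes / num_children)  # parents in the next layer
    llm_calls += num_nodes                           # one summary call per parent

print(llm_calls)  # ~17 calls at minimum, each summarizing thousands of tokens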

Yea I would check out the vector store index, which should greatly increase speed
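
Roughly, with the same service context (the question string is just an example):

from llama_index import VectorStoreIndex

# Building only calls the embedding model, not the LLM, so it's fast;
# the LLM is only used at query time on the retrieved chunks.
vector_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
query_engine = vector_index.as_query_engine(similarity_top_k=2)
response = query_engine.query("What do the bylaws say about board elections?")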
Okay great, is there any merit to using a graph with the vector store index?
mmm I'd say it highly depends on the use case. In general, I think graphs are still a sort of "in progress" concept for LLMs and chatbots
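
If you want to experiment anyway, composing indices looks roughly like this in the legacy API — sub_index1/sub_index2 and the summaries are placeholders:

from llama_index import TreeIndex
from llama_index.indices.composability import ComposableGraph

# Compose per-document vector indices under a tree root; the summaries
# are what route each query to the right sub-index.
graph = ComposableGraph.from_indices(
    TreeIndex,
    [sub_index1, sub_index2],  # placeholder per-document indices
    index_summaries=["Summary of doc 1", "Summary of doc 2"],
    service_context=service_context,
)
query_engine = graph.as_query_engine()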