Hello all

Hello all,
I’m trying to use a tree index with the Hugging Face LLM class, but I’m struggling to make it work since the number of tokens is higher than 2048. Did anyone manage to implement it? I’m trying because I’m working with about 30 documents that are quite big (think 3000+ tokens each), and the vector store index method is not giving the results I’d wish for. Thanks a lot!
How did you set up the LLM/service context? We can maybe tweak some things to avoid token errors.
# imports (legacy llama_index API)
import torch
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext, TreeIndex
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model,
    model_name=model,
    device_map="auto",
    tokenizer_kwargs={"max_length": 1024},
    tokenizer_outputs_to_remove=["token_type_ids"],
    model_kwargs={"do_sample": False, "trust_remote_code": True, "torch_dtype": torch.float16},
)

embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)

service_context = ServiceContext.from_defaults(chunk_size=512, llm=llm, embed_model=embed_model)

new_index = TreeIndex.from_documents(documents, service_context=service_context)
query_engine = new_index.as_query_engine()
@Logan M I did it like this. By the way, is a tree index the best way to index fairly large documents that can be used on their own?
You'll probably want to add two more parameters to help control the input sizes (also I think the tokenizer max length should match the context size, but that's up to you)

llm = HuggingFaceLLM(
    context_window=2048,
    max_new_tokens=256,
    tokenizer_kwargs={"max_length": 2048},
    ...
)
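
Merged with your earlier setup, the constructor would look roughly like this (query_wrapper_prompt and model as defined in your script):

llm = HuggingFaceLLM(
    context_window=2048,                    # total tokens the model can attend to
    max_new_tokens=256,                     # reserves room in the window for generation
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model,
    model_name=model,
    device_map="auto",
    tokenizer_kwargs={"max_length": 2048},  # match the context window
    tokenizer_outputs_to_remove=["token_type_ids"],
    model_kwargs={"do_sample": False, "trust_remote_code": True, "torch_dtype": torch.float16},
)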
A tree index is OK, but it can be slow. You could also experiment with a vector index.
@Logan M Thanks, it worked, and indeed it is really slow (took about 2 hours to generate a response). As for the vector index, is there a way to easily attach metadata to each chunk? Like if you’re working on a document and it splits in the middle of a sentence, is there a way to tell the model?
During the splitting, I wouldn't worry about broken sentences. There's a pretty generous default overlap of 200 tokens between neighbouring chunks.

(Also, that makes sense that it is slow, especially if you are running on CPU 😅)
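
If you do want to attach metadata yourself, here's a rough sketch with the legacy API — depending on your version, Document may take extra_info instead of metadata, and raw_text is just a placeholder for one file's contents:

from llama_index import Document, ServiceContext

# Metadata set on a Document propagates to every chunk split from it.
documents = [
    Document(
        text=raw_text,  # placeholder: one file's contents
        metadata={"file_name": "bylaws.pdf", "section": "governance"},
    )
]

# chunk_overlap controls how many tokens neighbouring chunks share;
# 200 is the generous default mentioned above.
service_context = ServiceContext.from_defaults(
    chunk_size=512,
    chunk_overlap=200,
    llm=llm,
    embed_model=embed_model,
)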
I was running it on my GPU, a Tesla V100 32 GB, so I was fairly surprised since it was only around 70,000 tokens. For instance, if I want a model that can take documents about company bylaws and answer questions about them, would the vector store index be the most appropriate?
70,000 tokens is a lot, especially when the max input size is 2048. A tree index builds a tree of summaries over the chunks. Assuming the chunk size was 512, that's roughly 70,000 / 512 ≈ 137 chunks that all have to be run through the LLM just to generate the bottom layer of the tree. And it will keep summarizing children until it gets to the root node of the tree.

So in summary, this is a lot for any LLM 👀
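
Back-of-the-envelope, assuming the default fan-out of 10 children per parent and one summarization call per parent (refine steps add more calls when the grouped chunks overflow the 2048-token window):

import math

total_tokens = 70_000
chunk_size = 512
num_children = 10  # assumed default fan-out for TreeIndex

num_nodes = math.ceil(total_tokens / chunk_size)  # ~137 leaf chunks
llm_calls = 0
while num_nodes > 1:
    num_nodes = math.ceil(num_nodes / num_children)  # parents in the next layer
    llm_calls += num_nodes                           # one summary call per parent

print(llm_calls)  # ~17 calls at minimum, each summarizing thousands of tokens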

Yea I would check out the vector store index, which should greatly increase speed
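
Roughly, with the same service context (the question string is just an example):

from llama_index import VectorStoreIndex

# Building only calls the embedding model, not the LLM, so it's fast;
# the LLM is only used at query time on the retrieved chunks.
vector_index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
query_engine = vector_index.as_query_engine(similarity_top_k=2)
response = query_engine.query("What do the bylaws say about board elections?")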
Okay great, is there any merit to using a graph with the vector store index?
mmm I'd say it highly depends on the use case. In general, I think graphs are still a sort of "in progress" concept for LLMs and chatbots
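
If you want to experiment anyway, composing indices looks roughly like this in the legacy API — sub_index1/sub_index2 and the summaries are placeholders:

from llama_index import TreeIndex
from llama_index.indices.composability import ComposableGraph

# Compose per-document vector indices under a tree root; the summaries
# are what route each query to the right sub-index.
graph = ComposableGraph.from_indices(
    TreeIndex,
    [sub_index1, sub_index2],  # placeholder per-document indices
    index_summaries=["Summary of doc 1", "Summary of doc 2"],
    service_context=service_context,
)
query_engine = graph.as_query_engine()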