Hi guys, has anyone tried to use

Hi guys, has anyone tried to use the TreeIndex class? I used TreeIndex.from_documents to build the index (with transformations too). It parses and summarizes but does not embed. I tried many times with ChatGPT's help but couldn't get it to work. I used SimpleDirectoryReader to read a few documents, and I set up both Settings and a ServiceContext. For the transformations, I set transformations = transformations_from_settings_or_context(Settings, service_context). Thanks!

5 comments

Logan M

I'm pretty sure things get embedded lazily during retrieval (if you are using an embedding-based retrieval method)

xxfz823

I persisted the vector store to disk. The default_vectorstore is empty, but the docstore has content. Also, the progress bar showed parsing done and summarizing done, but no embedding. Plus it's way too fast; embedding takes a while. I used to just use DocumentSummaryIndex but want to use the standard TreeIndex this time. I found the code on GitHub and I thought I passed the right arguments:
tree_index = TreeIndex.from_documents(
    documents,
    service_context=service_context,
    show_progress=True,
    transformations=transformations,
)

Logan M

the embedding doesn't happen until you do retrieval

Logan M

from_documents() does not do embeddings, unless you specifically included embeddings in your transformations
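
To make the difference concrete, here is a rough, library-free sketch of the two behaviours described above: embeddings computed at build time only when an embedding step is in the transformations list, and otherwise filled in lazily at retrieval. This is not LlamaIndex code. Node, toy_embed, split_sentences, embed_nodes, build_index and retrieve are all hypothetical names made up for illustration, and toy_embed is just a stand-in for a real embedding model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an index node; not a LlamaIndex class."""
    text: str
    embedding: Optional[List[float]] = None


def toy_embed(text: str) -> List[float]:
    # Assumption: a fake 2-d "embedding" (character count, word count).
    return [float(len(text)), float(len(text.split()))]


def split_sentences(nodes: List[Node]) -> List[Node]:
    # Parsing step: split each node on periods, dropping empty pieces.
    out = []
    for node in nodes:
        for part in node.text.split("."):
            part = part.strip()
            if part:
                out.append(Node(part))
    return out


def embed_nodes(nodes: List[Node]) -> List[Node]:
    # Embedding step: only runs if it is listed in the transformations.
    for node in nodes:
        node.embedding = toy_embed(node.text)
    return nodes


def build_index(
    texts: List[str],
    transformations: List[Callable[[List[Node]], List[Node]]],
) -> List[Node]:
    # Apply each transformation in order; nothing else is done to the nodes.
    nodes = [Node(t) for t in texts]
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


def retrieve(nodes: List[Node], query: str, top_k: int = 1) -> List[Node]:
    # Embedding-based retrieval: embed any node that has no vector yet.
    query_vec = toy_embed(query)
    for node in nodes:
        if node.embedding is None:
            node.embedding = toy_embed(node.text)  # lazy embedding
    def distance(node: Node) -> float:
        return sum((a - b) ** 2 for a, b in zip(node.embedding, query_vec))
    return sorted(nodes, key=distance)[:top_k]


docs = ["Trees summarize children. Leaves hold raw text."]
lazy_nodes = build_index(docs, [split_sentences])                 # no embeddings yet
eager_nodes = build_index(docs, [split_sentences, embed_nodes])   # embedded up front
```

In the lazy case, every node's embedding stays None until retrieve() is called, which would match an empty vector store right after indexing.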

xxfz823

Thanks Logan. Sorry about the late response. Yes, I see what you are saying. The indexing part of TreeIndex mainly does the summarization. I believe that this one doesn't necessarily need embedding for the retrieval either. I looked at the code on GitHub further. TreeIndex mainly provides a structural template. It has a lot of options, so you can do a lot with it, but it requires (and can handle) customization. If you want a simple "just do it" tool, then it's probably not the best choice. Since I was building a search bot that asks follow-up questions, which required "memory", I ended up just using as_chat_engine (condense + context, OpenAI) instead, which worked out great, even without summarization. The built-in "memory" saves a lot of extra coding and works out really well. Anyway, thanks again for your prompt response.
