Chroma

What are some best practices for persisting indexes? I am using ChromaDB and adding documents/embeddings in a separate process (outside of LlamaIndex). I am interested in building composable indexes that are groups of keywords relating to particular documents. Right now I have lots of separate documents (200k+) and I don't get very accurate results. My plan has been to separate these documents into different categories with metadata attached, so that retrieval can be more accurate. With that structure, how do we store these indexes?
Wouldn't each category be its own Chroma collection/index then?
I guess that would be one good way to structure it. In that case, if I had 50 categories, I'd need to construct that index for every chat session.
Is there a way to store that composed index?
Just storing one Chroma instance at a time is really the only way.

Other vector dbs have this idea of collection or index names, or namespaces, which makes storage a bit more straightforward
I think you'd only have to construct it at server startup though -- chat sessions are just dependent on the chat history.
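For example, here's a rough sketch of constructing one index per category at startup, assuming the pre-0.10 llama_index import paths (matching the gpt-index docs linked further down) and chromadb's PersistentClient; the category names and path are placeholders:

```python
import chromadb
from llama_index import VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

# Persistent Chroma client; the documents/embeddings were already added to
# these collections by the separate ingestion process.
chroma_client = chromadb.PersistentClient(path="./chroma_db")

CATEGORIES = ["finance", "legal", "support"]  # placeholder category names

category_indexes = {}
for category in CATEGORIES:
    collection = chroma_client.get_or_create_collection(category)
    vector_store = ChromaVectorStore(chroma_collection=collection)
    # Wrap the existing collection in an index without re-embedding anything.
    category_indexes[category] = VectorStoreIndex.from_vector_store(vector_store)
```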
I see - thanks for the advice.
Generally, what is the serialization format of the index when using .persist() and load_index_from_storage()?
Specific to Chroma, do the metadata properties help improve the search for llama_index, or are just the embeddings used?
It's just JSON. But with Chroma, I think the persistence is different/automatic, right?
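For reference, a minimal sketch of the default (non-Chroma) persistence round trip, assuming `index` is an existing index and the pre-0.10 import paths; with Chroma, the embeddings stay in the Chroma collection itself:

```python
from llama_index import StorageContext, load_index_from_storage

# Default local persistence: writes the index store, docstore, etc.
# as JSON files under ./storage.
index.storage_context.persist(persist_dir="./storage")

# In a later session, rebuild the same index from those JSON files.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```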

Definitely, they do: both the embeddings and the LLM will leverage the metadata. Or you can configure metadata that is only used for one or the other.

https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/documents_and_nodes/usage_documents.html

https://gpt-index.readthedocs.io/en/stable/examples/metadata_extraction/MetadataExtraction_LLMSurvey.html
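
Following the first link above, a small sketch of attaching metadata to Documents at ingestion and controlling which keys the embedding model or the LLM sees; the key names and values are just placeholders:

```python
from llama_index import Document

doc = Document(
    text="Quarterly revenue grew 12%...",  # placeholder text
    metadata={"category": "finance", "source": "report_2023.pdf"},  # placeholders
)

# Optionally hide specific keys from the embedding model or the LLM,
# so a given key is used for one but not the other.
doc.excluded_embed_metadata_keys = ["source"]
doc.excluded_llm_metadata_keys = ["source"]
```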
Gotcha - considering adding metadata to each of the documents when ingesting to ChromaDB to see if it helps with finding more relevant nodes.
That worked great! Thanks @Logan M