Updated 10 months ago

Hi, I'm trying to use Chroma with llama-index. I'm loading some JSON documents into a documents object. The issue comes when I build the index:
index_finance = VectorStoreIndex.from_documents( documents, storage_context=storage_context, service_context=service_context )
Any idea what I'm missing for the JSON files? If I do the same with Chroma but load the documents with a SimpleDirectoryReader, it works:
documentsNassim = SimpleDirectoryReader("/mnt/nasmixprojects/books/nassimTalebDemo").load_data()
Attachments
image.png
image.png
image.png
I think the indentation is a bit off here
I could solve it with this piece of code
Attachment
image.png
It seems that Chroma doesn't support dictionaries as metadata.
That should not be the case. Can you check the type of the data in documents when you load it with SimpleDirectoryReader?
Is the code in the first screenshot still the same, or did you make some changes?
I have two options to test:

reader = SimpleDirectoryReader(
input_dir="/home/david/weaviate-tests/weaviate-videorack/4-llama-index-contained/sentences", recursive=True, file_extractor={".json": MyJSONReader()}
)

documentsNassim = SimpleDirectoryReader("/mnt/nasmixprojects/books/nassimTalebDemo").load_data()
The first one is the one that needs the string conversion to work with Chroma.
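The actual fix in this thread was only posted as a screenshot, so the following is a minimal sketch of what such a string conversion could look like, not the poster's code. `flatten_metadata` is a hypothetical helper, and the set of value types Chroma accepts (str, int, float, bool) is stated here as an assumption:

```python
import json

# Value types assumed to be accepted as Chroma metadata values.
_SCALAR_TYPES = (str, int, float, bool)


def flatten_metadata(metadata):
    """Return a copy of `metadata` whose values are all scalars.

    Hypothetical helper: dicts, lists and other non-scalar values are
    serialized to JSON strings; None becomes an empty string.
    """
    flat = {}
    for key, value in metadata.items():
        if isinstance(value, _SCALAR_TYPES):
            flat[key] = value
        elif value is None:
            flat[key] = ""
        else:
            # default=str covers values json cannot serialize natively.
            flat[key] = json.dumps(value, ensure_ascii=False, default=str)
    return flat


example = {
    "uuid": "abc-123",
    "length_characters": 42,
    "video_section": {"start": 0, "end": 10},
}
flat_example = flatten_metadata(example)
print(flat_example)
```

Applying a helper like this to the metadata dict before building each Document keeps nested values available as JSON strings while leaving plain strings and numbers untouched.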
Can you give me the output just before you return the object from your JSONReader?
The whole document; trim the text part to keep it readable.
you mean this?
Attachment
image.png
or this?
Attachment
image.png
Yeah, just do print(documents[0]).
Trim the text to just "TEXT" for better readability.
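For this kind of inspection, a small helper can replace the text with a placeholder and show the Python type of every metadata value, which makes a nested dict easy to spot. This is a rough sketch: `summarize_document` is a hypothetical name, and it assumes only that the document object exposes `.text` and `.metadata` attributes, so a stand-in object is used here instead of a real llama-index Document:

```python
from types import SimpleNamespace


def summarize_document(doc, text_placeholder="TEXT"):
    """Return a printable summary: placeholder text plus metadata value types."""
    lines = [f"text: {text_placeholder} ({len(doc.text)} characters)"]
    for key, value in doc.metadata.items():
        lines.append(f"{key}: {type(value).__name__} = {value!r}")
    return "\n".join(lines)


# Stand-in for a loaded document (assumption: real documents expose
# the same .text and .metadata attributes).
fake_doc = SimpleNamespace(
    text="some long transcript ...",
    metadata={"uuid": "abc-123", "video_section": {"start": 0}},
)
summary = summarize_document(fake_doc)
print(summary)
```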
Can you give me the MyJSONReader code so that I can make the changes? Then you can try it.
import json

# Import paths for the pre-0.10 llama-index releases used in this thread;
# newer releases expose the same names under llama_index.core.
from llama_index import Document, SimpleDirectoryReader
from llama_index.readers.base import BaseReader

# Metadata keys hidden from both the embedding model and the LLM.
EXCLUDED_KEYS = ['uuid', 'video_name', 'file_path', 'original_text', 'length_characters', 'original_lang', 'video_section']


class MyJSONReader(BaseReader):
    def load_data(self, file, extra_info=None):
        # Copy so the caller's dict is not mutated; also tolerates extra_info=None.
        extra_info = dict(extra_info or {})
        with open(file, "r") as f:
            json_data = json.load(f)
        for uuid, row in json_data.items():  # in our case, only one json element per file
            text = row["english_text"]

            extra_info["uuid"] = uuid
            extra_info["video_name"] = row["video_name"]
            extra_info["video_path"] = row["video_path"]
            extra_info["original_text"] = row["original_text"]
            extra_info["length_characters"] = row["length_characters"]
            extra_info["original_lang"] = row["original_lang"]
            extra_info["video_section"] = row["video_section"]

            print("extra_info2->" + str(extra_info))

        return [Document(text=text, extra_info=extra_info, excluded_embed_metadata_keys=EXCLUDED_KEYS, excluded_llm_metadata_keys=EXCLUDED_KEYS)]

      
reader = SimpleDirectoryReader(
    input_dir="/home/david/weaviate-tests/weaviate-videorack/4-llama-index-contained/sentences", recursive=True, file_extractor={".json": MyJSONReader()}
)

documents = reader.load_data()
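The reader above returns a single Document after the loop, which matches the "one JSON element per file" case. If a file ever held several entries, only the last row would be returned. A library-free sketch of the per-row alternative follows; `rows_to_records` and the sample data are hypothetical, and each (text, metadata) pair would still need to be wrapped in a Document:

```python
import json


def rows_to_records(json_text, base_info=None):
    """Turn {uuid: row} JSON into a list of (text, metadata) pairs.

    Hypothetical helper: each row gets its own copy of the metadata dict.
    """
    records = []
    for uuid, row in json.loads(json_text).items():
        metadata = dict(base_info or {})  # copy, so rows never share one dict
        metadata["uuid"] = uuid
        # Copy every field except the text itself into the metadata.
        metadata.update({k: v for k, v in row.items() if k != "english_text"})
        records.append((row["english_text"], metadata))
    return records


# Made-up sample data with the same shape as the files in this thread.
sample = json.dumps({
    "id-1": {"english_text": "hello", "video_name": "a.mp4"},
    "id-2": {"english_text": "world", "video_name": "b.mp4"},
})
records = rows_to_records(sample, base_info={"file_path": "sample.json"})
print(records)
```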
Try adding this:
        metadata = lambda extra_info:{"key":"value"}
        return [Document(text=text,  metadata=metadata , excluded_embed_metadata_keys=['uuid','video_name','file_path','original_text','length_characters','original_lang','video_section'] ,  excluded_llm_metadata_keys=['uuid','video_name','file_path','original_text','length_characters','original_lang','video_section']), ]


I used to do this for SimpleDirectoryReader, for the same reason: the metadata needs to be in str. See if this works.

The key/value dict will be the dict that you prepare.
I get this:
Attachment
image.png
Locally it is working fine, right, if you do not pass it to Chroma?
You may be right that Chroma may not be able to read it as a dict.
You could continue with your own way, and if I find anything on it I will post it here.
Yes, I have other code for the JSON files without Chroma and it had no issues.
Sure, I'll just keep going with the fix for now.
Thanks @WhiteFang_Jr !