
Updated 5 months ago

I'm in the very early stages of learning

I'm in the very early stages of learning LlamaIndex. For the first time, I'm now trying to use ChromaDB instead of just writing my index directly to disk.

But I get an error from ChromaDB about erroneous metadata: "ValueError: Expected metadata value to be a str, int, float or bool, got None which is a <class 'NoneType'>"

If I just write the same index to disk, it works. What is it that ChromaDB needs?

The documents I want to store embeddings for are just a couple of plain-text documents with some metadata in YAML format ("key: value").
17 comments
Seems like something in your metadata is None -- but it should be str, int, float, or bool for chroma to work
Any idea on what metadata ChromaDB is referring to? Even when I strip the YAML from my text files ChromaDB throws the same error. Running the same code on the Paul Graham essay used in many of the examples in the documentation works.
I have no idea which metadata 😅 It depends on what data you loaded I suppose, how you loaded it, etc.

I would just cast my metadata 🤷‍♂️ Kind of silly that chroma doesn't do that for you.
Plain Text
for doc in documents:
  # .items() yields (key, value) pairs; iterating the dict directly yields only keys
  for key, val in list(doc.metadata.items()):
    if val is None:
      doc.metadata[key] = "None"
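A slightly more general variant of the cast above, as a minimal sketch: it works on plain dicts, drops None values instead of storing the string "None", and stringifies anything that is not a str, int, float or bool (the types named in the Chroma error). The helper name `sanitize_metadata` and the sample dict are made up for illustration.

```python
from typing import Any, Dict

# Value types that the ChromaDB error message says are accepted.
ALLOWED_TYPES = (str, int, float, bool)


def sanitize_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the metadata without None values (hypothetical helper)."""
    clean: Dict[str, Any] = {}
    for key, val in metadata.items():
        if val is None:
            # Skip the key entirely rather than storing a None value.
            continue
        # Keep accepted types as they are, fall back to str() for the rest.
        clean[key] = val if isinstance(val, ALLOWED_TYPES) else str(val)
    return clean


# Made-up metadata resembling what a file loader might produce.
example = {"file_name": "notes.md", "file_type": None, "tags": ["a", "b"], "size": 120}
cleaned = sanitize_metadata(example)
```

Applied to the loop above, that would be something like `doc.metadata = sanitize_metadata(doc.metadata)` for each document, assuming the metadata attribute is a plain dict that can be reassigned.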
Strange thing is that I don't do any manual work with metadata at all. So I've no idea either. 🤣

And to make it even stranger, at least to me, I just worked around the error by changing the file suffix from .md to .txt. That was the only difference between the working test essay and my own data. And now it suddenly passed without any errors.
that suffix change will make the file load with a different file-loader
likely the md file loader is parsing your file slightly incorrectly? 🤔 It does some special handling for headers and whatnot
In the load_data() method for SimpleDirectoryLoader?
SimpleDirectoryReader is a wrapper around several different readers for different file types.

Depending on the file suffix, a different reader is used for each file
Thanks! Turns out that the MarkdownReader class does in fact return a metadata key with the value set to None.
Attachment
CleanShot_2024-01-02_at_17.09.422x.png
lol that's weird
should probably update that
No, not the MarkdownReader class.
But probably the default_file_metadata_func
I wonder why file_type would be none. In any case, it should probably be avoiding inserting None values
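One way to avoid inserting None values, as a rough sketch: a metadata function that only adds `file_type` when a type could actually be guessed. The function name `safe_file_metadata` and the example paths are hypothetical, and this is not the library's actual default function. As far as I know, SimpleDirectoryReader accepts a `file_metadata` callable, so something like this could probably be passed in there, but treat that as an assumption and check the docs for your version.

```python
import mimetypes
from typing import Dict


def safe_file_metadata(file_path: str) -> Dict[str, str]:
    """Build per-file metadata without ever storing a None value (hypothetical)."""
    meta: Dict[str, str] = {"file_path": file_path}
    guessed_type, _encoding = mimetypes.guess_type(file_path)
    if guessed_type is not None:
        # Only record the type when mimetypes could actually determine one.
        meta["file_type"] = guessed_type
    return meta


# An extension that mimetypes does not know: the key is left out entirely.
unknown_meta = safe_file_metadata("data/essay.zzzunknown")
known_meta = safe_file_metadata("data/essay.txt")
```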
Seems strange to me as well. This is how the file_type key is set in base.py:

`"file_type": mimetypes.guess_type(file_path)[0]`

Getting the file type from the file_path should be pretty straightforward?
I guess it's meant to represent a mimetype rather than an actual file_type ?
I don't fully get how mimetypes.py works, it seems like it might fetch a list of filetypes from a couple of different locations. But there is also a hardcoded list with file suffixes where .md is missing. Adding md to that list solved the problem.
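For anyone who would rather not edit that list by hand: the standard library has `mimetypes.add_type`, which registers a mapping at runtime. A minimal sketch, assuming `text/markdown` is the type you want for `.md`, and noting that whether the lookup returns None beforehand depends on the Python version and the system's mime type files:

```python
import mimetypes

# May be None on systems where .md is not registered; not guaranteed either way.
before = mimetypes.guess_type("notes.md")[0]

# Register the mapping for this process only; no files on disk are changed.
mimetypes.add_type("text/markdown", ".md")

after = mimetypes.guess_type("notes.md")[0]
print(before, "->", after)
```

This would have to run before the documents are loaded, since the file type is guessed at load time.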