Updated 3 months ago

afternoon all!

why does Document metadata have to be json serializable? thought i'd be able to handle anything during extraction steps
If it's not json serializable, we can't reliably store it anywhere
I mean, yea the nodes could be in-mem. Technically you can assign arbitrary objects to the metadata

node = TextNode(text="test", metadata={"other": TextNode(text="test2")})

But as soon as you need that node to interact with a storage layer, it doesn't really work
For example with the above node, this seemed to work
index = VectorStoreIndex(nodes=[node])
but trying to persist that anywhere will probably break
actually I think that worked because TextNode is serializable lol
I should have picked a different example
ha ya i bumped into it using tree-sitter, i wanted to attach ast of the file to the Documents when building them and got Tree unserializable. can print out the ast but was curious about the error
so is a db being used for document storage by default?
more like an in-memory JSON dict
For example, the base vector store supports metadata filtering. That calls node_to_metadata_dict, which basically requires the node and its metadata to be serializable
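To make the constraint concrete, here is a minimal sketch using only Python's standard json module. It is not LlamaIndex's own code; FakeTree and is_json_serializable are hypothetical names, with FakeTree standing in for something like a tree-sitter Tree.

```python
import json


class FakeTree:
    """Hypothetical stand-in for an arbitrary object such as a parsed AST."""

    def __init__(self, root_type):
        self.root_type = root_type


def is_json_serializable(metadata):
    """Return True if json.dumps can encode the metadata dict."""
    try:
        json.dumps(metadata)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported object type; ValueError: circular reference
        return False


# Plain strings and numbers encode fine.
ok_metadata = {"file_name": "main.py", "line_count": 120}

# An arbitrary object makes json.dumps raise TypeError.
bad_metadata = {"file_name": "main.py", "ast": FakeTree("module")}

print(is_json_serializable(ok_metadata))
print(is_json_serializable(bad_metadata))
```

Any storage layer that writes metadata out as JSON will hit the same TypeError on the second dict, even though the object works fine while everything stays in memory.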
You could definitely write your own storage classes to avoid this
Probably would just have to subclass the existing storage, override a few methods, not too bad if you are reasonably good at python 👍
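For the tree-sitter case mentioned earlier, a lighter option than custom storage might be to flatten the AST into plain dicts and lists before attaching it as metadata. The sketch below is an assumption, not something suggested in the thread: FakeNode is a hypothetical stand-in rather than the tree-sitter API, and the attribute names (type, start_point, end_point, children) are chosen for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeNode:
    """Hypothetical stand-in for a parser node; not the tree-sitter API."""

    type: str
    start_point: Tuple[int, int]
    end_point: Tuple[int, int]
    children: List["FakeNode"] = field(default_factory=list)


def node_to_dict(node):
    """Recursively convert a node into JSON-friendly dicts and lists."""
    return {
        "type": node.type,
        "start": list(node.start_point),
        "end": list(node.end_point),
        "children": [node_to_dict(child) for child in node.children],
    }


# A tiny hand-built tree: a module containing one function definition.
root = FakeNode(
    "module",
    (0, 0),
    (2, 0),
    [FakeNode("function_definition", (0, 0), (1, 8))],
)

# The flattened AST is made of plain dicts, lists, strings and ints,
# so the metadata dict can be encoded with json.dumps.
metadata = {"file_name": "main.py", "ast": node_to_dict(root)}
encoded = json.dumps(metadata)
```

The trade-off is that the metadata no longer holds a live tree object, and a full AST can get large, so it may be worth keeping only the fields you actually need.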