Updated 4 months ago

afternoon all!

why does Document metadata have to be json serializable? thought i'd be able to handle anything during extraction steps
20 comments
If it's not json serializable, we can't reliably store it anywhere
I mean, yea the nodes could be in-mem. Technically you can assign arbitrary objects to the metadata

node = TextNode(text="test", metadata={"other": TextNode(text="test2")})

But as soon as you need that node to interact with a storage layer, it doesn't really work
For example with the above node, this seemed to work
index = VectorStoreIndex(nodes=[node])
but trying to persist that anywhere will probably break
actually I think that worked because TextNode is serializable lol
I should have picked a different example
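A different example that does fail, sketched with only the standard library: the `Tree` class below is a hypothetical stand-in for any object the `json` module cannot encode, and `is_json_serializable` is an illustrative helper, not part of any library.

```python
import json


class Tree:
    """Hypothetical stand-in for an arbitrary object, e.g. a parser tree."""


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False otherwise."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular references.
        return False


good_metadata = {"file_name": "example.py", "line_count": 42}
bad_metadata = {"file_name": "example.py", "ast": Tree()}

print(is_json_serializable(good_metadata))
print(is_json_serializable(bad_metadata))
```

Running a check like this on metadata before building nodes surfaces the problem early instead of at persist time.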
ha ya i bumped into it using tree-sitter, i wanted to attach ast of the file to the Documents when building them and got Tree unserializable. can print out the ast but was curious about the error
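One way around the unserializable tree is to flatten it into plain dicts and lists before attaching it. This is a minimal sketch, assuming each node exposes `type`, `start_point`, `end_point` and `children` the way tree-sitter's Python nodes do; `FakeNode` and `ast_to_dict` are hypothetical names used so the example runs without tree-sitter installed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in exposing only the attributes used below."""
    type: str
    start_point: tuple
    end_point: tuple
    children: list = field(default_factory=list)


def ast_to_dict(node):
    """Recursively convert a node into JSON-friendly dicts and lists."""
    return {
        "type": node.type,
        "start": list(node.start_point),  # (row, column) -> [row, column]
        "end": list(node.end_point),
        "children": [ast_to_dict(child) for child in node.children],
    }


root = FakeNode(
    "module",
    (0, 0),
    (1, 0),
    [FakeNode("expression_statement", (0, 0), (0, 5))],
)

# With a real parse this would be the tree's root node (assumption).
metadata = {"ast": ast_to_dict(root)}
encoded = json.dumps(metadata)
```

The recursion can hit Python's recursion limit on very deep trees, and a full AST can make the metadata large, so storing only the parts you need may be the better trade-off.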
so is a db being used for document storage by default?
more like an in-memory JSON dict
For example, the base vector store supports metadata filtering. That calls node_to_metadata_dict, which basically requires the node and its metadata to be serializable
You could definitely write your own storage classes to avoid this
i will look into that
Probably would just have to subclass the existing storage, override a few methods, not too bad if you are reasonably good at python 👍
appreciate the pointer
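A lighter alternative to writing custom storage classes, and a different technique from the subclassing suggested above, is to keep the live object in an in-memory side table and put only a string key in the metadata. This is a sketch under stated assumptions: `SidecarRegistry`, `Tree` and the `ast_key` field are all hypothetical names, and the objects do not survive a process restart.

```python
import json
import uuid


class SidecarRegistry:
    """In-memory table mapping string keys to arbitrary Python objects."""

    def __init__(self):
        self._objects = {}

    def put(self, obj):
        """Store obj and return a JSON-friendly key for it."""
        key = uuid.uuid4().hex
        self._objects[key] = obj
        return key

    def get(self, key):
        """Look up a previously stored object by key."""
        return self._objects[key]


class Tree:
    """Hypothetical stand-in for an unserializable parser tree."""


registry = SidecarRegistry()
tree = Tree()

# Only the key goes into metadata, so the metadata stays serializable.
metadata = {"file_name": "example.py", "ast_key": registry.put(tree)}
encoded = json.dumps(metadata)

# Later, e.g. after retrieval: recover the live object from the key.
restored = json.loads(encoded)
same_tree = registry.get(restored["ast_key"])
```

Since the registry lives only in memory, this fits extraction steps that run in the same process as the index build; anything persisted to disk would need the tree re-parsed or converted to a serializable form.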