question for this esteemed community

question for this esteemed community: when we split the data from a csv into chunks, we embed the chunks and do a vector search, but we can't see the full original text from before the split (unless we link it to a database that stores the full original texts).
what if we embed the metadata too? presumably, the metadata for chunks that come from the same original text should be nearly identical, so the similarity score between those metadata vectors will be almost 1. if you need to find all chunks from the same original text, so you can show the full text and not just the excerpts, you just run another vector search using the top chunk's metadata vector as the query. what do you think?

7 comments

Logan M

I'm not sure if I completely follow.

Logan M

The metadata attached to your nodes/documents is already used when calculating embeddings 👀
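As a rough illustration of what "metadata is used when calculating embeddings" can look like, here is a minimal sketch that assumes the framework prepends metadata to the chunk text before handing it to the embedding model. The function name `text_to_embed`, the `key: value` layout, and the `exclude_keys` parameter are all hypothetical, not any library's actual API.

```python
def text_to_embed(chunk_text: str, metadata: dict, exclude_keys: frozenset = frozenset()) -> str:
    """Build the string passed to the embedding model: metadata lines, then the chunk.

    Hypothetical helper: the exact format a given framework uses may differ.
    """
    header = "\n".join(
        f"{key}: {value}" for key, value in metadata.items() if key not in exclude_keys
    )
    return f"{header}\n\n{chunk_text}" if header else chunk_text


# Hypothetical metadata for one chunk
meta = {"title": "example title", "doc_id": "row-17"}

with_all = text_to_embed("some chunk text", meta)
without_id = text_to_embed("some chunk text", meta, exclude_keys=frozenset({"doc_id"}))
```

Under this assumption, an opaque value such as a document ID would also shift the embedding unless it is excluded, which is why a sketch like this keeps an exclusion list.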

MitchMcD

i thought only the chunk gets embedded, no?

Logan M

nope

https://discord.com/channels/1059199217496772688/1174475184158290044/1174475686266818580

MitchMcD

hmm. my bad then, i did not know. that's not how i upload and use metadata on Supabase now.
what's your take on how to pull the full text that the top chunk came from? via a relational database?

Logan M

I would add a value to the metadata (a document ID, for example) so that you can locate the original full document
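One way to picture this suggestion is the sketch below, which tags every chunk with the ID of the document it was cut from and then uses an exact match on that ID, rather than a second vector search, to rebuild the full text. It assumes fixed-size, non-overlapping chunks; the `Chunk` class, the `doc_id` and `chunk_index` keys, and the sample rows are hypothetical stand-ins, not part of any particular library or of the poster's Supabase setup.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def split_with_doc_id(doc_id: str, text: str, chunk_size: int = 40) -> list[Chunk]:
    """Split text into fixed-size, non-overlapping chunks tagged with their source."""
    return [
        Chunk(
            text=text[start:start + chunk_size],
            metadata={"doc_id": doc_id, "chunk_index": index},
        )
        for index, start in enumerate(range(0, len(text), chunk_size))
    ]


def full_text_for(hit: Chunk, all_chunks: list[Chunk]) -> str:
    """Rebuild the source document of a retrieved chunk via an exact doc_id match."""
    siblings = [c for c in all_chunks if c.metadata["doc_id"] == hit.metadata["doc_id"]]
    siblings.sort(key=lambda c: c.metadata["chunk_index"])
    return "".join(c.text for c in siblings)


# Hypothetical rows standing in for the CSV
rows = {
    "row-1": "Vector search returns excerpts, so the original text has to be stored or rebuilt separately.",
    "row-2": "Each chunk carries the id of the row it was cut from.",
}
store = [chunk for doc_id, text in rows.items() for chunk in split_with_doc_id(doc_id, text)]

top_hit = store[1]  # pretend this chunk came back from the vector search
original = full_text_for(top_hit, store)
```

If the chunks overlap, joining them would duplicate text, so in that case the same `doc_id` could instead be used as a key into a table that stores the full original documents.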

MitchMcD

got it, thanks!
