
Updated 2 years ago

Vectors

At a glance
The community member who posted the original post is interested in understanding the data structure and implementation details of a system that stores document embeddings. They ask whether each node has both the embedding vector and a string excerpt, or if these are stored separately. They also inquire about how the vector nodes are ordered in a binary search tree, and if there is a specific ordering scheme based on cosine relatedness. In the comments, another community member explains that the vector and document data are currently stored separately, linked by unique IDs. They also mention that the cosine similarity calculation is straightforward, just comparing one vector to a list of vectors, and that the math is fast. The community member also notes that other vector stores like Pinecone and Qdrant may have different approaches for finding matches faster.
Thanks for your reply! Does each node have a data field for both the embedding vector and a string excerpt from the unstructured source, or are these two separate data structures that are linked? Also, if the document embedding data structure is some variety of binary search tree, how is each of the vector nodes ordered on that tree? Is there some type of value ordering scheme for cosine relatedness between embedding vectors? I'm interested in learning more about this subject, so I'm open to looking into the source code or the docs to get a better understanding of this.
1 comment
Currently they are stored separately (a vector store and a doc store) and linked by unique IDs.
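
As a rough sketch of what "stored separately and linked by unique IDs" could look like: the dictionary names, the IDs, and the `get_text` helper below are made up for illustration and are not the library's actual classes or API.

```python
# Hypothetical in-memory stores (illustrative names, not a real library API).
# The vector store maps a node ID to its embedding vector.
vector_store = {
    "node-1": [0.1, 0.9, 0.0],
    "node-2": [0.8, 0.1, 0.1],
}

# The doc store maps the same node ID to the text excerpt.
doc_store = {
    "node-1": "Excerpt about cats.",
    "node-2": "Excerpt about databases.",
}


def get_text(node_id: str) -> str:
    """Resolve a vector-store hit back to its text via the shared ID."""
    return doc_store[node_id]


# A similarity search only needs vector_store; the text is looked up afterwards.
print(get_text("node-1"))
```

The point of the split is that the search step only ever touches the vectors, and the excerpt is fetched by ID once a match is found.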

The cosine similarity itself is nothing too fancy: it just compares one vector (the query) against a list of vectors. The math for this is pretty fast, though.
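
To make the "one vector against a list of vectors" idea concrete, here is a minimal brute-force sketch in plain Python. The function names (`cosine_similarity`, `top_k`) and the toy vectors are assumptions for illustration, not the actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Score the query against every stored vector and keep the k best IDs."""
    scored = [(node_id, cosine_similarity(query, vec))
              for node_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: IDs map to 2-dimensional "embeddings".
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.1], vectors, k=2)
print(results)
```

This is a linear scan: every query is compared with every stored vector, and no tree or ordering of the vectors is involved.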

Other vector stores like Pinecone, Qdrant, etc. might have other approaches for finding matches faster.