Similarity

I have these embeddings that I extracted from my index, and I want to query them against another llama index for cosine similarity (without using a query str/text str) - how do you recommend taking raw embeddings and getting the most similar nodes from an index?

e.g. the QueryBundle type only takes a mandatory query_str

feels like I'm missing a simple way to do this
Attachment: Screenshot_2023-08-02_at_1.31.11_AM.png
23 comments
You'll need to do a pairwise comparison.

Take each candidate vector, compare to every vector in an index, and select the top k most similar

You could use this function to help with the top k part

https://github.com/jerryjliu/llama_index/blob/ac5141e548f4ff8ff1347d495e3e020a5cb3e3bd/llama_index/indices/query/embedding_utils.py#L11
It sounds slow, but it's surprisingly fast -- even with 10,000 vectors, the search should finish in a second or so
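For concreteness, a minimal sketch of that pairwise approach using the linked helper (the toy vectors and ids below are placeholders; in practice they'd be pulled out of your index):

from llama_index.indices.query.embedding_utils import get_top_k_embeddings

# toy data standing in for vectors extracted from an index
index_embeddings = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]]
index_ids = ["node-a", "node-b", "node-c"]
candidate_vector = [0.7, 0.3]  # the raw query embedding

# the default similarity_fn is cosine similarity
similarities, top_ids = get_top_k_embeddings(
    query_embedding=candidate_vector,
    embeddings=index_embeddings,
    embedding_ids=index_ids,
    similarity_top_k=2,
)
print(top_ids, similarities)  # ids and cosine scores of the 2 closest vectors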
Thanks Logan!

query_task = index_to_query.as_retriever(similarity_top_k=2).aretrieve(QueryBundle(query_str=document.text))

any way to get the query embedding [floats] from this kind of aretrieve?

context in photo - want to create an OpenInference record from aretrieve calls (i.e. without using .query)
Attachment: image.png
hmm, the only way to get the embeddings will be using a callback handler.

Actually, someone just recently contributed an openinference handler
One sec, I can link the notebook
they set it here
Attachment: image.png
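(For reference, a rough sketch of how that handler gets wired in, based on the notebook; the import paths and the "data" directory are assumptions that may differ by version:)

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, OpenInferenceCallbackHandler

callback_handler = OpenInferenceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([callback_handler])
)

# "data" is a placeholder directory of documents
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)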
thing is, open_inf_callback only works with .query (not .retrieve)
so I'm forming my own OpenInference record
ah right, because the callback isn't inside the retrieve method
yeah. so I'm just trying to grab that query_embedding from the execution path of .retrieve
quick hack: use a query engine, but set response_mode="no_text"

i.e. index.as_query_engine(response_mode="no_text", similarity_top_k=2)

This will skip calling the LLM, and only do the retrieve step. It will also hit the callback handler
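Roughly, reusing the index and callback handler from the sketch above (the query text is just a placeholder):

# retrieval only: "no_text" skips the LLM synthesis step entirely
query_engine = index.as_query_engine(response_mode="no_text", similarity_top_k=2)
response = query_engine.query("what does the doc say about similarity?")

# no LLM call was made, but response.source_nodes still holds the retrieved nodes
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.get_content()[:80])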
Interesting! I'll try this - I'm not sure it'll work with OpenInferenceCallbackHandler out of the box with that though, eh?
It should I think? But let me know how it goes haha
Attachment: image.png
This added to the OpenInference callback handler buffers
which is neat because a) I can use the buffers to create my OpenInference record and b) it didn't do an LLM call
Attachment: image.png
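(A hedged sketch of reading those buffers back out; the flush_* method names follow the OpenInference notebook and should be checked against your installed version:)

# after running queries through the "no_text" engine above
query_data = callback_handler.flush_query_data_buffer()  # one entry per query (text, embedding, response)
node_data = callback_handler.flush_node_data_buffer()    # one entry per retrieved node (id, score)

# together these carry enough to assemble an OpenInference record by hand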
@Logan M one more thing

here's my index docstore docs:

there doesn't seem to be a way to query/retrieve against a subset of a VectorStoreIndex / SimpleDocumentStore based on metadata keys

do you have a hack for this?

I've done my own filtering funcs, e.g. as pictured - but it's not efficient
Attachments: image.png, image.png
e.g. this doesn't seem optimal
Attachment: image.png
Yeah, we haven't implemented metadata filtering for the base vector store sadly 😅
So custom approach is probably best for now, until we figure it out
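One possible shape for that custom approach, sketched out (retrieve_with_metadata, filter_key, and filter_value are made-up names, and the over-fetch factor is arbitrary):

from llama_index import QueryBundle

def retrieve_with_metadata(index, query_text, filter_key, filter_value, top_k=2):
    # over-fetch, then drop nodes whose metadata doesn't match the key/value pair
    retriever = index.as_retriever(similarity_top_k=top_k * 5)
    results = retriever.retrieve(QueryBundle(query_str=query_text))
    filtered = [r for r in results if r.node.metadata.get(filter_key) == filter_value]
    return filtered[:top_k]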
@Logan M Thank you. Also gotta ask: best way to send in several query texts at once?

I'm doing asyncio.gather(query_engine.aquery(text)) here on 4 docs

but as you can see in the terminal, it does 4 separate OpenAI embeddings calls
Attachment: image.png
Hmm I think the asyncio approach is still best 👀

Thankfully, at least in newer versions of llama-index, the embeddings calls are also async, so it should be fairly efficient 🙏
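For completeness, the gather pattern being described looks roughly like this (query_engine and the list of texts are placeholders):

import asyncio

async def batch_query(query_engine, texts):
    # fire all queries concurrently; each aquery still makes its own embeddings call,
    # which is why 4 separate OpenAI requests show up in the terminal
    return await asyncio.gather(*(query_engine.aquery(t) for t in texts))

texts = ["query one", "query two", "query three", "query four"]
responses = asyncio.run(batch_query(query_engine, texts))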