It sounds slow, but it's surprisingly fast -- even with 10,000 vectors, the search should finish in a second or so
Thanks Logan!
query_task = index_to_query.as_retriever(similarity_top_k=2).aretrieve(QueryBundle(query_str=document.text))
any way to get the query embedding [floats] from this kind of aretrieve?
context in the photo - I want to create an OpenInference record from aretrieve calls (i.e. without using .query)
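For reference, a rough sketch of that aretrieve pattern awaited over several documents (index_to_query and documents are the names from the surrounding code; the QueryBundle import path may vary by llama-index version):

```python
import asyncio
from llama_index import QueryBundle

async def retrieve_all(index_to_query, documents):
    # one retriever, one aretrieve coroutine per document, run concurrently
    retriever = index_to_query.as_retriever(similarity_top_k=2)
    tasks = [retriever.aretrieve(QueryBundle(query_str=doc.text)) for doc in documents]
    return await asyncio.gather(*tasks)

# results[i] is the list of NodeWithScore objects retrieved for documents[i]
results = asyncio.run(retrieve_all(index_to_query, documents))
```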
hmm, the only way to get the embeddings will be using a callback handler.
Actually, someone just recently contributed an OpenInference handler
One sec, I can link the notebook
thing is, open_inf_callback only works with .query (not .retrieve)
so I'm forming my own OpenInference record
ah right, because the callback isn't inside the retrieve method
yeah. so I'm just trying to grab that query_embedding from the execution path of .retrieve
quick hack: use a query engine, but set response_mode="no_text"
i.e. index.as_query_engine(response_mode="no_text", similarity_top_k=2)
This will skip calling the LLM, and only do the retrieve step. It will also hit the callback handler
Interesting! I'll try this - I'm not sure it'll work with OpenInferenceCallbackHandler out of the box with that though, eh?
It should I think? But let me know how it goes haha
This added records to the OpenInference callback handler buffers
which is neat because a) I can use the buffers to create my OpenInference record, and b) it didn't make an LLM call
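Putting the hack together, the flow looks roughly like this (a sketch: it assumes the handler's flush_query_data_buffer()/flush_node_data_buffer() methods from the OpenInference notebook, and "./data" is a placeholder directory):

```python
from llama_index import SimpleDirectoryReader, ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, OpenInferenceCallbackHandler

# register the OpenInference handler so query runs land in its buffers
handler = OpenInferenceCallbackHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([handler])
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# no_text skips LLM synthesis: only the embedding + retrieve steps run,
# and they still pass through the callback manager
query_engine = index.as_query_engine(response_mode="no_text", similarity_top_k=2)
response = query_engine.query("example query text")

# the buffers now hold query records (query text + embedding) and retrieved-node records
query_records = handler.flush_query_data_buffer()
node_records = handler.flush_node_data_buffer()
```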
@Logan M one more thing
here's my index docstore docs:
there doesn't seem to be a way to query/retrieve against a subset of a VectorStoreIndex / SimpleDocumentStore based on metadata keys
do you have a hack for this?
I've done my own filtering funcs e.g. pictured - but it's not efficient
e.g. this doesn't seem optimal
Yea we haven't implemented metadata filtering for the base vector store sadly
So custom approach is probably best for now, until we figure it out
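One shape that custom approach can take (just a sketch: over-fetch from the whole index, then post-filter on a metadata key; the key/value names are illustrative, and node.metadata assumes a recent llama-index version):

```python
def retrieve_with_metadata(index, query_text, key, value, top_k=2, fetch_k=20):
    # over-fetch from the full index, then keep only nodes whose metadata matches
    retriever = index.as_retriever(similarity_top_k=fetch_k)
    nodes = retriever.retrieve(query_text)
    matching = [n for n in nodes if n.node.metadata.get(key) == value]
    return matching[:top_k]

# hypothetical usage: only keep nodes tagged with {"source": "faq"}
results = retrieve_with_metadata(index, "how do refunds work?", "source", "faq")
```

It still scores every vector (so it's no more efficient than filtering up front), but it keeps the retriever API untouched.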
@Logan M Thank you. Also gotta ask: best way to send in several query texts at once?
I'm doing asyncio.gather(query_engine.aquery(text))
here on 4 docs
but as you can see in the terminal, it does 4 separate OpenAI embeddings calls
Hmm I think the asyncio approach is still best
Thankfully, at least in newer versions of llama-index, the embedding calls are also async, so it should be fairly efficient
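For reference, the batching pattern being described looks roughly like this (query_engine and the query texts are placeholders):

```python
import asyncio

async def run_queries(query_engine, texts):
    # one aquery coroutine per text; gather runs them concurrently,
    # though each one still makes its own embeddings call
    tasks = [query_engine.aquery(t) for t in texts]
    return await asyncio.gather(*tasks)

responses = asyncio.run(
    run_queries(query_engine, ["query 1", "query 2", "query 3", "query 4"])
)
```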