
Updated 11 months ago


Hi there, I was curious whether it is possible to quickly check if a specific document is indexed. We currently do it as follows, but it is very slow (about 4 seconds):

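```python
# Import path for llama_index >= 0.10 (older releases: llama_index.vector_stores.types)
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filename_without_ext = "bla"
index = initialize_index(model)  # our own helper that loads the index
# Only consider nodes whose "doc_id" metadata equals the file name
filters = MetadataFilters(filters=[ExactMatchFilter(key="doc_id", value=filename_without_ext)])
# The query text is arbitrary; we only check whether anything comes back
nodes = index.as_retriever(filters=filters, similarity_top_k=1).retrieve("some text")
document_is_not_indexed = len(nodes) == 0
```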
If you have the docstore available (we save it to blob storage, for example), you can call index.docstore.document_exists(doc_id).
That will probably be the fastest option, since it's just a key-value lookup.

Some vector DBs also have a get() API, but you'd have to use the vector DB client directly.
Ah, cool. We use Postgres with the pgvector extension installed, so would it make sense to query that DB directly to check?
I think so; that sounds easy enough to do 🫑
It should be much faster without the query embedding and vector search that the snippet above was doing.
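A minimal sketch of that direct query, assuming the table was created by LlamaIndex's PGVectorStore (named data_<table_name>, with node metadata in a JSON or JSONB column called metadata_) and that each row's metadata carries the doc_id key the original filter matched on. These schema details are assumptions, so check them against your own database; exists_query and is_indexed are hypothetical helper names.

```python
from typing import Any


def exists_query(table: str) -> str:
    # `table` is interpolated as an identifier, so only pass trusted config values.
    # Assumed layout: PGVectorStore table with a JSON(B) metadata_ column.
    return f"SELECT EXISTS (SELECT 1 FROM {table} WHERE metadata_->>'doc_id' = %s)"


def is_indexed(conn: Any, table: str, doc_id: str) -> bool:
    # `conn` is an open DB-API connection using %s placeholders (e.g. psycopg)
    with conn.cursor() as cur:
        cur.execute(exists_query(table), (doc_id,))
        return bool(cur.fetchone()[0])
```

For example, with a psycopg connection, is_indexed(conn, "data_my_index", "bla") returns True if any node from bla was stored (data_my_index is a hypothetical table name). EXISTS stops at the first matching row, and an expression index on (metadata_->>'doc_id') would keep the lookup fast as the table grows.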
Alright, cool, gonna try that. Thanks, guys!