Updated last month

Retrieving the Latest Metadata from Documents

Hello! When documents have a metadata tag like {"year": 2014}, is there a way to get only the latest ones (i.e. max year)? Could this be achieved with reranking? I imagine this is a common problem: many similar nodes are in the vector store, but ideally only the most recent ones should be used. Or maybe this is best done in regular Python after getting the top n nodes? Any ideas or hints appreciated 🙂
4 comments
This is exactly what metadata filtering is for 👀

.as_query_engine(..., filters=MetadataFilters(filters=[MetadataFilter(key=key, value=value)]))

An example with qdrant
https://docs.llamaindex.ai/en/stable/examples/vector_stores/Qdrant_metadata_filter/
Yes, I looked at that, but I don't know the year up front. With metadata filtering I can only do LT(E) or GT(E). I also checked the FilterOperator enum and there's no max (which I guess makes sense).
True true. I guess ideally you don't ingest the same document for different years, and instead properly do an upsert/delete+insert 😅
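A rough sketch of that idea in plain Python, before anything reaches the vector store: collapse the records so only the newest version of each document is ingested. The `doc_id` key and the record layout are assumptions for illustration, not something from LlamaIndex.

```python
def latest_per_document(records, id_key="doc_id", year_key="year"):
    """Keep only the newest record for each document id.

    `records` is a list of plain dicts; `id_key` and `year_key` are
    hypothetical field names chosen for this sketch.
    """
    latest = {}
    for rec in records:
        doc_id = rec[id_key]
        # Replace the stored record only if this one is strictly newer.
        if doc_id not in latest or rec[year_key] > latest[doc_id][year_key]:
            latest[doc_id] = rec
    return list(latest.values())


records = [
    {"doc_id": "report", "year": 2012, "text": "old version"},
    {"doc_id": "report", "year": 2014, "text": "new version"},
    {"doc_id": "memo", "year": 2013, "text": "only version"},
]
to_ingest = latest_per_document(records)
```

Only the records in `to_ingest` would then be inserted, so stale years never compete with the current one at query time.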
Another option is doing a retrieve or get_nodes call to sample the available years, and then setting the filter from there.
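A minimal sketch of that second option, written against plain dicts so it runs without a vector store. It assumes each sampled node exposes its metadata as a dict (in LlamaIndex that would be something like `n.node.metadata` on retrieved results); the helper names `latest_year` and `keep_latest` are made up for this example.

```python
def latest_year(nodes, key="year"):
    """Return the largest `key` value among the sampled nodes, or None."""
    years = [n["metadata"][key] for n in nodes if key in n["metadata"]]
    return max(years) if years else None


def keep_latest(nodes, key="year"):
    """Drop every node whose `key` is not the maximum seen in `nodes`."""
    newest = latest_year(nodes, key)
    if newest is None:
        return []
    return [n for n in nodes if n["metadata"].get(key) == newest]


# Stand-ins for nodes returned by a retrieve / get_nodes call.
sampled = [
    {"text": "old", "metadata": {"year": 2012}},
    {"text": "new", "metadata": {"year": 2014}},
    {"text": "undated", "metadata": {}},
]
year = latest_year(sampled)
recent = keep_latest(sampled)
```

The value in `year` could then be used for an equality filter such as MetadataFilter(key="year", value=year), or `keep_latest` can simply be applied to the top n nodes after retrieval. Note that the result is only as complete as the sample: if the newest year is missing from the sampled nodes, it will not be found.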