
Updated last year

Hello Mates

Hello Mates,
Can you tell me something about PDF vectorization?

Is it possible to vectorize a PDF directly, instead of extracting the text first and then vectorizing it?

Second: if I have a folder /data containing a PDF that needs to be vectorized, and I then add a second file to the same folder, do I need to repeat the process for both files, or can the second file be added to the current vector store that is saved locally on disk?

Or is it possible to create multiple vector stores and then use them all together?

Looking for some knowledge, thanks
3 comments
@Logan M will you please give your quick thoughts on this?
It's not possible to create the embeddings without having the text, so the text has to be extracted from the PDF first.

If you use the filename_as_id option in SimpleDirectoryReader, you can call index.refresh_ref_docs(documents) to refresh the index.
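For reference, here is a minimal sketch of how that could look for a local /data-style folder. It is not taken from the thread: the function name build_or_refresh_index and the "data" and "storage" paths are made up for illustration, and the import path assumes a recent LlamaIndex release (older releases import the same names from llama_index instead of llama_index.core). Running it also requires an embedding model to be configured.

```python
from pathlib import Path


def build_or_refresh_index(data_dir: str = "data", persist_dir: str = "storage"):
    """Load every file in data_dir and sync it into a locally persisted index."""
    # Imported lazily so this module still loads where llama_index is absent.
    # Assumption: recent releases expose these names under llama_index.core;
    # older releases import them from llama_index directly.
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    # filename_as_id=True derives each document id from the file path,
    # so the id stays the same from one run to the next.
    documents = SimpleDirectoryReader(data_dir, filename_as_id=True).load_data()

    if Path(persist_dir).exists():
        # Assumes the directory holds a previously persisted index.
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        index = load_index_from_storage(storage_context)
        # One bool per document: True if it was inserted or updated.
        refreshed = index.refresh_ref_docs(documents)
    else:
        # First run: embed everything once.
        index = VectorStoreIndex.from_documents(documents)
        refreshed = [True] * len(documents)

    # Write the (possibly updated) index back to disk.
    index.storage_context.persist(persist_dir=persist_dir)
    return index, refreshed
```

With this shape, adding a second PDF to the folder and calling the function again should only embed the new file, since the first file's id and content are unchanged.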

Basically, it uses the doc ID as a static identifier to check whether the document is already inserted or needs to be updated in the index.
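To make the idea concrete, here is a toy, standard-library-only sketch of that check. It is not LlamaIndex code: TinyDocRegistry is a hypothetical class, and comparing content hashes per doc ID is an assumption about how "needs to be updated" could be decided.

```python
import hashlib


class TinyDocRegistry:
    """Toy stand-in for an index's document store; not LlamaIndex code."""

    def __init__(self):
        # doc_id -> content hash of the version that was last indexed
        self._hashes = {}

    def refresh(self, docs):
        """docs maps doc_id -> text. Returns one bool per doc: True if (re)indexed."""
        results = []
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self._hashes.get(doc_id) == digest:
                results.append(False)  # same id, same content: skip
            else:
                self._hashes[doc_id] = digest  # new or changed: embed it here
                results.append(True)
        return results


registry = TinyDocRegistry()
first = registry.refresh({"data/a.pdf": "text of a"})
second = registry.refresh({"data/a.pdf": "text of a", "data/b.pdf": "text of b"})
third = registry.refresh({"data/a.pdf": "edited text of a", "data/b.pdf": "text of b"})
print(first, second, third)  # [True] [False, True] [True, False]
```

The file path works as the ID here because it does not change when the file's content does, which is what lets the second and third calls skip the untouched file.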

(Note: currently this doesn't work with vector DB integrations, just the default vector store.)
Great, thanks for the information.